
Large scale data analysis made easy - Apache Hadoop

Formal Metadata

Title
Large scale data analysis made easy - Apache Hadoop
Title of Series
Number of Parts
97
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
The goal of Apache Hadoop is to make large scale data analysis easy. Hadoop implements a distributed filesystem based on the ideas behind GFS, the Google File System, and with Map/Reduce it provides an easy way to implement parallel algorithms. Storage has become ever cheaper in recent years: currently one terabyte of hard disk space costs less than 100 Euros. As a result, a growing number of businesses have started collecting and digitizing data. Customer transaction logs, news articles published over decades, and crawls of parts of the world wide web are only a few use cases that produce large amounts of data. But with petabytes of data at your fingertips, the question arises of how to make ad-hoc as well as continuous processing efficient. After motivating the need for a distributed library, the talk gives an introduction to Hadoop, detailing its strengths and weaknesses, and shows how to quickly get your own Map/Reduce jobs up and running. The talk closes with an overview of the Hadoop ecosystem.
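
As a taste of the Map/Reduce programming model the talk introduces, a minimal job in Java could look like the classic word count below. This is a sketch only, assuming a recent Hadoop release and the org.apache.hadoop.mapreduce API; the WordCount class name and the path arguments are illustrative and not taken from the talk.

// Minimal Hadoop Map/Reduce sketch: count word occurrences in text files.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum up the counts collected for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a jar, such a job would typically be submitted with something like "hadoop jar wordcount.jar WordCount <input dir> <output dir>", with both directories living in the distributed filesystem; the framework takes care of splitting the input, scheduling the parallel map and reduce tasks, and retrying failures.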