
Big Data Analytics with Python using Stratosphere

Formal Metadata

Title
Big Data Analytics with Python using Stratosphere
Title of Series
Part Number
94
Number of Parts
119
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Place
Berlin

Content Metadata

Subject Area
Genre
Abstract
Chesnay Schepler - Big Data Analytics with Python using Stratosphere
Stratosphere is a distributed platform for advanced big data analytics. It features a rich set of operators, advanced iterative data flows, an efficient runtime, and automatic program optimization. We present Stratosphere's new Python programming interface, which lets Python developers easily get their hands on big data.
Stratosphere is implemented in Java. In 2013 we introduced support for writing Stratosphere programs in Scala. Because Scala also runs on the JVM, integrating it was straightforward. In late 2013, we started to develop a generic language binding framework for Stratosphere to support non-JVM languages such as Python, JavaScript, and Ruby, as well as compiled languages such as C++. The language binding framework uses Google’s Protocol Buffers for efficient data serialization and transport between the languages. Since many “Data Scientists” and machine learning experts use Python on a daily basis, we decided to use Python as the reference implementation for Stratosphere’s language binding feature.
Our talk at EuroPython 2014 will present how Python developers can leverage the Stratosphere platform to solve their big data problems. We introduce the most important concepts of Stratosphere, such as the operators, connectors to data sources, data flows, the compiler, iterative algorithms, and more.
Stratosphere is a mature, next-generation big data analytics platform developed by a vibrant open-source community. The system is available under the Apache 2.0 license. The project started in 2009 as a joint research project of multiple universities in the Berlin area (Technische Universität, Humboldt Universität and Hasso-Plattner Institut). Today it is an award-winning system that has gained worldwide attention in both research and industry.
Keywords