We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Big Data Analytics with Python using Stratosphere

Formale Metadaten

Titel
Big Data Analytics with Python using Stratosphere
Serientitel
Teil
94
Anzahl der Teile
119
Autor
Lizenz
CC-Namensnennung 3.0 Unported:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen.
Identifikatoren
Herausgeber
Erscheinungsjahr
Sprache
ProduktionsortBerlin

Inhaltliche Metadaten

Fachgebiet
Genre
Abstract
Chesnay Schepler - Big Data Analytics with Python using Stratosphere Stratosphere is a distributed platform for advanced big data analytics. It features a rich set of operators, advanced, iterative data flows, an efficient runtime, and automatic program optimization. We present Stratophere's new Python programming interface. It allows Python developers to easily get their hands on Big Data. ----- [Stratosphere] is implemented in Java. In 2013 we introduced support for writing Stratosphere programs in Scala. Since Scala also runs in the Java JVM the language integration was easy for Scala. In late 2013, we started to develop a generic language binding framework for Stratosphere to support non-JVM languages such as Python, JavaScript, Ruby but also compiled languages such as C++. The language binding framework uses [Google’s Protocol Buffers] for efficient data serialization and transportation between the languages. Since many “Data Scientists” and machine learning experts are using Python on a daily basis, we decided to use Python as the reference implementation for Stratosphere’s language binding feature. Our talk at the EuroPython 2014 will present how Python developers can leverage the Stratosphere Platform to solve their big data problems. We introduce the most important concepts of Stratosphere such as the operators, connectors to data sources, data flows, the compiler, iterative algorithms and more. Stratosphere is a mature, next generation big-data analytics platform developed by a vibrant [open-source community]. The system is available under the Apache 2.0 license. The project started in 2009 as a joint research project of multiple universities in the Berlin area (Technische Universität, Humboldt Universität and Hasso-Plattner Institut). Nowadays it is an award winning system that has gained worldwide attention in both research and industry.
Schlagwörter