Data Analytics with MySQL, Apache Spark and Apache Drill

FOSDEM VZW

Smirnova, Sveta; Rubin, Alexander

Formal Metadata

Title

Series Title

FOSDEM 2017

Number of Parts

611

Author

Smirnova, Sveta

Rubin, Alexander

License

CC Attribution 2.0 Belgium:
You are free to use, adapt, copy, distribute and make publicly available the work or its content, in unchanged or adapted form, for any legal purpose, as long as you credit the author/rights holder in the manner they have specified.

Identifiers

10.5446/41959 (DOI)

Publisher

FOSDEM VZW

Year of Publication

2018

Language

English

Production Year

2017

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Apache Spark is a cluster computing framework, similar to Apache Hadoop. There are a number of tasks where MySQL does not show great performance: for example, MySQL is not a massively parallel system, and a single query will only utilize 1 CPU core. Spark, on the other hand, is designed to be massively parallel; in addition, Spark is a clustering framework, so you can easily add more compute nodes so that Spark can utilize more resources and scale.

Apache Drill is a similar project aimed at making data discovery easier. For example, it allows you to join data sources in MySQL, MongoDB, flat files, other RDBMS, etc.

In this talk I will demonstrate how to use Apache Spark together with MySQL for data analysis. I will show how Apache Spark aggregates data (Wikipedia pageview statistics) and stores the result set in MySQL. I will also show how to use Apache Spark with multiple sources and join virtual tables from MySQL, flat files and even MongoDB.
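
The aggregation workflow mentioned in the abstract could look roughly like the following PySpark sketch. It is not taken from the talk: the function names, table name, and connection parameters are hypothetical, and it assumes the pageview files use the space-separated layout `project page_title view_count bytes`, Spark 2.x or later, and a MySQL JDBC driver on the Spark classpath (the driver class name shown is the Connector/J 8 one and is also an assumption).

```python
from typing import Optional, Tuple


def parse_pageview_line(line: str) -> Optional[Tuple[str, str, int]]:
    """Parse one pageview line; return None for malformed input.

    Assumed layout (not confirmed by the source):
    "<project> <page_title> <view_count> <bytes>"
    """
    parts = line.strip().split(" ")
    if len(parts) != 4:
        return None
    project, title, views, _size = parts
    try:
        return project, title, int(views)
    except ValueError:
        return None


def aggregate_to_mysql(input_path: str, jdbc_url: str, table: str,
                       user: str, password: str) -> None:
    """Sum views per (project, title) in Spark and append the totals to MySQL.

    All arguments are hypothetical placeholders supplied by the caller.
    """
    # Imported here so the parser above can be used without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("pageview-aggregation").getOrCreate()

    # Read the raw files in parallel and drop lines that fail to parse.
    rows = (spark.sparkContext.textFile(input_path)
            .map(parse_pageview_line)
            .filter(lambda r: r is not None))
    df = spark.createDataFrame(rows, ["project", "title", "views"])

    # The aggregation runs across all available cores and nodes.
    totals = (df.groupBy("project", "title")
                .agg(F.sum("views").alias("total_views")))

    # Write the (much smaller) result set to MySQL over JDBC.
    (totals.write.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", table)
        .option("user", user)
        .option("password", password)
        .option("driver", "com.mysql.cj.jdbc.Driver")  # assumed driver class
        .mode("append")
        .save())

    spark.stop()
```

The split of work follows the abstract's argument: Spark handles the parallel scan and aggregation of the raw files, and MySQL only stores the aggregated result set.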