Data Analytics with MySQL, Apache Spark and Apache Drill

Formal Metadata

Title
Data Analytics with MySQL, Apache Spark and Apache Drill
Title of Series
Number of Parts
611
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year
2017

Content Metadata

Subject Area
Genre
Abstract
Apache Spark is a cluster computing framework, similar to Apache Hadoop. There are a number of tasks where MySQL does not perform well: for example, MySQL is not a massively parallel system, and a single query will utilize only one CPU core. Spark, on the other hand, is designed to be massively parallel; in addition, Spark is a clustering framework, so you can easily add more compute nodes so that Spark can utilize more resources and scale. Apache Drill is a similar project that aims to make data discovery easier. For example, it allows you to join data sources in MySQL, MongoDB, flat files, other RDBMSs, etc. In this talk I will demonstrate how to use Apache Spark together with MySQL for data analysis. I will show how Apache Spark aggregates data (Wikipedia pageview statistics) and stores the result set in MySQL. I will also show how to use Apache Spark with multiple sources and join virtual tables from MySQL, flat files, and even MongoDB.