Extending Spark Machine Learning Pipelines

FOSDEM VZW

Karau, Holden

Formal metadata

Title

Extending Spark Machine Learning Pipelines

Subtitle

Going beyond wordcount with Spark ML

Series title

FOSDEM 2017

Number of parts

611

Author

Karau, Holden

License

CC Attribution 2.0 Belgium:
You may use, adapt, copy, distribute, and make publicly available the work or its content, in unchanged or modified form, for any legal purpose, provided that you credit the author/rights holder in the manner they specify.

Identifiers

10.5446/42011 (DOI)

Publisher

FOSDEM VZW

Release year

2018

Language

English

Production year

2017

Content metadata

Subject area

Computer science

Genre

Conference/Talk

Abstract

Apache Spark is one of the most popular new "big data" technologies, and now has a scikit-learn-inspired pipeline API. This talk looks at how the pipeline API works, as well as how to add your own custom algorithms to Apache Spark. The talk will focus on Scala, but the same techniques can be used in Java or other JVM languages. Sadly, extending the pipeline API cannot currently be done in non-JVM languages, but the information on how to use the pipeline API will be useful to Python and R users as well.