Extending Spark Machine Learning Pipelines

FOSDEM VZW

Karau, Holden

Formal Metadata

Title

Extending Spark Machine Learning Pipelines

Subtitle

Going beyond wordcount with Spark ML

Title of Series

FOSDEM 2017

Number of Parts

611

Author

Karau, Holden

License

CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/42011 (DOI)

Publisher

FOSDEM VZW

Release Date

2018

Language

English

Production Year

2017

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Apache Spark is one of the most popular new "big data" technologies, and now has a scikit-learn inspired pipeline API. This talk looks at how the pipeline API works as well as how to add your own custom algorithms to Apache Spark. The talk will focus on Scala, but the same techniques can be used in Java or with other JVM languages. Sadly, extending the pipeline API cannot currently be done in non-JVM languages, but the information on how to use the pipeline API will be useful to Python and R users as well.