Extending Spark Machine Learning Pipelines

Formal Metadata

Title
Extending Spark Machine Learning Pipelines
Subtitle
Going beyond wordcount with Spark ML
Number of Parts
611
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Year
2017

Content Metadata

Abstract
Apache Spark is one of the most popular new "big data" technologies, and now has a scikit-learn-inspired pipeline API. This talk looks at how the pipeline API works, as well as how to add your own custom algorithms to Apache Spark. The talk will focus on Scala, but the same techniques can be used in Java or with other JVM languages. Sadly, extending the pipeline API cannot currently be done in non-JVM languages, but the information on how to use the pipeline API will be useful to Python and R users as well.