Apache Spark is one of the most popular new "big data" technologies, and now has a scikit-learn-inspired pipeline API. This talk looks at how the pipeline API works, as well as how to add your own custom algorithms to Apache Spark. The talk will focus on Scala, but the same techniques can be used in Java or with other JVM languages. Sadly, extending the pipeline API cannot currently be done in non-JVM languages, but the information on how to use the pipeline API will be useful to Python and R users as well.