From Zero to Portability
Formal Metadata
Title: From Zero to Portability
Number of Parts: 561
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/44282 (DOI)
Transcript: English (auto-generated)
01:13
Hello, so yeah, my name is Max, and I'm a software engineer working on the Beam project, and
01:20
I want to tell you about Beam today and how Beam realized its vision of portability. What do I mean by portability? Portability can mean a lot of things. The short answer is that it enables you to run your data processing jobs on top of various
01:41
execution engines like Spark, Flink, Samza, or Google Cloud Dataflow, and you can do that in the programming language of your choice. That sounds pretty good, doesn't it? So I've put this agenda together: first of all, some of you might already know Beam, but I will give a short introduction. Then we will talk a little bit more about portability, and
02:03
then about how we can actually achieve it, because there are multiple ways to do that. And then let's recap and see how far we actually are with portability. So what is Beam? First of all, Beam is an open source project at the Apache Software Foundation. I don't know if you know the Apache Software Foundation, but it's like a framework for developing open source software,
02:26
in which they provide infrastructure and a kind of guide for how to develop software in the open. Beam is a project there, and it focuses on parallel and distributed data processing. You typically run your Beam job on multiple machines, because you probably have
02:44
mostly large data, but you can also run it on a single machine if you want. It has a really cool API which can do batch and stream processing at the same time. Often you have separate batch and streaming APIs and you have to port your batch job over to streaming, but in Beam it's all the same.
03:02
And once you've written your job, you can actually run it on multiple execution engines. That's why we sometimes say it's like an uber-API: you use one API, but you can execute with multiple backends or execution engines. And now you can also use your favorite programming language.
03:21
A little bit more detail on this. This is the vision of Beam: we have the SDKs here on the left side, that's Java, Go, Python, Scala, and SQL, and then we have some magic happening in Beam, which is the runners. There's a runner for every execution backend, and the runner translates the Beam job from the SDK into
03:47
the language of the execution engine. You can see a bunch of them there, and more and more are coming. Yeah, it's really nice to have that choice, right?
04:00
So how does the API work, just concept-wise? In Beam there are PCollections. First of all, there's the pipeline: the pipeline is the object that holds all your job information, and you create it from some options which you can pass in. Then you create PCollections. PCollections are created by
04:23
applying transforms to the pipeline, so you always apply transforms. It's really easy, and you can chain multiple transforms after each other, or you can also branch, like here where you create PCollection 2 as a branch off PCollection 1, as in the sketch below.
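To make this concrete, here is a minimal sketch using the Python SDK of creating a pipeline from options, chaining transforms, and branching off the same PCollection; the step names and data are made up for illustration.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Create the pipeline from options; it holds all the job information.
p = beam.Pipeline(options=PipelineOptions())

# PCollections are created by applying transforms to the pipeline.
pcoll_1 = p | 'Read' >> beam.Create(['to', 'be', 'or', 'not', 'to', 'be'])

# Chain transforms after each other...
upper = pcoll_1 | 'Upper' >> beam.Map(str.upper)

# ...or branch: a second transform applied to the same PCollection.
lengths = pcoll_1 | 'Lengths' >> beam.Map(len)

# Run the pipeline.
p.run()
```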
04:41
And then you can run that pipeline. That's pretty sweet. Transforms are actually quite a nice abstraction, because transforms can be either primitive or composite. What does that mean? In Beam we only have a few primitive transforms: ParDo, GroupByKey, assign windows, and Flatten.
05:03
I will explain two of them in a bit, but basically what that means is that you can define composite transforms which use these, and the composite transforms are expanded into these primitive ones. Which is really convenient, because as a runner author
05:21
you just need to implement those four primitive transforms. We can do optimizations for composite transforms, but it's enough to implement the primitive ones. So of course, because this is a big data framework, we have to do a little word count. For those of you who don't know word count: you have a list of words, like 'to be or not to
05:43
be', and you try to count how often each distinct word appears in that list. The way to do that in Beam is to use a ParDo, which stands for 'parallel do', and you would transform your words into key-value objects, with the word as the key and a one as the value,
06:03
which stands for the number of occurrences. Then you do a GroupByKey, which basically shuffles the data and gives you, for every distinct key, the list of all its values. Then you can sum them up, and you know that 'to' appears twice in this list, and 'be' as well, and the others just once.
06:23
Don't get confused now; this looks really ugly, but it's actually how you would write it in Beam, and we will see that we can simplify it a lot. We have the pipeline we created, we have our list of words, in this case 'hello, hello, FOSDEM', and we have this first ParDo which assigns the ones.
06:45
Then we do a GroupByKey, and then we have this loop here in the second ParDo which sums it all up. Yeah, that was pretty ugly, I agree, and not very comprehensible; a rough sketch of this raw version follows below.
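The slide shows this raw version in Java; a rough Python equivalent, built only from the primitive transforms ParDo and GroupByKey, might look like this (a sketch, not the exact code from the talk).

```python
import apache_beam as beam

class AssignOne(beam.DoFn):
    # First ParDo: turn each word into a (word, 1) key-value pair.
    def process(self, word):
        yield (word, 1)

class SumPerKey(beam.DoFn):
    # Second ParDo: after the shuffle, loop over the grouped ones
    # and sum them up for each distinct word.
    def process(self, element):
        word, ones = element
        total = 0
        for one in ones:
            total += one
        yield (word, total)

with beam.Pipeline() as p:
    (p
     | beam.Create(['hello', 'hello', 'FOSDEM'])
     | beam.ParDo(AssignOne())
     | beam.GroupByKey()   # shuffle: gives (word, [1, 1, ...])
     | beam.ParDo(SumPerKey())
     | beam.Map(print))
```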
07:00
Luckily, we have composite transforms, so we can simplify this further. Instead of the first ParDo with its DoFn we just use a MapElements function, which is somewhat simpler, and we use an integersPerKey composite transform,
07:21
which basically sums up the values, the numbers of occurrences, for each key. And we can simplify this even further by just using the Count.perElement composite transform. That looks pretty simple, right? There are a lot of these transforms in Beam, and if you read the documentation,
07:44
you can write really readable code, even in Java, because that was the Java API. Fortunately, we also have a Python API, which looks so much nicer. Here this would be the same initial example; we just use lambda functions
08:01
to do that word count. In Python we of course also have these composite transforms, so this is maybe slightly simpler, where we have the CombinePerKey transform and we pass sum as an argument. This was just a very quick look into the Beam API; I thought it would be useful. Both simplified variants are sketched below.
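Both Python variants mentioned above might look roughly like this, first mapping to (word, 1) pairs and combining per key with sum, then using the Count.PerElement combiner as a single composite transform (a sketch over the same toy data).

```python
import apache_beam as beam

with beam.Pipeline() as p:
    words = p | beam.Create(['hello', 'hello', 'FOSDEM'])

    # Variant 1: map to (word, 1), then a composite combine per key.
    counts_1 = (words
                | 'One' >> beam.Map(lambda w: (w, 1))
                | 'Sum' >> beam.CombinePerKey(sum))

    # Variant 2: a single composite transform that counts elements.
    counts_2 = words | beam.combiners.Count.PerElement()

    counts_2 | beam.Map(print)
```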
08:24
There are lots more composite transforms, you can create your own, and we have lots of IO. We have windowing, event time, watermarks, side inputs, state and timers, which maybe doesn't mean much to you at the moment if you haven't tried it, but these are really useful concepts once you learn more about Beam and your pipeline gets more complicated.
08:44
So what does portability mean now? I showed you Java, I showed you Python, so that should already be working, right? Let's first see what the two different kinds of portability are in the Beam context.
09:01
You have engine portability, which is the ability to run a job on different execution engines, and we have language portability, which is using different SDKs for composing the pipeline. If we look back at the vision which I showed you at the beginning, this is really how it should work, right? And
09:24
in terms of engine portability it is actually true. In the Java API these options would be passed to the pipeline; we just set the runner to FlinkRunner, then we call run, and it really runs on Flink. That's pretty amazing, so we have that part covered already; a sketch of the idea follows.
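The talk shows this with the Java API; here is the same idea sketched in Python, where the execution engine is chosen purely through pipeline options. The runner names are those used in Beam's documentation; actually running this of course requires the corresponding cluster.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# The same pipeline code runs on different engines just by changing
# the runner option, e.g. DirectRunner, FlinkRunner, SparkRunner,
# or DataflowRunner.
options = PipelineOptions(['--runner=FlinkRunner'])

with beam.Pipeline(options=options) as p:
    p | beam.Create(['engine', 'portability']) | beam.Map(print)
```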
09:44
Now what about language portability? Why would we use other languages? Well, it's kind of clear, I guess. Syntax, expressiveness, and communities are a big point, because a lot of people simply don't like Java, for various reasons,
10:01
which I can understand; I actually really like Java, but it's okay. We also have libraries as an important factor: things like TensorFlow are really huge libraries which are simply not available in Java, so that's a good reason to use Python. So, I was actually lying a bit to you:
10:22
this whole language portability didn't really work. It used to be the case that we basically only supported Java and Scala in the open source world, and when you used Google Cloud you could run Python, which is
10:43
not so cool, right? It kind of breaks the promise. So what we needed, and what we worked on in the past almost two years, is to build a language portability framework into Beam and its runners, so that we actually
11:01
can realize the full vision. So how do we achieve it? If we look at the very abstract translation process of a pipeline, it used to be like this: we had Java
11:23
and then a bunch of runners, and they all executed in Java. So each of them needed to implement its own translation, but once the job was translated it was fine. Now that we have language portability, it seems like a, well, maybe not very good idea, but it's certainly possible, to just let every
11:47
SDK figure out a way to translate to every execution engine, where each execution engine has its own various ways of supporting that language. But that just seems like a terrible idea: very complicated and
12:01
replicating a lot of work. So what we did is introduce the Runner API, which takes the pipeline from the SDK and transforms it into a language-agnostic format; that's called the Runner API. It's based on protobuf, but that doesn't really matter; it's just a format that is
12:23
consistent across languages. Then what we also needed is something for execution time. We have these language-dependent parts: the execution engines, most of them, actually all of them, are written in Java, so when you have Python you need to figure out a way to send data to that Python process, to access state, and all that. This is called the
12:47
Fn API. That way we pretty much only have these two extra layers; we just have to make sure the runners are compatible with them, and then we're good to go.
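As a small illustration of what the Runner API means in practice: the Python SDK can serialize any pipeline into that shared protobuf representation. A minimal sketch (the proto internals are normally hidden from users):

```python
import apache_beam as beam

p = beam.Pipeline()
_ = p | beam.Create(['a', 'b']) | beam.Map(str.upper)

# Convert the SDK-specific pipeline object into the
# language-agnostic Runner API protobuf message.
proto = p.to_runner_api()

# Every transform, coder, and environment is now plain data that
# any runner, in any language, can read.
print(proto.components.transforms.keys())
```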
13:02
Let me simplify this a lot. The old way was: we have the SDK and the runner, and we have for example an execution engine like Flink with a bunch of tasks, and all of these were in Java. That worked pretty well. The new way is a bit different.
13:22
In the new way we have the SDK here, which uses the Runner API to produce this universal pipeline format, and then we actually have the Job API, which is a way to send this pipeline to the job server. The job server is really a Beam concept now.
13:45
It used to be that every runner, every execution engine, had its own way of submitting applications, but we wanted to really get everything portable. So we created the job server, and in the job server the runner translates this Runner API pipeline and executes it on the engine of your choice; submitting to a job server is sketched below.
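Concretely, in the Python SDK talking to a job server is again just a matter of pipeline options. A sketch, assuming a Flink job server is already listening on localhost:8099; the endpoint and pipeline contents are illustrative.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# The PortableRunner speaks the Job API: it sends the Runner API
# pipeline to a job server, which translates and executes it.
options = PipelineOptions([
    '--runner=PortableRunner',
    '--job_endpoint=localhost:8099',  # e.g. a Flink job server
])

with beam.Pipeline(options=options) as p:
    p | beam.Create(['portable', 'job']) | beam.Map(print)
```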
14:02
But of course we have these Python blobs or Go blobs in between, which the runner doesn't really understand, and whenever we have that, we have a special task called an executable stage,
14:21
which is a fancy name for 'we don't know what to do with this, so we have to send it to an external process'. That external process is called the SDK harness, and a harness exists for every language: Java, Python, and Go.
14:41
We create the harness when we start a job, with the Python code for instance, and then whenever we receive data in that task, we send it to the external process; the external process does its processing and sends the result back. That's very simplified, and there are some challenges to it, because
15:01
there is not a huge cost, but there is some cost when you send data to an external process, right? You need to serialize that data and deserialize it again. So we built in an optimization called fusion, which tries to combine as many of these Python stages as possible into one SDK harness, so we
15:22
don't do duplicate serialization work. How does the SDK harness work? First of all, the SDK harness needs to be bootstrapped somehow, right? What we typically do is use Docker, so we have an environment which contains all the dependencies, like my TensorFlow or my NumPy, and
15:45
we just use this Docker image directly; we can specify that in the options. That's a really easy way of deploying, because you have an image registry, and you just download the image automatically and start it. But some people don't want to use Docker, because
16:01
of various reasons. So you can also use a process-based execution, but then you have to make sure you set up the environment manually. And it's also possible to run the harness embedded, in case you are using Java. The three variants map to options roughly as sketched below.
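A sketch of how these deployment choices surface as SDK harness environment options; the flag values come from Beam's portable runner options in later releases, the Docker image name is illustrative, and showing the embedded variant as LOOPBACK is an assumption about the closest matching option.

```python
from apache_beam.options.pipeline_options import PipelineOptions

# Docker (the typical case): the harness runs in a container image
# that bundles all dependencies and is pulled from a registry.
docker = PipelineOptions([
    '--environment_type=DOCKER',
    '--environment_config=apache/beam_python3.7_sdk',  # illustrative image
])

# Process-based: no Docker, but you must prepare the environment
# (interpreter, dependencies) on the workers yourself; an extra
# environment_config describes how to launch the harness process.
process = PipelineOptions(['--environment_type=PROCESS'])

# Embedded/loopback: run the harness inside an existing process,
# e.g. when the pipeline itself is Java, or for local debugging.
embedded = PipelineOptions(['--environment_type=LOOPBACK'])
```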
16:20
There's a lot happening, with a lot of communication between the backend and the SDK harness. Obviously we need control, so we have a control plane and a data plane; we have a way to access state and to report progress, and also logging. Everything is logged, so you actually know what is happening inside the external process, because otherwise debugging it would be really hard.
16:47
So what is now missing, and kind of a problem: an SDK is only complete if you can read and write data, right? Because
17:03
it's not really worth anything just to support all the primitive transforms; we also have to make the connectors which we have in Java available in every SDK, and you can see there are a lot of them available. Now, it would be quite a lot of work to replicate them, and the language support differs.
17:24
For example, when you want to create a Kafka connector, the library support in Python is not so good, while in Java it's really good. So ideally we would just use the Java connector from Python and not recreate it in Python.
17:42
It turns out we can actually do that, and that's a pretty amazing solution: we can simply use the process which I've described to run cross-language pipelines. So how does it work, theoretically? We're finalizing the specification at the moment, but it's sort of like this: you have a Python job, and it's probably not going to be named 'IO expansion',
18:03
but it's kind of a placeholder object where you specify your IO, like KafkaIO, or maybe the full Java name (it will be made a bit simpler), and you pass in some configuration. Then, of course, Python doesn't understand this, but when we do the translation to the Runner API,
18:22
we actually have an expansion service running, a Java expansion service in this case, and we take this placeholder and expand it into a native Java Kafka transform.
18:41
Then we do the rest of the translation, and when the job runs we actually have two different kinds of SDK harnesses running: a Java one for our Kafka source, and then maybe some Python data processing afterwards, where we do some map and count. And we of course also have the native transforms of Flink, or whatever
19:05
execution engine you're using, like a GroupByKey, which just doesn't need an SDK harness or anything, because it's supported by the execution engine. So this is sort of how portability works; there are a lot of details, of course, but it's a 20-minute talk. A sketch of the cross-language idea follows.
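The exact API was still being specified at the time of this talk; in later Beam releases the cross-language Kafka source landed in Python roughly as below, with the transform expanded by a Java expansion service behind the scenes (a sketch; broker and topic names are made up).

```python
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka

with beam.Pipeline() as p:
    # A Python placeholder that the Runner API translation expands,
    # via the Java expansion service, into the native Java KafkaIO.
    messages = p | ReadFromKafka(
        consumer_config={'bootstrap.servers': 'localhost:9092'},
        topics=['my_topic'])

    # Downstream Python processing runs in a Python SDK harness,
    # while the Kafka source runs in a Java SDK harness.
    messages | beam.Map(print)
```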
19:24
So how far are we? We have engine portability, and we have language portability almost, I would say. For developers: you can try it out yourself, and I have a link for you at the end.
19:42
You can try it out, it works; we just have to make it a bit better. We have to tune the performance a bit, although we have estimated only five to ten percent overhead in most cases. And cross-language pipeline support needs a bit more specification, but that's going to happen in the next weeks.
20:01
There's also this fancy thing called Splittable DoFn, which you can read up on, but that's not so important here. There's a compatibility matrix which tracks the portability status of all runners; there's a link here, and Flink is actually the best runner, I would say, because it supports the most features at the moment. The others are going to catch up.
20:25
That brings me to the end of my talk. Please check out the portability website, or just go to the normal Beam website if you want to learn more about Beam. We have mailing lists and an awesome Slack channel, where there is a lot of help for people.
20:41
Yeah, and that's it. Thank you.
21:12
Compile to what, sorry? Common bytecode. Yeah, so the question is why not use something like Apache
21:22
TinkerPop, which uses a common intermediate format between the languages, something like bytecode, which can then be executed. There are a lot of other frameworks which do that; for example, Flink has a Python API which uses Jython, which is sort of the same idea: you can generate bytecode from Python. But
21:43
we want to be able to support all kinds of libraries, like TensorFlow, which is a native C library, and that you can only achieve if you run a CPython interpreter and not some custom version of Python which only supports a subset of Python. That's the reason. So yeah, I'll repeat the question: how is the debugging experience with these
22:31
Python libraries? When you run into an error in Python, how fast do you see it when you execute on, well, essentially a Java runtime? It's actually pretty good, and it's been part of the design.
22:43
When you get an exception in Python, it will be forwarded directly to the Java operator, which will catch the error there, and thanks to the logging and so on you actually see immediately what happened. The error is also sent back,
23:02
so you see the error message immediately there, and your pipeline will fail, because if the runner receives a failure, it should fail. Yeah. Good question.
23:21
So the question is: Python 3, is it supported or not? It is supported, but it is like 99% done: it is there, you can use it, there are test cases and everything; it's just not officially released yet. I'm not working on the Python side myself, so
23:42
I expect it to be done in the 2.11 release, which is the next Beam release; it should be out next month. Yeah.