Running Apache Spark on K8s: From AWS EMR to K8s
Formal Metadata
Series: Berlin Buzzwords 2022 (talk 40 of 56)
Number of Parts: 56
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/67164 (DOI)
Transcript: English (auto-generated)
00:07
Okay guys, welcome to this track. Thank you for coming today to hear about this journey that we have been on during the last year, let's say. We are going to talk about how we migrated the solution we use to run and exploit our Spark jobs, or algorithms.
00:25
The solution that we implemented in the first step used Amazon Web Services tools, and now we have a new, portable, open-source solution based on Kubernetes.
00:40
But let's start by introducing ourselves. Hi, I'm Ramiro Alvarez. I'm a Platform Engineer at Empathy.co. I'm mainly interested in infrastructure, observability and performance. Yes, this is me after KubeCon Europe three weeks ago. Therefore, please stay safe, guys. Don't be like me; stay safe here at Berlin Buzzwords.
01:05
And yeah, we are hiring too, so you can go to our website and find the details there. Good afternoon, I'm Daniel Hernandez. I work as a Data Engineer in the Data Team. What we do in the Data Team is collect all the events provided by the websites
01:24
or by our customers, in order to provide analytics to the merchandisers and improve the final user experience. Here we wanted to remark, and you may already have attended our colleagues' talk about privacy and relevancy, that we don't store any personal data from the users; we only aggregate the events in order to improve the average user experience.
01:50
Okay, and this is me. It's a bit difficult nowadays to explain what I do, but let's say that I try to understand customer and market needs and translate them, or help my colleagues translate them, into real work: evolution of the systems, new features.
02:06
And I'm super boring on social media, so let's move on. Okay, we are going to talk a lot in this talk about Spark jobs, how we scale solutions, and what these algorithms or jobs are.
02:22
But let's start with a bit of context. Most of us here do search. We have had some talks today, so you can imagine the type of data that we collect at Empathy. As Danny said, we collect all the information without personal information: no location information, nothing related to the user.
02:44
The talk this morning gave more details, but we're happy to hear your opinions in discussions about the topic on social media. So yeah, we collect all these events from the search experience, the search journey, and, oversimplifying the solution a bit,
03:04
we push all this data into a data lake. At Empathy we realized some years ago that we needed to exploit this data, obviously. We have several blocks of how we use the data: we can use it for reports, for visualizations, and we have another big group that
03:24
we are going to see a live demo of today, about one of these algorithms or jobs: the Search Intelligence group. In this group we mainly create features based on the wisdom of the crowd, to give the user more guidance in the interface.
03:42
And as you can see in the slide, we use Spark to create these algorithms. But why is a good question. So I want to do the typical raise-your-hands here to see, more or less, how many of you work with Spark in your companies. Okay, nice. I know some of these hands.
04:05
Okay. For you, this is probably not interesting and you are probably going to criticize me for this oversimplification. But let's say that we have this scenario. As I mentioned, we have a data lake with some data, and we have some algorithms that we have implemented in Spark.
04:24
These algorithms can be something super complicated or something super simple. For that, Spark provides us different libraries, like SQL libraries, ML libraries, all these super fancy words that are super difficult for me to pronounce as a Spaniard.
04:40
But let's say that we can create something super simple, like extracting from our data what the most clicked product is. Obviously, this is a job or an algorithm that runs, finishes, and saves some results somewhere: a database, an index.
05:01
So we don't need a server running 24/7 to execute these jobs. But why does Spark help us here? Even if it's a simple operation, this obviously takes some computation time, and Spark helps because we can split this computation across different executor nodes and join the results at the end.
05:25
So let's say that our input data is 10 gigabytes. Each one of these executors will take a portion of this data, so the computation time is reduced. In fancy words, we can work with large datasets with low computational times.
05:44
This also means that we can scale the solution: if we have more input data, we can add more executors to the solution and execute it in the same computational time, more or less. OK, so we have some algorithms and we have a cool technology that allows us to run these algorithms and generate results super fast.
06:10
Now we are going to take a look at the first solution that we implemented to run these jobs, using Amazon Web Services tools. This is the previous solution, running on Amazon EMR, that we were using before moving to Kubernetes.
06:35
We were using native technologies on Amazon Web Services like CloudWatch to schedule the different jobs that we run on a daily basis.
06:44
The Step Function contains all the logic to orchestrate the complete workflow, from launching the cluster to running the jobs. The Lambda functions that we use check the cluster status during the process and also submit the Spark jobs.
07:03
We use the API Gateway to access all this infrastructure from the outside, if we run some job on demand, for instance. And finally, we use S3 to store the final results of the Spark jobs that we run. We had in-house monitoring and logging solutions that were somewhat rudimentary, and we wanted to improve this with a new solution.
07:29
In this slide, you can see the complete process that we used to run a job. We start by uploading the artifact that contains the jobs to S3. And once we have that, we use CloudWatch with a scheduler expression to run the jobs on a daily basis or every 13 hours.
07:51
We configure it using the cron expression, the specification of the job, and the input parameters that the job needs. After that, all the logic that contains all the complexity is located in the Step Functions.
08:06
The Step Function starts by launching a cluster on EMR. This is where we waste most of the time, because the cluster takes about 10 or 15 minutes to provision instances and be up and running.
08:24
After it is up and running, we can run a certain number of jobs that are present in the input configuration; it's a variable number. For each job, we check its status and handle retries if it fails. And if a job succeeds, we concatenate the next one, until the last one.
08:45
When some job fails, or all of them have succeeded, we terminate the cluster. At the bottom of the slide, you can see that there is a JSON expression with all the information that the Step Function needs to handle all this orchestration.
09:01
And it has to store responses from EMR and from the Lambdas, so at the end we end up with a very complex structure that we needed to simplify in order to make our lives easier. We'll talk about the problems that we found with this solution in a moment.
09:24
This is how we specify the different jobs that we want to run. We have the cron expression, the name of the job, the main class that contains the logic of the Spark job, and the different arguments.
09:42
These jobs are what the Step Function takes and runs sequentially. Talking about the problems of this solution, the first one is that EMR is managed by AWS, so we don't have control over the version that is running on the cluster or the dependencies that the cluster includes.
10:03
Since we have a JAR file and not a container, we don't have an isolated environment and the dependencies can come into conflict. Another problem is the auto-scaling. EMR does not base the auto-scaling on resource usage, but on the number of input tasks, the pending tasks from Spark.
10:26
So that's not a great auto-scaling solution, because it can take, as I said before, some minutes to have new instances up and running. The other problem is the cost that you have to pay to run Spark jobs.
10:41
On Amazon it is not only the cost associated with the type of machines that you are using inside the cluster; you also have to pay an extra cost for running EMR. About monitoring and logging, we didn't have good solutions here to make the cluster observable. For example, the Spark history server, which allows you to monitor the different jobs that run,
11:04
dies when the cluster is terminated, so you don't have access to that information after the cluster has finished. Also, this was not a cloud-agnostic solution, so if we wanted to port it, for example, to another cloud like GCP, that was not possible.
11:24
And last but not least, the Step Function that handles this logic and orchestrates the jobs has a lot of complexity. As I mentioned before, here you can see all the logic that the Step Function contains. You don't need to understand it right now, because there is a lot of complexity here.
11:42
But the main points are that the Step Function tries to run a spot cluster; if that fails, then it tries a cluster with on-demand machines. After waiting for the cluster to be up and running, we launch the different jobs that we want to concatenate. All the jobs have retry management.
12:03
And after all the jobs succeed, or one of them fails, we have to terminate the cluster. This kind of logic, the logic of this Step Function, was very hard to manage, and we wanted to move to the Kubernetes solution.
12:20
You can see here that this is kind of a mess. But let me highlight something important among the things that didn't work for us in this solution: portability is important for us. In AWS you have your environment and your teams working on your algorithms, but you find in
12:42
the market some opportunities to exploit these algorithms, these jobs, in other environments. So that is something we considered really important when creating the new solution that Ramiro is going to tell us about. So it looks like we have an elephant in the room. As we were moving to Kubernetes for our microservice architecture, we thought it would be nice to have our Spark applications there too.
13:09
Because we can reduce the cognitive load on our developers and not have one way to deploy the Spark applications and another way to deploy our other microservices.
13:25
Therefore, after a request for comments and some months of work from multiple teams, we can show you what technologies we are using on our journey to Spark on Kubernetes. First of all, you know, Kubernetes: you need it to orchestrate all your microservices.
13:46
Besides, we didn't want to distribute the Spark applications using JARs, as Danny commented before. We would like to handle them as distributed images, increasing the isolation using Docker.
14:07
Besides, we love the GitOps approach. The developer experience is really nice and everyone loves the user interface. We were already using Argo CD for all our microservices.
14:23
Therefore, we embraced it for the Spark applications too. Although you can use vanilla Spark to deploy your Spark application on Kubernetes and create all your Kubernetes resources by yourself, we prefer to keep things easy.
14:43
Therefore, we deploy the Spark Operator, a project started by Google and now open source, which provides a custom resource definition to create and handle a Spark application easily. As we would like to distribute the Kubernetes manifests across multiple clusters, we are using Helm to increase the agility of deploying them.
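To make this concrete, a SparkApplication handled by the operator is just a Kubernetes custom resource. The following is a minimal hypothetical sketch, not Empathy's actual manifest; the image, main class, and resource sizes are invented for illustration:

```yaml
# Hypothetical minimal SparkApplication (spark-on-k8s-operator CRD); values are illustrative.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: most-clicked-products          # made-up job name
  namespace: spark-jobs                # assumed namespace
spec:
  type: Scala
  mode: cluster
  image: registry.example.com/data/spark-jobs:1.0.0   # assumed image repository and tag
  mainClass: com.example.jobs.MostClickedProducts     # assumed main class
  mainApplicationFile: local:///opt/spark/jars/spark-jobs.jar
  sparkVersion: "3.2.1"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: "2g"
    serviceAccount: spark
  executor:
    instances: 2
    cores: 2
    memory: "4g"
```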
15:13
Besides, when you are running Spark applications, sometimes you would like to concatenate jobs, run one job after another, create a complex graph.
15:28
You know, you can use the vanilla Kubernetes Job definition, but when you are running concatenated jobs, it's not enough. Therefore, we reviewed Argo Workflows.
15:42
It fulfils all our requirements, and it fits nicely with Argo CD. For those who don't know, Argo is an incubating project of the Cloud Native Computing Foundation and has great support from the community. We are running multiple ephemeral Spark applications.
16:06
Therefore, we don't want to have instances running all day, and we don't want to have orphaned or idle resources.
16:22
That would be wasting money, wasting resources, and increasing our carbon footprint. Therefore, we are using the Cluster Autoscaler so that, as we deploy a Spark application, the instances can be provisioned.
16:41
Therefore, we can have the right resources at the right time and save money. And you know what you need when you run all of this: you need metrics, you need dashboards. Probably some of you know these icons. We are using Prometheus to store all our metrics,
17:02
the Pushgateway to send all the metrics from the Spark applications to Prometheus, and Grafana to explore them and create some nice dashboards. But how does everything work together? Let me walk you through a little sample and we can go ahead.
17:21
First of all, we have Git, where all the code is located. From there, we push our Docker image and Helm chart to an image repository. In our case Harbor, but you can use whatever image repository you want, like ECR, GCR, whatever.
17:44
It doesn't matter. Next, you need a Kubernetes cluster. We have the Spark Operator set up in place; therefore, you have the custom resource there and you can create the Spark application easily.
18:00
You define the Spark application in code, and Argo CD is going to sync the changes in your code and deploy them to the cluster. It's going to reconcile the status again and again and again. Then Argo Workflows: as I commented before, you would like to concatenate jobs and create complex graphs.
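The Argo CD sync just described can be declared as an Application resource pointing at the Helm chart in Git. A hypothetical sketch; the repository URL, chart path, and namespaces are assumptions:

```yaml
# Hypothetical Argo CD Application; repository, path and namespaces are invented.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: spark-jobs
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/data/spark-jobs.git   # assumed Git repository
    targetRevision: main
    path: deploy/chart                                     # assumed Helm chart path
    helm:
      valueFiles:
        - values-production.yaml                           # assumed values file
  destination:
    server: https://kubernetes.default.svc
    namespace: spark-jobs
  syncPolicy:
    automated:
      prune: true        # remove resources deleted from Git
      selfHeal: true     # reconcile drift back to the Git state
```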
18:30
For us, this is a simple use case scenario: we are going to execute job one and, after it finishes, we would like to execute job two.
18:42
In the first job, as you can see, there are a driver and multiple executors. We have the Cluster Autoscaler; therefore, as soon as the executors are being deployed, the Cluster Autoscaler is going to provision new instances. So we are going to save money and save resources, because we have the right resources at the right time.
19:09
After it finishes, we can go to job two. You can see that it is the same logic as before. And what do all of these have in common?
19:22
The observability stack. We have the Prometheus metrics there; all our Spark applications are sending their metrics to Prometheus using the Pushgateway. We have the dashboards in Grafana to explore and review the metrics, the performance, whatever.
19:42
Now, I think that it's time for a demo, but first, let me take a selfie, guys. Come on, make some noise. Let's make the people in the other tracks envious. Say cheese! Thank you, guys.
20:00
Thank you, guys. Okay, let's go for the demo. Let's take a look first at an example that we have here. This is a demo client, a demo showcase. We are going to play a bit with this.
20:22
Let's say that I open it. You can see here a lot of things that we commented on in other talks today. But it's important to mention that some of the elements that you see in the interface are generated by these jobs, running in Spark, running in this solution. And we are going to execute one now.
20:41
But let's play a bit with this to understand what we are going to do. You see that I have some suggestions here, and I can search for a blue top, for example. Here, I don't have any more interaction elements for this query. Yes, I can filter.
21:01
For example, if I want to filter by a specific color like blue, I can do it, or by a specific collection, whatever I want. But I don't have anything to refine my query further. And this is something that we are going to change now. We are going to run a Spark job that will generate what we call related tags, and the last talk explained them really well.
21:21
It's a feature that generates tags attached to the query that I'm doing. So if I search for top, in this case, blue can be a related tag of top. This feature helps us a lot to guide the user in the interface and to refine the query.
21:43
So this is something super nice for the users, to feel this guidance in the interface. Oh, yeah, sorry.
22:00
Let me show them the code, Panizo. Yeah, let me show you the code, sorry. You can see in this snippet what we are doing in reality. As you can see, you can set your repository, your tag, and the image location.
22:20
Besides, you can easily set your Spark version, the path of your main application file, and the time-to-live seconds. Also, you know, you don't want to hard-code all your credentials. Therefore, you can set all the Hadoop configuration needed for your OIDC or your provider.
22:49
Besides, we are working with scheduled Spark applications. Therefore, we would like to have a run history limit that allows us to check whether a Spark application failed at some point.
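Put together, the snippet being described (repository and tag, Spark version, main application file, time-to-live, Hadoop/OIDC configuration, and run-history limits) maps onto the operator's ScheduledSparkApplication resource. A hedged sketch with made-up values:

```yaml
# Hypothetical ScheduledSparkApplication; schedule, image and paths are invented.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: ScheduledSparkApplication
metadata:
  name: query-signals
spec:
  schedule: "0 2 * * *"               # assumed daily cron expression
  concurrencyPolicy: Forbid
  successfulRunHistoryLimit: 3        # keep a few past runs around
  failedRunHistoryLimit: 5            # keep failed runs so we can see what broke
  template:
    type: Scala
    mode: cluster
    image: registry.example.com/data/query-signals:1.4.0   # repository + tag
    mainApplicationFile: local:///opt/spark/jars/query-signals.jar
    sparkVersion: "3.2.1"
    timeToLiveSeconds: 86400          # clean finished resources up after a day
    hadoopConf:
      # assumed OIDC / web-identity credentials provider, so no hard-coded keys
      fs.s3a.aws.credentials.provider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
    # driver and executor sections (cores, memory, instances) as sketched later in the talk
```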
23:05
But you are going to ask me: how can you orchestrate all these jobs with Argo Workflows, or something like that? So this is the way that we configure jobs using Argo Workflows.
23:30
It's very similar to the way we configured them using AWS CloudWatch. This is an example of a job that has two jobs that run sequentially.
23:42
Sorry, there are two workflows. One of them has one job, and the other has three jobs that run sequentially. You have the cron expression for every workflow, and that goes inside the Helm chart that we have deployed. So that's easy to configure.
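A hypothetical sketch of such a workflow definition: an Argo CronWorkflow whose DAG creates one SparkApplication after another. Names, the cron expression, and the success/failure conditions are assumptions, not Empathy's actual chart:

```yaml
# Hypothetical Argo CronWorkflow chaining two Spark jobs; values are illustrative.
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: nightly-spark-jobs
spec:
  schedule: "0 3 * * *"               # cron expression, one per workflow
  concurrencyPolicy: Forbid
  workflowSpec:
    entrypoint: main
    templates:
      - name: main
        dag:
          tasks:
            - name: job-one
              template: spark-job
              arguments:
                parameters:
                  - name: job
                    value: job-one
            - name: job-two
              dependencies: [job-one]   # runs only after job-one succeeds
              template: spark-job
              arguments:
                parameters:
                  - name: job
                    value: job-two
      - name: spark-job
        inputs:
          parameters:
            - name: job
        resource:                       # create the SparkApplication and wait on its state
          action: create
          successCondition: status.applicationState.state == COMPLETED
          failureCondition: status.applicationState.state == FAILED
          manifest: |
            apiVersion: sparkoperator.k8s.io/v1beta2
            kind: SparkApplication
            metadata:
              generateName: "{{inputs.parameters.job}}-"
            spec:
              # ...same fields as the SparkApplication sketch shown earlier...
```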
24:01
If we go to the Argo Workflows interface, we can see an example of one of these jobs that ran, for example, some hours ago. We have here five jobs that have been concatenated. We run these sequentially because some of them depend on the others, and we don't have the need to run them immediately.
24:26
So we prefer to have a node pool with a small number of nodes and run them sequentially. Argo Workflows allows you to create any DAG, a directed acyclic graph, to run jobs in parallel if you want.
24:42
We can now run one more job to see how it works. This usually takes about half an hour or 45 minutes, so in the meantime you can check your Telegram, your messages, or so on. Are you happy already? No, just joking.
25:04
Yeah, it's true that this one takes 30 minutes to execute, so we're going to execute a smaller one instead. Believe us, when this first job finishes, the second one is going to start, and it will finish and it will work. We are awesome. But let's execute a specific job, the one I told you about, related to these related tags.
25:29
We are going to run one of these jobs. Here in Argo we can see that we have already set up some templates to easily run these jobs. We come here to the query signals job.
25:41
That is the name that we gave to the job that generates these related tags for us. And here, if we click on submit, we can see that this is a super easy interface. I need to select the entry point, that is, the template. Here you can see that I have my Docker image, and I can put a nice name here for you.
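The submit dialog shown here suggests a WorkflowTemplate with an entrypoint and an input parameter; a hedged sketch, where the `customer` parameter and all names are guesses based on what the speakers do next in the demo:

```yaml
# Hypothetical WorkflowTemplate for on-demand runs; the customer parameter is assumed.
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: query-signals
spec:
  entrypoint: query-signals
  arguments:
    parameters:
      - name: customer                # which customer to run the job for
        value: demo-customer
  templates:
    - name: query-signals
      inputs:
        parameters:
          - name: customer
      resource:
        action: create
        successCondition: status.applicationState.state == COMPLETED
        failureCondition: status.applicationState.state == FAILED
        manifest: |
          apiVersion: sparkoperator.k8s.io/v1beta2
          kind: SparkApplication
          metadata:
            generateName: query-signals-
          spec:
            arguments:
              - "{{inputs.parameters.customer}}"   # passed to the Spark job as an argument
            # ...image, driver and executor sections omitted for brevity...
```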
26:01
And we are going to see this working in real life, with a demo in production. Yeah, we are crazy. Okay, let me open a console. So, I move here to the production environment, in one of our namespaces.
26:24
You can see it, right? Yeah, more or less. Okay. First we are going to check the pods related to this job that we want to run. We can see that we don't have anything here; nothing is running for Spark.
26:41
We don't have servers running, nothing there. But I'm going to click here. I'm going to execute this job only for one of our demo customers. This is a parameter. Okay. Okay, I missed an S.
27:04
Okay, I think it's going to work. Now, if we come back here, we see things, Kubernetes doing things. We have a driver. I'm going to reset this because when things start it goes a bit crazy, but you can see here a clearer picture.
27:22
You can see that we have a driver and we have some executors. In the meantime we can also check in the presentation how this can be expressed in our charts. For example, here we have a driver, as you can see, with its memory and core limits.
27:41
And the number of executors that we use in this case is four. You know that we can scale this solution if we have more data coming in. So, one cool thing that we can do with this setup is check the logs in real time while it is being executed.
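For reference, the driver and executor section being described on the slide (memory and core limits, four executors) lives in the SparkApplication spec roughly like this; the actual production numbers are not given in the talk, so the values below are placeholders:

```yaml
  # Fragment of a SparkApplication spec; the numbers are placeholders, not production values.
  driver:
    cores: 1
    coreLimit: "1200m"       # core limit mentioned on the slide (value assumed)
    memory: "4g"             # memory limit (value assumed)
    serviceAccount: spark
  executor:
    instances: 4             # four executors; scaled up when more data comes in
    cores: 2
    memory: "4g"
```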
28:01
So, if we do here... sorry, I have the mic. kubectl logs -f, and I copy here the name of the driver pod, and I can see the logs in real time. So, this is what's happening in our job. It will probably take only about a minute and a half.
28:21
We are taking into account, I don't know, 10 days of mock data, so it will do some work to discover these related tags. We have a nice article on our engineering blog about how we generate them, so you can take a look. And when this job finishes, it's going to clean up the resources, as Ramiro explained.
28:43
So, everything is going to be super nice. Okay, this is nice: we can check the logs in real time. But we want more granularity in the metrics, which is what we explained with Prometheus and all these things. Yes, one of the problems that we had with the EMR solution was the lack of observability.
29:00
And here we can see the first improvement, which is that the history server is up and running 24/7. In AWS, we had it available only during the cluster lifecycle. Now you can inspect any of the jobs that have run or are running and try to find any issue that you may have with unbalanced tasks and so on.
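One common way to make an always-on history server work with ephemeral jobs (hedged: the bucket and exact settings here are assumptions, not necessarily what Empathy does) is to have every application write its event logs to shared object storage that the history server reads:

```yaml
  # Fragment of a SparkApplication spec; bucket name and settings are assumed.
  sparkConf:
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "s3a://example-data-bucket/spark-events"
  # A long-running Spark history server then points at the same location, e.g.
  # spark.history.fs.logDirectory=s3a://example-data-bucket/spark-events
```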
29:25
And the other improvement in observability was pushing custom metrics from our jobs. So, we have some Grafana dashboards with information about the jobs that have run. For example, for the query signals job that we are running, you have the number of indexed documents that the job outputs,
29:45
the duration of the jobs, or the resource usage while the jobs are running. But how can we configure this in the Helm chart? This is pretty simple. You know, there are some common friction points between the platform engineering team
30:01
and the developers about how to configure Alertmanager, Prometheus, all that stuff. One of our goals as a platform engineering team is to provide a better developer experience and a faster feedback loop.
30:21
Therefore, well, this is the whole configuration snippet, but in the same Spark application definition in the Helm chart, all our developers can set their favorite Slack alert channel. Maybe some of them would like to have some labels, and others different ones.
30:45
Besides, they can deploy their favorite Grafana dashboards in their favorite namespace, as simple as that. But I think that we still have a demo running, don't we?
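As a purely hypothetical illustration of that developer experience — these keys would belong to Empathy's internal chart and are invented here, not real chart values — the per-application Helm values might look something like this:

```yaml
# values.yaml for one Spark application (hypothetical keys for an internal chart)
alerting:
  slackChannel: "#data-spark-alerts"   # each team picks its favorite channel
  labels:
    team: search-intelligence
dashboards:
  enabled: true
  namespace: monitoring                # where the Grafana dashboard is deployed
```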
31:04
Yeah, it probably finished already, hopefully. Let's see. Yeah, okay, the application is finished, and you can see here in the resources that the driver and all these things are finishing. So if we check the pods right now, they are probably completed
31:22
and we don't have executors anymore. So yeah, this ran. But we are going to check the demo again to see if it works. So now, if I search for a single term, and I'm going to remove this filter, like for example top, I'm going to see here, as you can see, the related tags that we just generated.
31:43
So I can refine this query with a feature that gives more guidance. Instead of using the filters, I can easily filter by terms that are trendy or that people are using most. Okay, but I think we have some takeaways.
32:02
Yeah, last minute, don't worry. Yeah, we improved our execution time a lot.
32:21
The improvement was mainly because the cluster auto-scaling provides the instances faster than the usual AWS EMR solution. Besides, another important point is that our developers started to embrace
32:40
this solution, because they are running, each day, more jobs, more Spark applications. Another important point is the cost. You know, this is business. AWS EMR has a license fee; well, EKS has a fee too.
33:00
Anyway, we reduced the cost of our Spark clusters a lot. Now, let me show you the summary of this talk. We increased our portability by deploying on Kubernetes.
33:24
We provide isolated images using Docker. We increase the agility of deploying the Spark applications to multiple clusters using Helm charts, in an easy and idiomatic way, using the Spark Operator.
33:41
We use a GitOps approach, easy for everyone to understand, thanks to Argo CD. We orchestrate multiple jobs easily with Argo Workflows. Besides, the cluster auto-scaling gives us a faster feedback loop
34:01
because it can provide instances faster. And everything comes with an awesome observability stack with very simple configuration, improving the developer experience for our developers a lot. I think that this is a lot.
34:21
Now, Panito, I think you could improve on and summarize this talk. Yeah, improve? I doubt it. So, we have a good solution. Ramiro already summarized all the good points and bad points that we have in the old solution versus the new solution. But let me give you my point of view.
34:42
I developed some code in the data science team that we have for this solution, and I need to say that it's amazing for developers; it's super friendly. Also, as I mentioned in the middle of the talk, it's portable, so we don't mind having another use case, another environment.
35:03
We can just ship our Kubernetes definitions there and everything is going to run nicely. So, yeah, this is everything that we have for you. Remember to check out our media and our engineering blog; we have some interesting things related to this
35:20
and more things there. We hope you enjoyed it. Thank you very much, guys. So we still have a little time for questions.
35:42
Are there any questions from the audience? Yeah, we have a couple of them back there. I'll be right there. First of all, thank you very much for your presentation.
36:02
I have a question regarding whether you have faced new challenges or new difficulties while moving away from EMR to Kubernetes. For example, one of my main concerns is that, first of all, you need to have people with working experience operating a Kubernetes cluster. That means administration, security, networking, et cetera.
36:23
Yeah, thankfully, this is not from zero to 100%. We have two years of experience with Kubernetes, but, you know, it's easier to deploy some microservices.
36:41
The AWS EMR solution was dependent on status. It was an elephant in the room, as I commented before; it was a little bit more, you know. Anyway, in Platform Engineering we have a couple of years of experience administrating clusters, and all our developers have user experience with Kubernetes.
37:05
Therefore, it was pretty easy to handle. Also, if you take a look at the old solution, you see all the pieces that we had there. So, yeah, the complexity of running Kubernetes or managing all these new things is big when you start with it.
37:21
But when you take a look at the solution that we have, I mean, we had the old solution until recently and we needed to check why we used the API Gateway, why we used Lambda, because it was also super complex to understand all the pieces working together, and now I think it's much easier to understand the overall flow.
37:45
Yeah, so again, congratulations on the talk. Two questions. One is: was Kubernetes adopted as a consequence of already having some Kubernetes systems, or was this issue the pretext to adopt Kubernetes?
38:02
And then the second question is: do you do any sort of capacity planning for Spark jobs, for example, to have some minimum or maximum executor values, or do you just hope for Kubernetes to scale? Great question. Thank you. The Spark situation was like this:
38:22
the team had a very huge cognitive load working with AWS EMR. Therefore, we were like, okay, we have to think about a solution to that. We started thinking, analyzing, and reviewing the Spark Operator and some other approaches, like doing it ourselves.
38:46
Well, we had the Kubernetes microservices, a lot of microservices there, but simple ones with Deployments or some StatefulSets, you know, not all the orchestration of a Spark application. Therefore, Kubernetes reached the company first, before Spark on Kubernetes.
39:09
And the second question, a very good question: we have some dedicated node pools to host all our Spark applications.
39:20
As you could see in one of the slides, you can see a driver and some executors. As we would like to have faster provisioning, we decided to put the driver on an on-demand node pool,
39:44
and the executors are on spot instances. Therefore, the node pool is going to auto-scale from zero to whatever is needed and scale down after the job finishes.
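In the SparkApplication spec, that split can be expressed with node selectors (and tolerations if the spot pool is tainted); the label keys and values below are assumptions about how the node pools might be labelled:

```yaml
  # Fragment of a SparkApplication spec; node-pool labels and taints are assumed.
  driver:
    nodeSelector:
      node-pool: spark-on-demand       # driver on on-demand instances for reliability
  executor:
    instances: 4
    nodeSelector:
      node-pool: spark-spot            # executors on cheaper spot instances
    tolerations:
      - key: "spot"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
```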
40:00
It's pretty simple: the Cluster Autoscaler works pretty well and the Spark Operator covers all our requirements, with all the visibility, because, you know, you have a Spark Operator, but how are you going to be sure that the Spark Operator is working?
40:20
Because if it's not working, you are... That's it. Brilliant. Thank you. Thank you. Great. Then let's thank the speakers again for the awesome talk. Thank you.