Automating Spark (and Pipeline) Upgrades While "Testing" in Production

Formal Metadata

Title
Automating Spark (and Pipeline) Upgrades While "Testing" in Production
Title of Series
Number of Parts
798
Author
Contributors
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
With Spark 4 in the pipeline for this year, many of us are looking at what will be involved in upgrading to the latest and greatest Spark. This talk will look at the open-source tooling we use to automate upgrading thousands of our Spark pipelines (from Spark 2.X -> 3.4) and how we used a variation of the write-audit-publish technique to validate the new pipelines in production, including pipelines that have less testing than would be ideal. Seeing is a prerequisite to believing, so the talk will include a short demo showing how the spark-upgrade tool works on a demo pipeline, complete with "live" validation. In this talk, you will learn how to: (1) upgrade your Spark pipelines without crying*; and (2) validate Spark (and other similar) pipelines even when you don't trust the tests, by extending the write-audit-publish pattern on top of Iceberg, Hudi, or Delta Lake. Time permitting, I will end with some exciting new (but non-backward-compatible) changes coming in Spark 4 (tentatively scheduled for June, but it's software). *Not a guarantee; some upgrades may still cause tears.
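
As a rough illustration of the write-audit-publish idea mentioned in the abstract, here is a minimal sketch in plain Python. It is not the speaker's tooling and does not use Spark or any table format: `Catalog`, `audit`, `write_audit_publish`, and the demo pipelines are all hypothetical names invented for this example. It also assumes that the "variation" means auditing the upgraded pipeline's staged output against the output of the existing pipeline, which is one plausible reading of the abstract rather than a confirmed description.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Table = List[Row]
Pipeline = Callable[[Table], Table]


class Catalog:
    """Toy in-memory stand-in for a table store with a staging area (hypothetical)."""

    def __init__(self) -> None:
        self.published: Dict[str, Table] = {}
        self.staged: Dict[str, Table] = {}

    def write_staged(self, name: str, rows: Table) -> None:
        # Write: land the output where downstream readers cannot see it yet.
        self.staged[name] = list(rows)

    def publish(self, name: str) -> None:
        # Publish: move the audited output to where readers look.
        self.published[name] = self.staged.pop(name)


def audit(candidate: Table, reference: Table, key: str) -> List[str]:
    """Compare candidate rows to reference rows; an empty list means the audit passed."""
    problems: List[str] = []
    if len(candidate) != len(reference):
        problems.append(f"row count {len(candidate)} != {len(reference)}")
    reference_by_key = {row[key]: row for row in reference}
    for row in candidate:
        expected = reference_by_key.get(row[key])
        if expected is None:
            problems.append(f"unexpected key {row[key]!r}")
        elif expected != row:
            problems.append(f"mismatch for key {row[key]!r}")
    return problems


def write_audit_publish(
    catalog: Catalog,
    name: str,
    old_pipeline: Pipeline,
    new_pipeline: Pipeline,
    source: Table,
    key: str,
) -> bool:
    """Stage the new pipeline's output and publish it only if it matches the old one."""
    reference = old_pipeline(source)
    catalog.write_staged(name, new_pipeline(source))
    problems = audit(catalog.staged[name], reference, key)
    if problems:
        # Audit failed: leave the staged copy in place for inspection, publish nothing.
        return False
    catalog.publish(name)
    return True


# Demo data and pipelines (all invented for illustration).
source_rows: Table = [{"id": 1, "amount": 10}, {"id": 2, "amount": 5}]


def old_pipeline(rows: Table) -> Table:
    return [{"id": r["id"], "total": r["amount"] * 2} for r in rows]


def upgraded_ok(rows: Table) -> Table:
    # Different implementation, same results as old_pipeline.
    return [{"id": r["id"], "total": r["amount"] + r["amount"]} for r in rows]


def upgraded_broken(rows: Table) -> Table:
    # Simulates a behavior change introduced by an upgrade.
    return [{"id": r["id"], "total": r["amount"]} for r in rows]
```

In this sketch a failed audit leaves the staged rows in place for inspection and never touches the published table, so downstream readers only ever see output that matched the existing pipeline. A real deployment would presumably rely on whatever staging or branching mechanism the chosen table format provides instead of an in-memory dictionary.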