We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Automating Spark (and Pipeline) Upgrades While "Testing" in Production

Formale Metadaten

Titel
Automating Spark (and Pipeline) Upgrades While "Testing" in Production
Serientitel
Anzahl der Teile
798
Autor
Mitwirkende
Lizenz
CC-Namensnennung 2.0 Belgien:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen.
Identifikatoren
Herausgeber
Erscheinungsjahr
Sprache

Inhaltliche Metadaten

Fachgebiet
Genre
Abstract
With Spark 4 in the pipeline for this year, many of us are looking at what will be involved in upgrading to the latest and greatest Spark. This talk will look at the open-source tooling we use to automate upgrading thousands of our Spark pipelines (from Spark 2.X -> 3.4) and how to we used a variation of the write-audit-publish technique to validate the new pipelines in production for pipelines that might have less testing than ideal. Seeing is a pre-requisite to believing, so the talk will include a short demo showing how the spark-upgrade tool works on a demo pipeline complete with "live" validation. In this talk, you will learn how to: Upgrade your Spark pipelines without crying* Validating Spark (and other similar) pipelines even when you don't trust the tests (by extending the write-audit-publish pattern) on top of Iceberg, Hudi, or Delta Lake. Time permitting, I will end with some exciting new (but non-backward compatible) changes coming in Spark 4 (tentatively scheduled for June, but it's software). *Not a guarantee, some upgrades may still cause tears