Simplifying upserts and deletes on Delta Lake tables
Formal Metadata

Title: Simplifying upserts and deletes on Delta Lake tables
Series: Berlin Buzzwords 2021, part 18 of 69
License: CC Attribution 3.0 Unported. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI: 10.5446/67348
Transcript: English (auto-generated)
00:07
Thanks for having me here. Today we are going to discuss simplifying upserts and deletes on Delta Lake. So that's our topic for today. A brief agenda:
00:20
first, we'll talk a bit about the challenges with data lakes. Then we'll discuss the features of Delta Lake and how it helps tackle and solve those challenges. And finally, we get into the meat of the topic,
00:41
like update, delete and upsert on a Delta Lake table, how easy and simplified it is to run any of these commands, and how to make sure your data lake stays pristine and can be used for downstream analytics. I'm also touching a bit on optimize and vacuum.
01:01
These are essential for making sure that when you delete some data, the underlying files are also cleaned up, and so on and so forth. And I round it up with a couple of demos: I'm showcasing update, delete and upsert in two different notebooks, and finally optimize and vacuum are also part of a notebook.
01:22
So that's our agenda for today. And finally, I leave a number of references so that you can go through them in detail. This is kind of a lightning talk, I would say, a 30-minute session, and there are ample videos and blog posts listed in my references section,
01:40
which will help you dig deeper into whatever you might be interested in. With that, let me get into the topic. Very briefly about me: I'm Prashant Babu, I've been with Databricks for almost three years now, and I'm the EMEA practice lead for RSAs; RSA stands for Resident Solutions Architect.
02:01
My LinkedIn profile is shown on the slide here, and I would love to connect with any and all of you. Very briefly about Databricks: if you're not aware, Databricks is a platform to unify data, analytics and machine learning workloads, basically.
02:25
You can do everything in a single platform, which is also where I'm going to showcase my demos. Databricks, as you might already be aware, are the original creators of Spark, Delta Lake, MLflow and Koalas, and we have almost 5,000-plus customers across the globe using Databricks.
02:43
Again, one simple slide to explain what Databricks and the Lakehouse platform are. You might have data in AWS, Azure or GCP, the three main cloud vendors, in structured format, semi-structured format or streaming. These are the kinds of workloads that can be processed
03:02
with Databricks for data science and engineering, BI and SQL analytics, machine learning and, finally, real-time data applications as well. All of this is underpinned mostly by Delta Lake, which is what we are going to talk about in a bit more detail in today's session.
03:21
This is a very simple sample of our customers, and you can see a couple of big German customers present as well, like Daimler or Zalando, et cetera. So that is a very brief overview of Databricks, and now we finally get onto the topic.
03:42
What are the typical challenges with data lakes, and how Databricks Delta solves these challenges, is what we are going to discuss in the next few slides. As per many surveys, in fact, the MIT Sloan Management Review says 83% of CEOs say AI is a strategic priority.
04:03
At the same time, Gartner says $3.9 trillion of business value could be created by AI by the end of next year, 2022. The future is here, but there are a few problems: it is very hard to get right, and it is just not evenly distributed. The same Gartner which predicts
04:21
$3.9 trillion of business value also says 85% of big data projects fail, and VentureBeat says 87% of data science projects never make it into production. Some companies like Uber, Google, Amazon, et cetera, are having huge success,
04:40
but a lot of them struggle, and most of the reasons are around the data: the data which is sitting in the data lake and which is causing some challenges. We are going to drill down and do a deep dive on a couple of challenges with data lakes. The first one is something very, very simple,
05:03
like appending new data using Spark into a data lake while, at the same time, some other processor or pipeline is also trying to read the same data. That usually causes a ton of issues. Users want all their data, all their changes, to appear all at once.
05:23
This is very hard to achieve, making multiple files all appear at once, or even a single file appear in full, and it's not supported out of the box with data lakes. That is the first and foremost problem with data lakes.
05:41
The second problem is that modifying existing data is very difficult. Take the classic case of GDPR: someone sends a request to one of the organizations asking for their data to be deleted. That implies you have to read all the data and then filter out that particular row
06:01
or those particular rows from the data, and then rewrite the data into the data lake. So that is, again, a big problem, with GDPR and CCPA for that matter. There are many manual techniques which are applied and which are very unreliable, one of which we are going to discuss here today in the demos.
06:21
The third challenge with data lakes is jobs failing midway. With most big data and Spark pipelines, you can easily picture this: half of the data appears in the data lake and the rest might be missing. Jobs failing midway cause this particular challenge.
06:43
Another problem is mixing batch and real-time. That usually turns out to be an uphill battle; it is very tough to mix them and it leads to a lot of inconsistency. It is a variation of the first problem with appends, but at the same time,
07:02
streaming adds a bit more inconsistency, and you are basically reading partial results, if I can say so. The fifth challenge is that it is very costly to keep historical versions of the data. Usually, regulated organizations need some or many of the versions of the data
07:22
to be available in the data lake. That is going to be costly, it also leads to a lot of auditing and governance issues, and it is very hard to do. The sixth challenge is the difficulty of handling large metadata. If you have used Hadoop HDFS, for example,
07:45
where you would have a huge amount of data in your HDFS: a huge amount of data implies a large amount of metadata to be stored at the name node, for example. And all such problems magnify the moment
08:01
you have petabytes of data in the data lake. It's very tough, and even the metadata itself runs into gigabytes and gigabytes. Then there is one of the most classic problems, I would say: too many small files. Because you are using streaming, for example,
08:21
too much data is landing, and you are processing it at breakneck speed, like every ten minutes, every five minutes, or even every minute, and saving it into the data lake. That implies you're storing too many tiny files. Too many small files, or sometimes gigantic files,
08:42
either of them is usually a big challenge, and most of the time is spent by Spark just opening and closing files rather than actually reading them. On the same note, it is very tough to get great performance; it has to be done manually,
09:01
and it is error-prone to get the partitioning right and apply the manual techniques needed to reach even decent performance, not great performance; it's more about getting decent performance here. And finally, data quality issues. Data naturally evolves,
09:22
and as the schema evolves, the underlying storage has to accept and store that data, which means downstream pipelines have a problem reading data that has different metadata, different columns, compared to the earlier data. So all these are the usual challenges
09:43
which you will face with any of the existing data lakes stored in formats like Parquet, for example. So this is where Delta comes into the picture, and why Delta Lake solves these particular problems, the main challenges we discussed,
10:01
and how it solves them, is what we are going to discuss now. First and foremost, it is built on an open format and it is open source; you can find all the code of Delta at delta.io. It is basically an opinionated approach to building robust data lakes.
10:21
What I mean by that is that it has its own transaction log mechanism; I will briefly show on the next slide what the transaction log looks like. It brings the best of data warehousing and data lakes together into one single format, and it helps ensure
10:43
that downstream reading is perfectly fine even when you're writing some data to the same table, the same table, not just the same location. Databricks Delta adds reliability, quality and performance to data lakes; how it does that is what we are going to discuss in the next few slides.
11:00
Delta Lake comprises only three important pieces. One is Delta tables, which is where the data is stored; then the Delta optimization engine, which is what lets you do merges, upserts and deletes, as well as vacuuming and optimizing,
11:20
and so on and so forth; and finally, the Delta Lake storage layer. Those are the three components of Delta Lake. Now, to add on top of what I briefly mentioned before, Delta Lake offers all these important features, like ACID transactions on Spark. You can be sure that whatever you're writing to a Delta table
11:42
will not be read by another pipeline that is reading at the same time; transaction isolation is maintained on Delta Lake. It allows unifying streaming and batch on the same table: a batch job can write to a location,
12:02
and a streaming job can also write to the same location, and both patterns are allowed at the same time. So basically, the Lambda architecture is resolved just by using the Delta format.
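To make that concrete, here is a minimal sketch of a batch job and a streaming job writing to one and the same Delta path. It assumes a SparkSession named spark with Delta Lake configured, as you would have in a Databricks notebook; the path, schema and the toy rate source are illustrative only and not taken from the talk.

    # Batch and streaming writers sharing one Delta path (illustrative sketch).
    events_path = "/tmp/delta/events"                       # hypothetical path

    # Batch append to the Delta table.
    batch_df = spark.createDataFrame([(1, "click"), (2, "view")], ["id", "action"])
    batch_df.write.format("delta").mode("append").save(events_path)

    # A streaming job can write to the very same path concurrently.
    stream = (spark.readStream.format("rate").load()        # toy streaming source
              .selectExpr("value AS id", "'tick' AS action")
              .writeStream.format("delta")
              .option("checkpointLocation", "/tmp/delta/events_ckpt")
              .start(events_path))

    # Readers, batch or streaming, always see a consistent snapshot
    # because every commit goes through the transaction log.
    spark.read.format("delta").load(events_path).show()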
12:23
Delta also provides schema enforcement, and where required you can enable schema evolution, which is what we have a simple demo showcasing today. It also allows you to do time travel: you can go back in time and look at the data, and see who processed it, who added it, on which cluster and on which date, all using time travel.
12:43
Upserts and deletes are one of the major features, one of the important options, of Delta. Structured Streaming support is also available. So, going back to the main challenges, how Delta Lake tackles those challenges is what we'll discuss in the next few slides, basically.
13:02
So, ACID transactions: the first five challenges are resolved by Delta Lake using ACID transactions. For each and every table, when you write data, it sits in cloud object storage, or HDFS for that matter.
13:21
And there is a small metadata folder which gets created, as you can see here in the location /path/to/table/_delta_log. That's the folder location. Wherever you write a table, say a customers table, for example,
13:41
within that customers table there will be a subfolder created with the name _delta_log, and within that there will be a separate JSON file created for each transaction. That is the heart and soul of Delta, basically. So whenever you write any entry, any row,
14:01
or delete, or merge, or do anything on a particular table, all of that is recorded as a transaction on that particular table. And finally, whenever the number of transactions increases, what Databricks Delta does is checkpoint them into a Parquet file.
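As a purely illustrative example (file names and counts made up, following the usual Delta layout), the folder of such a customers table might look like this:

    /path/to/customers/
        part-00000-....snappy.parquet                 data files
        part-00001-....snappy.parquet
        _delta_log/
            00000000000000000000.json                 one JSON commit per transaction
            00000000000000000001.json
            ...
            00000000000000000010.checkpoint.parquet   periodic Parquet checkpoint
            _last_checkpoint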
14:21
That checkpointing is also done implicitly by Databricks; you don't need to do it or worry about it. So this is how the hard-to-append-data problem and all these related problems are resolved just by using Delta. And as we discussed a bit about time travel: it allows time travel because all the transactions are recorded
14:41
in the transaction log, and it will let you go back and replay, basically play it forward. Then there is the difficulty of handling large metadata: as I mentioned, the metadata is stored in open Parquet format (the checkpoints) and is handled just by reading that file, and portions of it can also be cached
15:03
and optimized for fast access. Then there is the huge problem of too many small files or poorly sized files. This is where things get very interesting: with Delta, you can just run a simple command, OPTIMIZE on so-and-so table, and it will bin-pack all the files
15:22
and all the data in that particular folder into files of roughly one gigabyte each where possible, and it works within partitions as well. This is how we resolve the too-many-small-files problem. Finally, data quality issues, like schema validation and evolution.
15:40
Delta supports schema validation as well as schema evolution, even in merge, that is, upsert, scenarios. And that is exactly the topic we are going to talk about today: updates, deletes and upserts on a Delta Lake table.
16:01
After the nine challenges we discussed, I'm going to touch upon appending data being hard, modification of existing data being difficult, and finally the too-many-small-files problem as well as poor performance. So what are the sample use cases for updates, deletes and upserts?
16:21
First and foremost, whenever you want to do a delete or merge: there might be a case where someone sent a request for the right to be forgotten, so GDPR compliance might be one of the simplest use cases you can imagine. Then there is deduplication: you'd like to dedupe your entire data lake,
16:43
and even that is easily possible with Delta. And what are the challenges with this? Without Delta Lake, it is inefficient, possibly incorrect, and it is very hard to maintain any of these upserts.
17:01
More so with merges, it is very inefficient and very manual to do. With that we come to the first topic, which is update on a Delta table. Now, if you see the syntax, it looks almost exactly like what you would do in an RDBMS query, in
17:22
any RDBMS you might have used. The key feature is that it updates the columns for the rows that match a predicate; it's a pretty simple statement. And similarly for delete, it's exactly like what you would do in an RDBMS: DELETE FROM so-and-so table
17:41
WHERE some column predicate, which is what you would provide. In both cases, it updates the column values, or deletes the rows, that match the predicate; but if you don't provide any predicate, it updates all values for all rows, like UPDATE languages SET name = 'Python 3' here. If you don't give a predicate, it will just blindly update everything.
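The transcript only paraphrases the slide, but a sketch of the same UPDATE and DELETE with the Delta Lake Python API might look as follows; it assumes a Delta table named languages (the name used in the talk's example) and a SparkSession named spark, as in a Databricks notebook.

    from delta.tables import DeltaTable

    languages = DeltaTable.forName(spark, "languages")

    # UPDATE languages SET name = 'Python 3' WHERE id = 3
    languages.update(condition="id = 3", set={"name": "'Python 3'"})

    # Without a predicate, every row would be updated:
    # languages.update(set={"name": "'Python 3'"})

    # DELETE FROM languages WHERE id = 3
    languages.delete("id = 3")

    # Without a predicate, every row would be deleted:
    # languages.delete()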
18:00
The same holds for delete: if no predicate is given, it deletes all the rows. And finally we come to the important topic of upserts. A merge without Delta Lake would be very painful; just to walk through the simplest possible manual approach to a merge:
18:21
you analyze the updates on the table and find out which partitions are affected; that is the first step. Then you read all the data in the relevant partitions of the target table, join the two tables, overwrite all those partitions in the existing location, and then atomically publish. That is what a merge looks like without Delta Lake.
18:43
How merge works with Delta Lake, on the other hand, is a pretty simple statement. If you have a customers table and an updates table, and you would like to update the customers whose customer ID is present in both the source and the target,
19:01
and you have a new address for all those customers, you can just do this: when the rows match, we do an UPDATE SET, and if a row is not available, we insert it. So basically an upsert, update or insert, is what is happening. And we can, in fact, also do a delete in the same merge.
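A hedged sketch of that upsert in the Delta Lake Python API, with the customers and updates table names and the customerId and address columns assumed from the description above:

    from delta.tables import DeltaTable

    customers = DeltaTable.forName(spark, "customers")
    updates_df = spark.table("updates")

    (customers.alias("t")
        .merge(updates_df.alias("s"), "t.customerId = s.customerId")
        .whenMatchedUpdate(set={"address": "s.address"})   # update existing customers
        .whenNotMatchedInsertAll()                         # insert brand-new customers
        .execute())

    # A delete clause can be added to the same merge, e.g.:
    # .whenMatchedDelete(condition="s.active = false")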
19:20
Behind the scenes, what merge does is basically an inner join between the updates and the target. And it is not doing it over the entire data: it actually goes and looks at the min and max values stored for each file and uses those to do some intelligent pruning there.
19:41
I'm not walking through everything here, so that I can get to the demo sooner. Optimize and vacuum are very important concepts, as I said. OPTIMIZE does bin-packing compaction and also enables data skipping, with a syntax like OPTIMIZE events WHERE date equals so-and-so ZORDER BY some column.
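A sketch of that command, issued as SQL from a notebook; OPTIMIZE with ZORDER BY was a Databricks command at the time of the talk, and the events table with date and eventType columns is only an assumed example:

    # Compact small files and co-locate rows by eventType for data skipping.
    spark.sql("""
        OPTIMIZE events
        WHERE date >= '2021-01-01'
        ZORDER BY (eventType)
    """)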
20:02
And similarly, vacuum is pretty simple to do: VACUUM so-and-so table. It will clean up all the old, untracked files of Delta so that it limits the storage cost.
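A minimal sketch of the vacuum command, again with an assumed table name; the default retention is seven days (168 hours):

    # Remove files no longer referenced by the Delta table and older than
    # the retention threshold.
    spark.sql("VACUUM events")                      # default 7-day retention
    spark.sql("VACUUM events RETAIN 240 HOURS")     # or an explicit retention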
20:21
Now let's quickly jump onto the demos; they are pretty simple demos. This is a simple cluster I'm using, on the Databricks platform, by the way, and I'm using the simplest possible use case here. Sorry, I think I was sharing the wrong screen. So I'm showcasing updating columns of a Delta table
20:44
and deleting rows of a Delta table. So basically, I have a small dataset with the values Spark, Databricks and 'Dettler'; by mistake, someone's change caused the pipeline
21:04
to write an incorrect value, and as you can see here, it says 'Dettler'. This is an oversimplified example, per se, so that we can walk through the use case and explain what Databricks Delta does.
21:20
So what I'm doing here is just writing the data to a Delta table, in the Delta format, and providing a path; that implies it is writing to an external table, basically. Now I'm displaying the data from the Delta table here, and as you can see, the value comes up as 'Dettler'.
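The notebook code isn't shown in the transcript, but a rough reconstruction of this step might look like the following; the path is hypothetical and 'Dettler' stands in for the misspelled value mentioned above:

    demo_path = "/tmp/delta/demo"                                # hypothetical path

    data = [(1, "Spark"), (2, "Databricks"), (3, "Dettler")]     # row 3 is the bad value
    df = spark.createDataFrame(data, ["id", "name"])

    # Write as an external Delta table by providing an explicit path.
    df.write.format("delta").mode("overwrite").save(demo_path)

    # Display it back; row 3 still shows the misspelled value at this point.
    spark.read.format("delta").load(demo_path).show()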
21:40
Now, as we saw earlier, I have an ID column, and I know that for ID three the name is the incorrect 'Dettler'. This is where we do some magic: UPDATE the table, SET some column equal to some value, WHERE condition. You can specify either this condition
22:02
or this condition, but both are exactly the same, and Delta does everything behind the scenes automatically. It also shows the number of affected rows here. This is how Delta performs in the real world, and you can see the value got changed. The next thing I'm going to do is delete that particular row from the Delta table.
22:24
Again, it shows how many rows were affected. This is an oversimplified example, but you get the gist of it. Let me display the table again: you get Spark and Databricks, because we deleted ID three here.
22:43
Now, behind the scenes, as I mentioned, Databricks is maintaining a transaction log, and this is a visual representation of it. For every operation you can see my email ID, my user ID and the timestamp; I'm based out of London,
23:01
so it is showing GMT here. You can see what kind of operation was done, what the predicates were, the operation parameters, et cetera. All this information is in a single snapshot, like a single source of truth, basically.
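A sketch of how that history, and the time travel that follows, can be queried; it reuses the hypothetical demo path from the earlier sketch:

    from delta.tables import DeltaTable

    demo = DeltaTable.forPath(spark, "/tmp/delta/demo")

    # Every write, update and delete appears as a version with user,
    # timestamp, operation and operation parameters.
    demo.history().select("version", "timestamp", "operation").show()

    # Read the table as of an earlier version (a timestamp works as well).
    spark.read.format("delta").option("versionAsOf", 0).load("/tmp/delta/demo").show()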
23:20
And if I want to do some time travel, I can go back in time and replay the data. You can see here that I initially ingested the data with the misspelled value, which is what it is showing. The next version shows 'Delta' after the update. And in the version where we deleted the third row, 'Delta' is not present,
23:42
but if you go back in time, we can still see it. And finally, you can see the most recent version. So that is how simple and easy it is to do updates, deletes and all of this with Delta. Let me go to my next notebook,
24:02
which demos schema evolution. I'm probably zipping through because I just have five more minutes. So this is schema enforcement during merge, and the use case here is: I have two columns, ID and name, while the latest data has three columns, ID, name and year.
24:22
So a new column, year, has been added to the dataset, which is what we are showcasing in this particular case. But at the same time, my requirement, my business case, is that I want to merge all the new data into the existing Delta table while also enforcing the schema. That implies I need to discard the new column
24:43
in the Delta table. So let me quickly run through the entire notebook: I have a couple of tables, source and target. Same as before, I have three rows here; again, an oversimplified example so that it is easy to explain.
25:02
Now I wrote the data into a Delta table and I'm displaying it here. The new data frame, assume after a couple of days, has a new column, year. There are two use cases here. One: you want schema enforcement strictly adhered to,
25:23
schema validation is done, and the new column shouldn't be added to the data lake, which is what we are seeing here. So basically, I'm merging into the target table using the source table, based on a predicate condition of target.id equals source.id.
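The exact notebook cell isn't shown, but one way to keep the target schema fixed during such a merge is to map only the existing columns explicitly, so the extra year column in the source is simply ignored. A sketch, with a hypothetical path and an assumed source table name:

    from delta.tables import DeltaTable

    target = DeltaTable.forPath(spark, "/tmp/delta/schema_demo")   # hypothetical path
    source_df = spark.table("source")      # has id, name and the extra year column

    # Only id and name are referenced, so the target schema stays as it is
    # and the source's year column is discarded.
    (target.alias("target")
        .merge(source_df.alias("source"), "target.id = source.id")
        .whenMatchedUpdate(set={"name": "source.name"})
        .whenNotMatchedInsert(values={"id": "source.id", "name": "source.name"})
        .execute())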
25:43
So I have ID two, which is already present in the data, and in the new data frame ID two has the year 2013. Now, when I run the merge command on this table with this syntax, you can see it's pretty simple: it will look at all the rows
26:01
and update the rows where required, and where a particular row is not present, it will insert it. Going back, you can see the schema is enforced strictly, so there is no new column here. And if I go to the third notebook,
26:21
my final notebook here, this is where I'm doing schema evolution. It's exactly the same notebook with no changes at all; the only change I'm making is one option. If you want schema evolution to be available, even in the merge,
26:41
you just need to enable the auto-merge option. The moment you set this config, what Delta does behind the scenes is allow the schemas to merge. As mentioned before, it's exactly the same first data frame and second data frame, and the second data frame also has a year column now.
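The option being referred to is the Delta schema auto-merge config. A sketch of the evolved merge, continuing the hypothetical tables from the previous block:

    from delta.tables import DeltaTable

    # Allow merge to evolve the target schema.
    spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

    target = DeltaTable.forPath(spark, "/tmp/delta/schema_demo")
    source_df = spark.table("source")      # still carries the extra year column

    (target.alias("target")
        .merge(source_df.alias("source"), "target.id = source.id")
        .whenMatchedUpdateAll()            # with auto-merge on, the year column is added
        .whenNotMatchedInsertAll()
        .execute())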
27:01
Now, with the same exact merge statement, because I enabled schema evolution, you can see the year column comes through with the value we ingested. So basically, we have updated the earlier 'Databricks' row with the year 2013. That's how the updates happen.
27:21
Behind the scenes, you can also look at the way it is working. And finally, let me go to the last thing here, which showcases time travel and optimization. So this is, as I mentioned, a create table and a merge, which is what we are doing.
27:41
At the same time, we can do time travel: we can go back to versions zero and one. Version zero didn't have the year column, but the latest version does. And finally, there is a command called OPTIMIZE, and you can see the beauty of optimizing with a single command. Here, I have the number of files as three.
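A sketch of that file-count check, using DESCRIBE DETAIL around OPTIMIZE (a Databricks command at the time); the table name is assumed:

    # Number of files before compaction (e.g. 3 in the demo).
    spark.sql("DESCRIBE DETAIL schema_demo").select("numFiles").show()

    # Bin-pack the small files.
    spark.sql("OPTIMIZE schema_demo")

    # Number of files afterwards (a single file in the demo).
    spark.sql("DESCRIBE DETAIL schema_demo").select("numFiles").show()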
28:02
The moment I run the OPTIMIZE command and do a DESCRIBE DETAIL on the same table again, I can see it has compacted all the files into a single file. In this case we are using small files, but that's how it works. And finally, the last thing I wanted to showcase is vacuum. Before I run vacuum, there are quite a few files.
28:22
Databricks doesn't allow you to run vacuum with zero retention as is: if you want to run it with RETAIN 0 HOURS, you have to enable a special flag. Once you enable that flag, all the untracked files will be deleted from the cloud object storage or the local storage.
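The "special flag" being described is the retention duration check. A sketch of the demo-only combination (not something to do in production):

    from delta.tables import DeltaTable

    # By default Delta refuses VACUUM with a retention below 7 days;
    # disabling this safety check allows a 0-hour retention. Demo use only.
    spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")

    DeltaTable.forPath(spark, "/tmp/delta/schema_demo").vacuum(0)   # retain 0 hours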
28:42
So those are the three notebooks, the three use cases, I wanted to showcase. If you have any questions or anything you would like to know, I would be very happy to answer them. And I am leaving further references so you can take a look at them.
29:02
This is a very new book; the first three early-release chapters were released just last week. You can take a look at this new book on Delta Lake. And Learning Spark also has a chapter on Delta Lake. So there are a couple of docs and webinars and so on and so forth.
29:21
I'm leaving all this for your reference. Please do let me know if you have any questions. And thanks for having me; thank you very much for giving me time today.

Thanks again for the presentation. It was really interesting to see all the things that Delta can do.
29:40
It was a pretty interesting format, and the tools you chose are pretty cool. I'm just going to check to see if there are some questions. There were not, so maybe I'll kick-start with one of mine: we talked about all these really nice things that Delta can do. What are some limitations that Delta has at the moment,
30:00
or things that you plan to improve or add in the future?

Delta is evolving continuously. Merges usually cause a lot of problems, because they create multiple small files. So as time goes on, Databricks is adding more and more features into Delta, like low shuffle merge, for example.
30:20
That is just one simple example: it will not rewrite the files it reads in a different order; rather, it retains the exact same ordering that was there before, so the Z-ordering is preserved and keeps helping with data skipping, for example.
30:41
So as time progresses, Databricks is adding more and more features into Delta; change data feed is one more new feature which is coming in, and which is in private preview, to be precise.