Tip of the Iceberg

Plain Schwarz

Driesprong, Fokko

Formal Metadata

Title

Tip of the Iceberg

Title of Series

Berlin Buzzwords 2023

Number of Parts

Author

Driesprong, Fokko

License

CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/66642 (DOI)

Publisher

Plain Schwarz

Release Date

2023

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Apache Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for multiple engines to work with the same tables at the same time. Iceberg is a layer on top of your traditional Parquet tables that brings in best practices from the database world. With it, you can perform ACID operations on a table that lives solely in cloud storage. In the talk, I'll first introduce Iceberg, its history, and the companies that are using and actively contributing to it. We'll then take a peek under the hood, and I'll explain the different concepts, such as the table metadata, manifest lists, and the manifests themselves, and how Iceberg uses them to help the query engine and maintain correctness. Next, I'll go through schema, partition, and sort-order evolution, and how these are done lazily so you don't have to rewrite your multi-petabyte table. Finally, I'll do a quick demo using PyIceberg.
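
The metadata layering mentioned in the abstract (table metadata, manifest list, manifests, data files) can be pictured with a small toy model. The sketch below is illustrative only: it is neither Iceberg's on-disk format nor the PyIceberg API. The class names, the single integer filter column, and the `plan_scan` helper are all hypothetical, chosen to show roughly how per-file value ranges could let a query engine skip files without opening them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataFile:
    # Hypothetical stand-in for a manifest entry describing one Parquet file.
    path: str
    lower: int  # smallest value of the filter column in this file
    upper: int  # largest value of the filter column in this file


@dataclass
class Manifest:
    # A manifest tracks a group of data files; its range is derived from them.
    files: list[DataFile]

    @property
    def lower(self) -> int:
        return min(f.lower for f in self.files)

    @property
    def upper(self) -> int:
        return max(f.upper for f in self.files)


@dataclass
class Snapshot:
    # A snapshot points at a manifest list (modelled here as a plain list).
    snapshot_id: int
    manifest_list: list[Manifest]


@dataclass
class TableMetadata:
    # Toy table metadata: an append-only history of snapshots.
    snapshots: list[Snapshot] = field(default_factory=list)

    def current_snapshot(self) -> Snapshot:
        return self.snapshots[-1]


def plan_scan(table: TableMetadata, value: int) -> list[str]:
    """Return paths of files that may contain rows where column == value."""
    selected = []
    for manifest in table.current_snapshot().manifest_list:
        # Skip the whole manifest if the value cannot be in any of its files.
        if not (manifest.lower <= value <= manifest.upper):
            continue
        for data_file in manifest.files:
            if data_file.lower <= value <= data_file.upper:
                selected.append(data_file.path)
    return selected


table = TableMetadata(snapshots=[
    Snapshot(1, [
        Manifest([DataFile("a.parquet", 0, 9), DataFile("b.parquet", 10, 19)]),
        Manifest([DataFile("c.parquet", 20, 29)]),
    ])
])
print(plan_scan(table, 12))  # ['b.parquet']
```

In this toy model, a commit amounts to appending a new `Snapshot` to the metadata rather than editing files in place, which is one simplified way to think about how readers keep a consistent view of the table.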