Tip of the Iceberg

Formal Metadata

Title
Tip of the Iceberg
Series title
Number of parts
60
Author
License
CC Attribution 3.0 Unported:
You may use and modify the work or its content for any legal purpose, and copy, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.
Identifiers
Publisher
Release year
Language

Content Metadata

Subject area
Genre
Abstract
Apache Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data while making it possible for multiple engines to work safely with the same tables at the same time. Iceberg is a layer on top of your traditional Parquet tables that applies best practices from the database world. With it, you can perform ACID operations on a table that lives solely in cloud storage. In this talk, I'll first introduce Iceberg, its history, and the companies that use it and actively contribute to it. We'll then take a peek under the hood: I'll explain the core concepts (table metadata, manifest lists, and the manifests themselves) and how Iceberg uses them to help the query engine and to maintain correctness. Next, I'll go through schema, partition, and sort-order evolution and how these are applied lazily, so you don't have to rewrite your multi-petabyte table. Finally, I'll do a quick demo using PyIceberg.
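
To make the abstract's point about metadata helping the query engine more concrete, here is a minimal toy sketch in plain Python. It is not Iceberg's on-disk format and not the PyIceberg API: the names DataFile, Manifest, Snapshot, plan_files, and append_manifest, as well as the `ts` column and the file paths, are hypothetical and chosen only for illustration. The assumption being illustrated is that manifests carry per-file column statistics, so a planner can skip whole manifests and individual files without opening any data file, and that each commit produces a new immutable snapshot rather than modifying the old one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for one data file tracked by a manifest."""
    path: str
    min_ts: int  # smallest value of the (assumed) `ts` column in this file
    max_ts: int  # largest value of the (assumed) `ts` column in this file


@dataclass(frozen=True)
class Manifest:
    """Hypothetical stand-in for a manifest: a list of files plus their stats."""
    files: tuple

    @property
    def min_ts(self):
        return min(f.min_ts for f in self.files)

    @property
    def max_ts(self):
        return max(f.max_ts for f in self.files)


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a snapshot; `manifests` plays the manifest list."""
    manifests: tuple


def plan_files(snapshot, lo, hi):
    """Return paths of files whose [min_ts, max_ts] range overlaps [lo, hi]."""
    selected = []
    for manifest in snapshot.manifests:
        # Skip the whole manifest if its value range cannot match the filter.
        if manifest.max_ts < lo or manifest.min_ts > hi:
            continue
        for data_file in manifest.files:
            if data_file.max_ts >= lo and data_file.min_ts <= hi:
                selected.append(data_file.path)
    return selected


def append_manifest(snapshot, manifest):
    """Model a commit: build a new snapshot and leave the old one untouched."""
    return Snapshot(snapshot.manifests + (manifest,))


january = Manifest((DataFile("jan-a.parquet", 1, 15),
                    DataFile("jan-b.parquet", 16, 31)))
february = Manifest((DataFile("feb-a.parquet", 32, 45),
                     DataFile("feb-b.parquet", 46, 59)))

first = Snapshot((january,))
second = append_manifest(first, february)

print(plan_files(second, 40, 50))  # only files from the February manifest
print(plan_files(first, 40, 50))   # the older snapshot still sees no match
```

In this toy model a reader holding the older snapshot keeps a consistent view while a writer commits a newer one, which is the intuition behind running ACID operations against files in object storage. The real format stores far richer statistics and partition information than the single range used here.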