Let's Do Data Lineage in Kafka, Flink and Druid!

Plain Schwarz

Becker, Hellmar

Formal Metadata

Title

Let's Do Data Lineage in Kafka, Flink and Druid!

Series Title

Berlin Buzzwords 2024

Number of Parts

Author

Becker, Hellmar

Contributors

N. N. (Moderation)

License

CC Attribution 3.0 Unported:
You may use, modify, and reproduce the work or its content, in unchanged or modified form, for any legal purpose, and distribute it and make it publicly available, provided that you credit the author/rights holder in the manner they have specified.

Identifiers

10.5446/70226 (DOI)

Publisher

Plain Schwarz

Year of Publication

2024

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Data lineage means you can track each piece of data in your system and know at any time where it comes from and exactly how it has been processed. Enterprise systems need to be able to prove lineage for compliance reasons, but more generally, lineage is also a big part of data discoverability and governance. In this talk, I am going to connect a few Raspberry Pis that collect ADS-B (aircraft radar) data to a KFD (Kafka-Flink-Druid) stack for analytical processing. I will deliver the data through Kafka, cleanse and enrich it with Flink, and run analytical queries on the result with Druid. I am going to track data lineage through Kafka metadata, and I am going to show how that information can be maintained throughout the processing pipeline. This relies on Kafka headers, an underused feature of Kafka that also integrates easily with Druid! You will learn how data lineage can be implemented using the open source KFD stack and readily available data sources, so you, too, can try out enterprise-style data lineage processing at home and prepare yourself for a question that will arise in any enterprise data engineering project!
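
The header-based approach described in the abstract can be illustrated with a minimal sketch. It is not taken from the talk: it only assumes that Kafka record headers are an ordered list of key/value pairs with byte values in which keys may repeat, which is how common Python clients represent them. The header name `lineage`, the JSON layout, and the helper names `add_lineage_hop` and `read_lineage` are hypothetical. No broker is needed; the header list stands in for the headers of a record as it moves from stage to stage.

```python
import json
import time
from typing import List, Optional, Tuple

# Kafka headers modeled as an ordered list of (key, bytes) pairs.
Headers = List[Tuple[str, bytes]]

LINEAGE_KEY = "lineage"  # hypothetical header name, chosen for this sketch


def add_lineage_hop(headers: Optional[Headers], stage: str, source: str,
                    ts_ms: Optional[int] = None) -> Headers:
    """Return a new header list with one more lineage entry appended."""
    hop = {
        "stage": stage,
        "source": source,
        "ts_ms": ts_ms if ts_ms is not None else int(time.time() * 1000),
    }
    # Existing headers are copied unchanged, so earlier hops are preserved.
    return list(headers or []) + [(LINEAGE_KEY, json.dumps(hop).encode("utf-8"))]


def read_lineage(headers: Headers) -> list:
    """Decode all lineage entries in the order they were added."""
    return [json.loads(v.decode("utf-8")) for k, v in headers if k == LINEAGE_KEY]


# A record passes through two stages; the header list travels with it.
h = add_lineage_hop(None, stage="ingest", source="raspberrypi-1", ts_ms=1)
h = add_lineage_hop(h, stage="enrich", source="flink-job", ts_ms=2)
print([hop["stage"] for hop in read_lineage(h)])  # ['ingest', 'enrich']
```

In a real pipeline, each processing stage would copy the incoming record's headers to the outgoing record and append its own entry, so that the final consumer can reconstruct the full path of every record.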