
Data governance in streaming at scale

Formal metadata

Title
Data governance in streaming at scale
Number of parts
69
License
CC Attribution 3.0 Unported:
You may use, modify, reproduce, distribute, and make publicly available the work or content, in unaltered or altered form, for any legal purpose, provided you credit the author/rights holder in the manner they specify.

Content metadata

Abstract
Letgo is a second-hand marketplace app that is reshaping second-hand trade in Turkey so that reuse becomes the default, trusted choice. We designed its data platform on the principles of self-service, compliance with privacy laws, data governance at the business-unit level, minimal maintenance, and cost containment by design.

We will describe how we defined our company-wide data model using Avro schemas, and how that same model enables our most impactful features:
- tagging private fields that hold sensitive data, for compliance with data privacy laws
- ensuring the quality and structure of the data landing in the company data lake
- efficient and reliable transport and consumption of data at the platform level
- a data catalog that lets teams discover the available data

Our design is built around the Apache Kafka ecosystem, with Kafka Connect in particular, for data ingestion, and around AWS services and the Spark framework for data transformation and data lake ingestion.

Thanks to these principles, we can ensure data governance over both batch and real-time data while maintaining a multi-tiered data lake: the inner tier holds the most sensitive data, and the outer tiers hold only the data that each individual business unit is allowed to access.
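The abstract does not show Letgo's actual schemas, so the following is only a minimal sketch of the idea of tagging private fields in an Avro schema and using the tags to decide what reaches an outer data lake tier. It assumes a hypothetical `sensitive` field attribute: the Avro specification permits extra attributes on schemas as metadata, but this tag name, the `UserEvent` schema, and the helpers `sensitive_fields`, `project_for_outer_tier`, and `conforms` are illustrative inventions. The `conforms` check is a deliberately simplified structural validation (a few primitive types and plain unions, all fields required), not a full Avro validator.

```python
import json

# Hypothetical company-wide Avro schema. "sensitive" is an illustrative
# metadata attribute (Avro allows extra attributes), not an Avro keyword.
USER_EVENT_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "UserEvent",
  "namespace": "com.example.events",
  "fields": [
    {"name": "event_id", "type": "string"},
    {"name": "user_id", "type": "string", "sensitive": true},
    {"name": "email", "type": ["null", "string"], "default": null, "sensitive": true},
    {"name": "category", "type": "string"},
    {"name": "price_cents", "type": "long"}
  ]
}
""")


def sensitive_fields(schema: dict) -> set[str]:
    """Return the names of top-level fields tagged as sensitive."""
    return {f["name"] for f in schema["fields"] if f.get("sensitive", False)}


def project_for_outer_tier(record: dict, schema: dict) -> dict:
    """Drop sensitive fields so the record could land in an outer tier."""
    hidden = sensitive_fields(schema)
    return {k: v for k, v in record.items() if k not in hidden}


def _matches(value, avro_type: str) -> bool:
    """Check a Python value against a single Avro primitive type name."""
    if avro_type == "null":
        return value is None
    if avro_type in ("int", "long"):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if avro_type == "boolean":
        return isinstance(value, bool)
    if avro_type in ("float", "double"):
        return isinstance(value, float)
    if avro_type == "string":
        return isinstance(value, str)
    raise ValueError(f"type not supported in this sketch: {avro_type}")


def conforms(record: dict, schema: dict) -> bool:
    """Simplified check: exact field set, and each value matches its type."""
    if set(record) != {f["name"] for f in schema["fields"]}:
        return False
    for field in schema["fields"]:
        ftype = field["type"]
        branches = ftype if isinstance(ftype, list) else [ftype]
        if not any(_matches(record[field["name"]], b) for b in branches):
            return False
    return True


# Usage: validate an incoming event, then strip private fields for an
# outer tier that a business unit may read.
record = {
    "event_id": "e-1",
    "user_id": "u-42",
    "email": None,
    "category": "furniture",
    "price_cents": 12000,
}
if conforms(record, USER_EVENT_SCHEMA):
    print(sorted(project_for_outer_tier(record, USER_EVENT_SCHEMA)))
    # prints ['category', 'event_id', 'price_cents']
```

Keeping the privacy tag inside the schema itself means the ingestion path and the tier projection read the same source of truth; in a real Kafka-based pipeline the schema would typically come from a schema registry rather than a literal string.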