
Data governance in streaming at scale

Formal Metadata

Title
Data governance in streaming at scale
Title of Series
Number of Parts
69
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
letgo is a second-hand marketplace app reshaping second-hand trade in Turkey, making re-use the default, trusted choice. We designed our data platform around the principles of self-service, compliance with privacy laws, data governance at the business-unit level, minimal maintenance, and cost containment by design. We will describe how we defined our company-wide data model, leveraging Avro schemas while enabling high-impact features such as:
- tagging fields that hold sensitive data, for compliance with data privacy laws
- ensuring the quality and structure of the data landing in the company data lake
- efficient and reliable transportation and consumption of data at the platform level
- a data catalog: discovery of the available data by teams
Our design is built around the Apache Kafka ecosystem, with special mention of Kafka Connect, for data ingestion, and on AWS services plus the Spark framework for data transformations and data lake ingestion. Thanks to these principles we can ensure data governance over both batch and real-time data while keeping a multi-tiered data lake: the inner tier holds the most sensitive data, and the outer tiers hold only the data accessible to each individual business unit.
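
The abstract does not spell out how the field tagging works, but a minimal sketch of the idea might look like the following. It assumes a hypothetical custom "pii" attribute on Avro field definitions (Avro allows extra metadata on fields, though the attribute name here is an illustration, not letgo's actual convention) and uses only the Python standard library; in the real pipeline the same projection would presumably be applied by Spark jobs before data lands in the outer tiers of the data lake.

```python
import json
from copy import deepcopy

# Hypothetical Avro schema for a marketplace event. The custom "pii"
# attribute on a field (not part of the Avro spec, but permitted as extra
# metadata) marks sensitive columns that must stay in the inner tier.
LISTING_EVENT_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "ListingCreated",
  "namespace": "com.example.marketplace",
  "fields": [
    {"name": "listing_id",   "type": "string"},
    {"name": "category",     "type": "string"},
    {"name": "price",        "type": "double"},
    {"name": "seller_email", "type": "string", "pii": true},
    {"name": "seller_phone", "type": ["null", "string"], "default": null, "pii": true}
  ]
}
""")


def pii_fields(schema: dict) -> set:
    """Collect the names of fields tagged as PII in the schema."""
    return {f["name"] for f in schema["fields"] if f.get("pii", False)}


def redact_for_outer_tier(record: dict, schema: dict) -> dict:
    """Return a copy of the record with PII fields dropped, suitable for
    the outer, business-unit-facing tiers of the data lake."""
    sensitive = pii_fields(schema)
    return {k: v for k, v in deepcopy(record).items() if k not in sensitive}


if __name__ == "__main__":
    event = {
        "listing_id": "abc-123",
        "category": "furniture",
        "price": 250.0,
        "seller_email": "seller@example.com",
        "seller_phone": "+90 555 000 0000",
    }
    print(redact_for_outer_tier(event, LISTING_EVENT_SCHEMA))
    # {'listing_id': 'abc-123', 'category': 'furniture', 'price': 250.0}
```

Because the tag lives in the schema rather than in application code, the same metadata can drive privacy-law compliance, data-lake access tiers, and the data catalog from a single source of truth, which is the governance property the talk describes.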