We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Observability for data pipelines with OpenLineage

Formal Metadata

Title
Observability for data pipelines with OpenLineage
Title of Series
Number of Parts
69
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Data is increasingly becoming core to many products. Whether to provide recommendations for users, getting insights on how they use the product or using machine learning to improve the experience. This creates a critical need for reliable data operations and understanding how data is flowing through our systems. Data pipelines must be auditable, reliable and run on time. This proves particularly difficult in a constantly changing, fast paced environment. Collecting this lineage metadata as data pipelines are running provides understanding of dependencies between many teams consuming and producing data and how constant changes impact them. It is the underlying foundation that enables the many use cases related to data operations. The OpenLineage project is an API standardizing this metadata across the ecosystem, reducing complexity and duplicate work in collecting lineage information. It enables many projects, consumers of lineage in the ecosystem whether they focus on operations, governance or security. Marquez is an open source projects part of the LF AI & Data foundation which instruments data pipelines to collect lineage and metadata and enable those use cases. It implements the OpenLineage API and provides context by making visible dependencies across organisations and technologies as they change over time.