Writing and Scaling Collaborative Data Pipelines with Kedro
Formal Metadata

Title: Writing and Scaling Collaborative Data Pipelines with Kedro
Title of Series: EuroPython 2020
Number of Parts: 130 (this talk is part 55)
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/49948 (DOI)
Language: English
Transcript: English (auto-generated)
00:06
Hello, everyone. My name is Tom. I'm a data engineer and architect at Quantum Black, and this talk is writing and scaling collaborative data pipelines with Kedro. So let's go ahead and get started. A little bit about myself. I've been doing
00:24
data engineering for actually quite some time. I started out at Palantir back in 2013, which was actually before data engineering really became a thing. So I've been doing it for as long as data engineering has existed in the mainstream, I suppose. I also have a couple of hobbies: doing yoga, as well as teaching meditation.
00:47
So if anyone is curious about yoga or meditation, very happy to talk about those kinds of things here. So here I am doing a Sushasana 2 pose on two fingers. And I also run a YouTube channel
01:02
that talks about Kedro, and a little bit more information about that later. So the talk itself, Writing and Scaling Collaborative Data Pipelines with Kedro, could instead be called Architecting Your Pipelines with the Kedro Framework. As for the way that pipelines have been written up until now,
01:28
well, as I mentioned, it's only in recent years that data engineering has really become a thing. Unfortunately, the truth is that it's a relatively new discipline. And since
01:42
it's a relatively new discipline, there's not that much around it in terms of best practices or established methodologies. And as a result, the data pipelines that we write and the things that we do kind of go all over the place. And so I think this is something that Quantum Black
02:01
kind of observed and something that we wanted to address with Kedro. Really quickly, I also want to mention that my colleagues are in the Discord chat room, in #talk-kedro. One of them, Yetu, is actually the product manager of Kedro.
02:21
So if you guys have any questions, feel free to talk in the Discord itself. And that way, they can get access to your questions and perhaps answer them better than I can. Okay, so let me go ahead and go to the next one. I have a little bit of a demo right away
02:40
for us to play with. So let's go ahead and talk about pipelines. With pipelines themselves, it's a little bit easier to visualize some of these things. And we can begin to describe why data pipelines grow out of control, what the contributing factors are, and how we can reel that in. So let me stop this share really quickly and then re-share
03:01
one of my other screens here. Okay. So you should be able to see a Chrome window. Is that true? Yes. Here I have. Yes, thank you. I think it should be popping up in just a second. Here it
03:36
is. So what I have here is a modification of a data pipeline visualization tool that we use.
03:43
This is called Kedro Visualization. It's a very powerful tool when coupled with Kedro. And it can demonstrate how data pipelines grow and change over time. So here is your typical data pipeline. It kind of already is going all over the place, right? The real question is,
04:05
how did we get here? Why is it so all over the place? And what can we do about it? So data pipelines, before they get to this kind of state, they start out always very small. You have a data source, and then you have a cleaning function or some kind of transformation
04:21
function on top of that data source. This is how they always begin. Once you run your transforming function, you get this output. And so in this case, we have this clean Iris, which cleans this Iris data and then outputs this cleaned Iris data. And then once you have this formatted Iris data, you want to start analyzing it. You want to begin your analysis. And so you have
04:44
another function that does your analysis, which then outputs your Iris analysis, obviously here. This is usually how data pipelines start out. They're just very simple, very typical. But as soon as you want to expand any of the work involved, of course things need to start offshooting.
05:03
And so you might want to take your Iris data and then start to split it out into training sets and example sets so you can try to predict some attributes of the data itself. You feed it through your modeling engine, you feed it through your accuracy reporting mechanisms,
05:21
and you can see how things start to expand out. And this is just for one data source. What happens when you have several others come into the picture? You have your companies, your shuttles. You also want to do this very similar cleaning, processing, and saving of that data. And then finally, you're going to want to connect everything together,
05:41
and this is where the mess starts to happen here. The truth is that this process itself is very similar to how software also kind of sprawls and grows. But because of how new data pipelines are to the software world,
06:04
we don't really have these established methodologies for controlling and constraining the way that pipelines grow. And as a result, we've become a little bit arrested in our development. There's a plateau on how big and how powerful our data pipelines can
06:21
become. And as a result, they never grow past the stage that you see here. But thanks to Kedro as a framework for growing, scaling, and writing these data pipelines, you can go from something like this to something even larger. And so here is actually another typical pipeline that you would see on a project. And I actually myself have worked on projects where we have pipelines that
06:46
are quite literally 10 times the size, and we have a developer count that is in the dozens. And because of the way that Kedro works, we're still able to collaborate effectively and properly maintain that data pipeline. So Kedro is a very powerful framework for how we architect
07:04
it. And so we're going to go in a little bit into those details now here. Let me go ahead and share my slides yet again. I'll share like this. Okay. So now we talked about the pipelines
07:22
themselves. How do pipelines grow? Well, pipelines grow because they're centered around our teams. We have teams of data engineers as well as data scientists. And the truth is that the typical data engineer and the typical data scientist, they're not necessarily
07:41
super compatible, right? They almost are opposites in many ways. So you'll see that data scientists may not be engineers, for example. They have a lot of amazing knowledge of these particular modeling topics and the way that we can transform data, but they might not necessarily have those engineering practices in their tool belt.
08:06
Furthermore, data science as a discipline tends towards more experimentation, and this requires having faster data pipelines, or rather being closer to the data. So you want to be as close to the data as possible when you're a data scientist in order to manipulate, experiment,
08:24
and kind of play with the data. Now on the other side of the spectrum, the data engineers have the opposite problems. They might not be scientists necessarily. They might have more engineering backgrounds. And so when you're a data engineer, you might not know a lot of the ways that someone can model pipelines or how you transform data. And furthermore, the things that
08:46
you're most interested in is order. You want to keep things kind of neat and tidy. You want to abstract away things as much as possible in order to sustain that robust stability. And there is one problem shared by both of them, which is that they both still must clean the
09:06
data. And so data cleaning itself is just like a whole topic by itself, but it's something that is inescapable. And it's quite interesting because the way that you clean the data changes the way that you analyze the data too. And so this is a collaborative problem
09:22
that both of these team members must tackle together. Because if the data engineer doesn't clean the data in the correct way, and the data scientist instead gets the data in an incorrectly cleaned format, then it's hard for them to do their analysis. And for the data engineer,
09:41
it's hard for them to take any analysis out from the data scientist. So cleaning, cleansing is definitely an important portion of this. And then finally, what you'll see is that data pipelines are almost always asked to be in production right now. The reason why you write these data pipelines and do these data analyses is because the business wants some facts. They
10:05
want some insights. They want you to take things out as soon as possible and give it to them immediately. So these are kind of the problems centered around data pipelines. Can we find a balance with these two guys? So how can we find a balance with collaborating with data engineers
10:22
and data scientists in a way that doesn't necessarily introduce so much friction, so much process, but still maintains that fluidity that the data scientists need, as well as that the data engineers need, in order to make their pipeline stable and still agile? And that's actually just an immediate need. The final need is, is it ready for the
10:46
handoff? One day, those data engineers and those data scientists may not necessarily be working on that data pipeline anymore. Is that data pipeline ready to be handed off to another team to pick up the baton? And I think this one is actually the most fascinating to solve,
11:03
because when you think about it, when you hand off a data pipeline and you're handing it off to another person, right? That's a person down the line, someone who has no knowledge of your data pipeline. The truth is that that person can actually still be you when you think about it, right? Because in the future, two months, three months, four months in
11:24
the future, you'll find that you might not remember anything about your data pipeline. And when you come back to the data pipeline, you look at it and you have no idea what's going on. So effectively, you yourself can be that person in the future that you're handing the baton off
11:41
to, right? This is your past self and then your future self. And so is your past self really creating a pipeline that your future self can handle? I think that's something that we definitely need to keep track of. And so these problems are things that Quantum Black discovered in their work. And so Quantum Black, if you're unfamiliar,
12:03
they're a startup that came out of London, very famous for some of the cases that they worked on in terms of data analysis for F1 cars. And so McKinsey picked them up and brought them in as their data science, data engineering arm of the firm. And so Quantum Black, they do hundreds
12:21
of these projects all over the world for data pipelines and data science and data analysis. And they found that they kept running into these similar patterns across all of these different projects. And so instead, what they did was they tried to codify these processes and bring them
12:41
back into Kedro. This is in order to maintain and make it easier, not only for the data scientists and data engineers to work together, but also to allow handoff to the clients that we would work with. And so QB, QuantumBlack, open sourced Kedro last year, and it's been growing ever since.
13:03
And so, yep, why does Kedro exist? It really is that collective learning, trying to deliver those applications. And our product mission, I think, is really fantastic. It's this empathetic intention. How can we tweak our workflows so that our coding practices are the same?
13:24
And so I like this word empathy here because it really is important to think about code in a way that allows other people to help you with the code and you to help other people with code. And so I kind of think of this as like almost altruistic programming in a way,
13:41
where the way that you write the code is not for your own selfish intention to get things done right now, but really for the benefit of people who are going to be reading that code later on. And what I found is that that return on investment of making the code readable, maintainable and understandable is really, really beneficial. And this comes and pays in dividends
14:03
later down the line. Okay. So how does Kedro solve these problems? So let's think about how data pipelines really are set up and how we can break them down. So let's take an example here. Let's imagine audio as data. So literally just like your
14:24
audio signals, let's just imagine those as data. We can think of those as things that we would push through data pipelines, et cetera. And so for audio, you have standard inputs and outputs. You have standard mechanisms that can take input from the
14:42
environment and then output things into somewhere else. And so we have microphones, we have amplifiers, we have compressors. There's a lot of technology that goes into audio engineering and we want those inputs and outputs to be standard. Next we also want to have these
15:03
kind of transforming mechanisms here. And so actually the one that I mentioned earlier, these kind of like microphone compressors, for example, or these kind of mixing boards where you can modify your equalization on the audio itself. And so these guys will transform the audio in a way that can affect it for the needs that you have with your low pass filters, your high pass
15:26
filters, et cetera. And this is something here that is often overlooked. And I would argue that this is one of the most important parts, if not the most important part here: you want to be able to redirect your output. This is actually very similar to abstracting your API.
15:44
We want to make sure that each of the components in our audio system can easily talk with other components in the audio system. So when you have a microphone input, you want to be able to put that microphone's output either into your mixing board, into those compressors, into those filters, et cetera, et cetera. So having this ability to plug and
16:05
play different portions is really vital there. And then finally you want to have this convention for organization. How do we actually structure our audio engineering setup in order to
16:21
most effectively do our work? And so here we have a setup. You got your microphones here. You got your audio mixers here. You got your computer over here. You know where everything is. And because you know where everything is, you know where to find things, you know where to adjust things, and you know how to tweak things as you desire.
16:42
And so let's bring these concepts back into pipeline building. And so we have here a quick demo also regarding Kedro. And so let me go ahead and pull up my, where is my mouse? My mouse has
17:04
disappeared. There it is. Okay. We're just going to stop the share really quickly. Make sure that I have this guy here in place. And then let me share this screen once more. Okay. Great. So here we have just our CLI here. And inside of the CLI, it's very simple to
17:32
go ahead and get started with Kedro. Actually, before we go ahead and do this one, let me share a different screen, which shows an example of what data pipelines can look like,
17:42
just like as a raw example. So here we go like this. This is the one here. Okay. I think that that should be showing up. Do you see a Jupyter notebook? I think that's available now. Okay. So here we have an example like pipeline, which we're breaking down
18:09
Iris data. And so the truth is that this is what you see in a typical data pipeline, right? You have like a single Jupyter notebook. Oh, this is the wrong one. This is the one here. Okay.
18:27
You have a single Jupyter notebook, and you're suddenly inundated with a whole bunch of programming. There's a whole bunch of code here. There's a lot of things going on, and you don't really know how to approach this guy, right? And if you're lucky, you're going to
18:42
find that there's going to be functions, things are going to be broken down in ways that you can understand. But more often than not, you're going to find notebooks that just have code splatted on there, all doing these kinds of different things, and it's
19:00
difficult to trace and understand. So what happens usually is that you just go through, you look at the notebook, you begin to like run the notebook, and then you hope that it works, right? And of course what will happen is you're going to be missing out on some data, and you don't know where things are, you don't know how things are tweaked, you don't know where parameters sit, and that's an unfortunate thing that happens here.
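To make that concrete, here is a rough, made-up sketch of the kind of notebook cell being described, where paths and parameters are baked straight into the code. The file path, column names, and the 0.2 split ratio are illustrative assumptions, not taken from the actual notebook shown in the demo.

```python
# Illustrative reconstruction of a hard-coded notebook cell (not the talk's
# actual notebook). The path, column names, and split ratio are assumptions.
import pandas as pd

df = pd.read_csv("/Users/someone/Downloads/iris.csv")  # hard-coded path

test_data_ratio = 0.2  # hard-coded parameter, buried in the middle of a cell
df = df.sample(frac=1, random_state=42)  # shuffle the rows
n_test = int(len(df) * test_data_ratio)

test = df.iloc[:n_test]
train = df.iloc[n_test:]

train_x, train_y = train.drop(columns=["species"]), train["species"]
test_x, test_y = test.drop(columns=["species"]), test["species"]
# ...and dozens more cells like this, each depending on variables defined far above
```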
19:23
But we can start to break this down into those four components that we kind of mentioned earlier. And those four components are, right, the first one is the standardized inputs and outputs. So right here we have an input and we have these outputs here. Pandas data frames are, of course, our well-beloved mechanism for making our pipelines,
19:45
and they come with different reading and writing mechanisms. And so this is like an example of that standardized input and output. You can use this to interface with your system, to read our CSVs, and then output the CSVs as you please. So that's the standardization there.
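As a hedged sketch of what that standardized input and output becomes in Kedro terms: a dataset is registered under a name and then loaded and saved through the data catalog. In a real project this usually lives in conf/base/catalog.yml rather than in Python, and the exact import path for the CSV dataset depends on your Kedro version.

```python
# Minimal sketch of Kedro's standardized I/O via the Python API.
# Note: the dataset import path varies by Kedro version, e.g.
#   from kedro.extras.datasets.pandas import CSVDataSet   # older releases
#   from kedro_datasets.pandas import CSVDataset          # newer releases
from kedro.io import DataCatalog
from kedro.extras.datasets.pandas import CSVDataSet

catalog = DataCatalog(
    {
        # "example_iris_data" becomes a named dataset any node can read or write
        "example_iris_data": CSVDataSet(filepath="data/01_raw/iris.csv"),
    }
)

iris = catalog.load("example_iris_data")   # returns a pandas DataFrame
catalog.save("example_iris_data", iris)    # writes it back through the same interface
```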
20:04
Then we have our transformations, right? This kind of split data. So this split data here will take a data frame and then will split it out into our train X, train Y, test X, test Y. So in modeling, of course, this is just to actually train the model and then test the
20:21
model on the data that we're using. And so this right here can be considered that kind of transformation. These are actually pure functions, which will take an input, modify the input, and then give you an output. And there's going to be no actual side effects. So this is kind
20:40
of a transformation here. Next, we have these guys here. And so you see these three in a row. And this is where notebooks as pipelines kind of break down. It's like, how do we start to string all of these different data frames together, right? We're getting the inputs or we're getting
21:00
the outputs, and it becomes hard to figure out where things are coming from. You have to pray that someone has named your variables correctly, and you have to manually trace things in this manner. And so this is where things get broken down, because obviously there
21:21
is literally no convention for when you set up your notebooks and your normal data pipelines. And so this is where things start to break down completely. And now let me show you what this pipeline actually looks like inside of Kedro. So Kedro actually comes with a lot of really great command line functions. Let me share this screen.
21:48
And so in order to start a Kedro project, first you have to install Kedro. Very easy. You can get it with pip install Kedro. That will go to the PyPI package index, grab it, download it, install it. It's very simple, straightforward.
22:03
Then what you can do is you can actually use Kedro new in order to create a new pipeline. And this allows you to name your pipeline and then create the project. So we're just going to really quickly say EuroPython 2020 as the Kedro project, and then EP as the package
22:21
name. We will generate the example of the pipeline, and there we go. We've already created our pipeline. Let's open this guy up here, and I'll open this up in PyCharm. And now something that's really cool also, and I will show you guys that a little bit later if we have time,
22:40
is that Kedro now also comes with these kinds of starter templates, which allow you to modify how Kedro creates those initial conventions. But here, I'm just going to show you what Kedro comes with. And so you can see here
23:04
our configuration folder, a source folder, and then these data folders, logs, docs, and notebooks. And so all of these guys are actually related to that original template that we were talking about, right? How do we organize our pipeline? And so right from the get-go,
23:21
Kedro gives you an example of how you can start to organize things. And it helps us with our separation of concerns, right? So in the previous example, we didn't really know where data was going, where data was coming from, how we were reading the data. And not only that, but everything inside of that previous pipeline is actually
23:42
hard-coded, right? Everything inside of that pipeline was hard-coded. Let me share this desktop again, and I can show you that example. And so right here, for example, the read CSV path, this is hard-coded. This train one, train two, test one, test two, this is also hard-coded. And then there's also parameters inside of our functions.
24:02
We have our test data ratio, which is hard-coded here. All these things are hard-coded. And so as a result, you find that pipelines themselves are very difficult to maintain because you don't know where anything is. But thanks to Kedro, we make that easier. We put our configurations here inside of this configuration folder, and then we can instead parameterize how we start to
24:26
find our data sets, right? And so here we have the location of the data set and how we want to read that data set. So we're just using this pandas data frame to read it in, read in that CSV file. And then we use Kedro itself to kind of bring things together. And so in that other
24:47
portion here that we were talking about, we don't really know what the relationships are or how these are built. We instead have these pipelines. And so inside of our pipelines, we can start to see the relationships. And so we see those same examples here
25:02
of this train, model, predict flow taking these inputs and these outputs, and then putting them through the pipeline itself. And so actually we can even visualize this guy. Because everything is standardized, we can programmatically extract the pipeline itself and present it to the user.
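For reference, this is roughly what such a generated example pipeline looks like in code, in the style of the Kedro 0.16-era iris starter. The node functions come from the project's nodes.py, and the exact names and signatures vary a little between Kedro versions, so treat this as a sketch rather than the definitive template.

```python
# Sketch of the example pipeline definition (names follow the iris starter;
# details vary by Kedro version).
from kedro.pipeline import Pipeline, node

# split_data, train_model, predict and report_accuracy are the pure functions
# from nodes.py; "example_iris_data" and "params:example_test_data_ratio" are
# entries in the data catalog and the parameters configuration.
from .nodes import split_data, train_model, predict, report_accuracy


def create_pipeline(**kwargs):
    return Pipeline(
        [
            node(
                split_data,
                inputs=["example_iris_data", "params:example_test_data_ratio"],
                outputs=dict(
                    train_x="example_train_x",
                    train_y="example_train_y",
                    test_x="example_test_x",
                    test_y="example_test_y",
                ),
            ),
            node(
                train_model,
                inputs=["example_train_x", "example_train_y"],
                outputs="example_model",
            ),
            node(
                predict,
                inputs=dict(model="example_model", test_x="example_test_x"),
                outputs="example_predictions",
            ),
            node(
                report_accuracy,
                inputs=["example_predictions", "example_test_y"],
                outputs=None,
            ),
        ]
    )
```

Because every input and output is referred to by name, the relationships between nodes can be traced, and visualized, without running anything.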
25:22
And so if we go here to this webpage, we can see that Kedro visualization. And so suddenly you begin to understand how your pipeline is built. It becomes easier to explore and easier to understand because we are separating things into
25:43
these different concerns here. And so, as that is the case, I only have a few minutes left, so let me just rush through these guys really quickly. We have that catalog of those standardized inputs. We support a lot of different inputs and outputs. We have the
26:01
nodes and pipelines. So the nodes are those transformations themselves. The pipelines pull things together. The configuration, this is where we have our variables that we would otherwise be hard coding kept inside of one folder so you know where they are, as well as the project template. And so this is the standardization of everything together. And once we employed
26:23
Kedro, we found this consistent time to production, reusable analytic code storage, increased collaboration, and even upskilling of our developers who otherwise would not have exposure to these kinds of software practices. And so of course, pip install Kedro, we can visualize. And then we actually have some deployment mechanisms built in. So
26:42
Kedro Docker, Kedro Airflow. So you can immediately deploy these pipelines as you are, I mean, as they are. And we have a great support team. So we have our own Slack channel internally. And then we have Stack Overflow, Read the Docs, and then our GitHub, which come back into our Slack channel. And we also have a budding community. I run a YouTube channel
27:06
called Data Engineer One, where I mainly talk about Kedro. So if you'd like to learn a little bit more, I mean, I've got a ton of videos there talking about Kedro there. And I think that we have about two more minutes left. So why don't we go ahead and open the floor
27:24
for maybe like a few questions. And I think I probably rushed through that last bit. But I'm sure you guys have a lot of questions. And I think there's a lot of great stuff to talk about on Kedro. And so you can find us there inside of the Talk-Kedro discord channel,
27:45
where again, the product manager, as well as one of our tech evangelists is there available to talk about Kedro a little bit more. But going through this final example here, normally the hardest part about a pipeline is figuring out how to run it.
28:00
And thanks to Kedro, we do have this ability to just simply move into the directory and then type in Kedro run. And this will run our Kedro pipeline. And so this standardization allows anybody who is familiar with the Kedro framework to enter into any other Kedro project, run the Kedro project as they wish to, and break it down and understand it as necessary.
28:25
And so I think this is why some of the benefits of Kedro there become evident as you begin to use it and you begin to expand your data pipelines and collaborate with the rest of your team members. And I think that's time for me. Thank you very much for having me.
28:41
And hope I can come back again and speak with you guys soon. All right. Thank you so much. And- Thank you. Actually, yeah, technically we're kind of towards the, we're kind of right up against the clock here, but as the closing session is not until, well, it's 20 minutes from now. So I think,
29:06
I don't think, and there's no one else behind you, so I don't think there's a problem in asking a couple of questions now. Cool. So one is: is it possible to just use the Kedro viz feature by itself?
29:22
Yes, absolutely. And so Kedro viz by itself is an open source library that you can use with any kinds of node or graph structures. It just takes a simple JSON file, which shows the links between different nodes, plus a list of nodes, and then you can actually use it.
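As a purely illustrative sketch of that idea, the JSON is essentially a list of nodes plus the links between them. The field names below are a guess at the shape, not the actual Kedro-Viz schema, so check the Kedro-Viz documentation for the real format.

```python
# Illustrative only: a node list plus the edges between nodes, written out as
# JSON for a standalone visualization front end. Field names are assumptions.
import json

pipeline_graph = {
    "nodes": [
        {"id": "example_iris_data", "name": "example_iris_data", "type": "data"},
        {"id": "split_data", "name": "split_data", "type": "task"},
    ],
    "edges": [
        {"source": "example_iris_data", "target": "split_data"},
    ],
}

with open("pipeline.json", "w") as f:
    json.dump(pipeline_graph, f, indent=2)
```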
29:43
And in fact, we have a React component, which means you can embed that visualization into any kind of front end that is using React. So it's actually really quite cool. All right, cool. And then the other question I have here is, Kedro configuration, is it difficult or no? So for the configuration itself,
30:05
it's actually quite straightforward to set up the pipeline. And then for the configuration of your nodes, for example, inside of data science, in that split_data function, we had actually a hard-coded value for what the split ratio was. And so that's
30:24
here inside of this example, Jupyter notebook. We have a 0.2. In Kedro, the way that it works is that pipelines, you give to the pipeline the name of the data asset that you wish to use. And so here we have the iris data as a data asset, as well as the parameters. And so your
30:44
parameters actually become data assets by themselves, which means that you can actually keep all of your parameters inside of a parameter folder or a parameter configuration. So here we have our example test data ratio is written right here. And so that means that we don't
31:00
need to change any hard coded values if we wish to update our parameters and then rerun our pipeline. So in this example, I can easily change the ratio from 0.2 to 0.6 and then immediately rerun the pipeline and then get a different output there. So it's very, very easy to handle the configuration.
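To spell out the mechanism just described, here is a hedged sketch: the ratio lives in configuration (conf/base/parameters.yml would hold a line like example_test_data_ratio: 0.2), and the node receives it by name, so changing 0.2 to 0.6 means editing the config file, not the code. The function body below is illustrative, not the actual starter code.

```python
# Sketch of a parameterized node: the ratio arrives as an argument rather than
# being hard-coded, and "params:example_test_data_ratio" points at the value
# defined in conf/base/parameters.yml.
from kedro.pipeline import node


def split_data(data, example_test_data_ratio):
    """Split a DataFrame into test and train; no hard-coded 0.2 inside."""
    n_test = int(len(data) * example_test_data_ratio)
    return data.iloc[:n_test], data.iloc[n_test:]


split_node = node(
    split_data,
    inputs=["example_iris_data", "params:example_test_data_ratio"],
    outputs=["example_test", "example_train"],
)
```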
31:27
Excellent. Thank you so much. So anyone who wants to chat with Tom or any of the rest of the Kedro crew, Quantum Black crew, check out the talk Kedro chat room on Discord and they'll all be hanging out there. And if I remember correctly,
31:42
you guys also have a sprint tomorrow, don't you? Yeah, that's correct. And I think we've opened it up to work with people to contribute and work on the Kedro project. And I myself have a lot of pull requests into Kedro. It's very easy. It's a great way to get started with open source projects because the community is very dedicated and we're backed by a lot of great engineers in
32:04
our London office. Awesome. So yeah, we are a little bit over time, everyone. So these other questions, please do repost those. So Diego and Steven, I do see you guys. Please do repost those over in the Kedro chat room. I really appreciate that. And everyone, thanks for joining us. So
32:24
stay tuned. In about 15 minutes, we're going to have the closing session. It is sad to see the end of another EuroPython, but hey, all good things must come to an end.
32:42
And I hope you've really enjoyed your time. Also, technically, we're only past the talks section, so we still have sprints over the next two days. So please do hang out for that. Also, after the closing session, we have some fun in the after party here in the Microsoft room. So that is Word Peril, the Python word game where you win absolutely nothing. So we'll be
33:02
taking volunteers from the audience. Maybe I'll see you there, Tom. Yeah, I don't need to go then. So anyway, anyone who's interested in joining in on that, that'll be starting at 2130 in this channel. So see you all, hopefully, in the next 15 minutes for the closing session.