
Writing and Scaling Collaborative Data Pipelines with Kedro

Formal Metadata

Title
Writing and Scaling Collaborative Data Pipelines with Kedro
Subtitle
How to get your Data Scientists and Data Engineers to play nice, both now and in the future.
Title of Series
Number of Parts
130
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
The goal of this talk is to introduce data pipeline developers to QuantumBlack's approach to keeping data pipelines healthy and sustainable, and to facilitating collaboration between data scientists and data engineers, using our open source framework, Kedro. Attendees need novice-to-intermediate knowledge of Python (enough to understand syntactic sugar and funargs) to appreciate this talk. As data continues to inform more and more business strategy, high-quality, fully featured data pipelines have never been more critical. Small data scripts and single-coder science projects are not enough to keep up with the pace of day-to-day business and its ever-growing list of requirements. Now, more than ever, we need data engineers and data scientists to collaborate effectively. Yet these two parties come with inherently competing needs. Data scientists need high data volatility and parameterization for experimentation; data engineers, on the other hand, need stability and performance to deliver data. Furthermore, as pipelines grow, the cost of knowledge transfer and of training new team members also increases. How can we get scientists and engineers to work well together, and sustain pipeline growth as the team also grows? For this, QuantumBlack created Kedro, a framework for writing data pipelines that addresses the needs for both flexibility and stability in its features and patterns of use. By using Kedro's tools and operating model, we have enabled our teams to scale our single-developer micro-pipes to industrial-sized data processors with dozens of developers, all without sacrificing readability, quality, or stability. This talk will show you how.