Writing and Scaling Collaborative Data Pipelines with Kedro

EuroPython

Nguyen, Tam-Sanh

Formal Metadata

Title

Writing and Scaling Collaborative Data Pipelines with Kedro

Subtitle

How to get your Data Scientists and Data Engineers to play nice, both now and in the future.

Series Title

EuroPython 2020

Number of Parts

130

Author

Nguyen, Tam-Sanh

License

CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use, adapt, copy, distribute, and make publicly available the work or its content, in unchanged or adapted form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner they specify and that you pass on the work or content, including in adapted form, only under the terms of this license.

Identifiers

10.5446/49948 (DOI)

Publisher

EuroPython

Year of Publication

2020

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

The goal of this talk is to introduce data pipeline developers to QuantumBlack's approach to keeping data pipelines healthy and sustainable, and to facilitating collaboration between data scientists and data engineers, using our open source framework, Kedro. Attendees need novice to intermediate knowledge of Python (enough to understand syntactic sugar and funargs) to appreciate this talk. As data informs more and more business strategy, high-quality, fully featured data pipelines have never been more critical. Small data scripts and single-coder science projects are not enough to keep up with the pace of day-to-day business and its ever-growing list of requirements. Now, more than ever, we need data engineers and data scientists to collaborate effectively. Yet these two parties come with inherently competing needs. Data scientists need high data volatility and parameterization for experimentation, while data engineers need stability and performance to deliver data. Furthermore, as pipelines grow, the cost of knowledge transfer and of training new team members also increases. How can we get scientists and engineers to work well together, and sustain pipeline growth as the team grows? For this, QuantumBlack created Kedro, a framework for writing data pipelines whose features and patterns of use address the needs for both flexibility and stability. By using Kedro's tools and operating model, we have enabled our teams to scale our single-developer micro-pipes into industrial-sized data processors with dozens of developers, all without sacrificing readability, quality, or stability. This talk will show you how.