Designing Functional Data Pipelines for Reproducibility and Maintainability

EuroPython

Ong, Chin Hwee

Formal Metadata

Title

Designing Functional Data Pipelines for Reproducibility and Maintainability

Title of Series

EuroPython 2021

Number of Parts

115

Author

Ong, Chin Hwee

Contributors

Petkos, Theofanis (Moderation)

License

CC Attribution - NonCommercial - ShareAlike 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Identifiers

10.5446/58753 (DOI)

Publisher

EuroPython

Release Date

2021

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

When building data pipelines at scale, it is crucial to design pipelines that are reliable, scalable, and extensible as business needs evolve. Designing data pipelines for reproducibility and maintainability is challenging, because testing and debugging across compute units (threads/cores/compute nodes) is often complex and time-consuming due to dependencies and shared state at runtime. In this talk, Chin Hwee will share common challenges in designing reproducible and maintainable data pipelines at scale, and explore the use of functional programming in Python to build scalable, production-ready data pipelines that are designed for reproducibility and maintainability.

Through analogies and realistic examples inspired by data pipeline designs in production environments, you will learn about:
- What functional programming is, and how it differs from other programming paradigms
- Key principles of functional programming
- How "control flow" is implemented in functional programming
- Functional design patterns for data pipeline design in Python, and how they improve reproducibility and maintainability
- Whether it is possible to write a purely functional program in Python

This talk assumes a basic understanding of building data pipelines with functions and classes/objects. While the main target audience is data scientists/engineers and developers building data-intensive applications, anyone with hands-on experience in imperative programming (including Python) should be able to understand the key concepts and use cases of functional programming.
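To illustrate the ideas named in the abstract (pure functions, no shared mutable state, and "control flow" expressed through higher-order functions rather than loops), here is a minimal sketch in Python. It is not taken from the talk: the row format and the step names (drop_missing, scale, compose) are hypothetical and chosen only for illustration. Each step takes an immutable tuple of rows and returns a new tuple, so running the pipeline twice on the same input gives the same output and the input is never modified.

```python
from functools import reduce
from typing import Callable, Optional, Tuple

# Hypothetical row format for illustration: (name, value), where value may be missing.
Row = Tuple[str, Optional[float]]
Rows = Tuple[Row, ...]
Step = Callable[[Rows], Rows]


def drop_missing(rows: Rows) -> Rows:
    """Pure step: return a new tuple without rows whose value is None."""
    return tuple(row for row in rows if row[1] is not None)


def scale(factor: float) -> Step:
    """Higher-order function: build a pure step that multiplies every value."""
    def step(rows: Rows) -> Rows:
        return tuple((name, value * factor) for name, value in rows)
    return step


def compose(*steps: Step) -> Step:
    """Chain steps left to right; reduce replaces an explicit for-loop."""
    return lambda rows: reduce(lambda acc, f: f(acc), steps, rows)


pipeline = compose(drop_missing, scale(2.0))

raw: Rows = (("a", 1.0), ("b", None), ("c", 3.5))
result = pipeline(raw)

# Aggregation also uses reduce instead of a mutable accumulator variable.
total = reduce(lambda acc, row: acc + row[1], result, 0.0)

print(result)  # (('a', 2.0), ('c', 7.0))
print(total)   # 9.0
```

Because every step is a pure function of its input, each one can be unit-tested in isolation, and the composed pipeline can be re-run on any compute unit without depending on shared runtime state.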