Building Data Workflows with Luigi and Kubernetes
Formal Metadata

Title: Building Data Workflows with Luigi and Kubernetes
Number of Parts: 118
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.
Identifiers: 10.5446/44818 (DOI)
Language: English
EuroPython 2019, talk 53 of 118
Transcript: English(auto-generated)
00:02
Hi, hello, namaste. I'm originally from Nepal, and I'm currently living and working in Germany. Today I'm going to talk about building data workflows with Luigi and Kubernetes. Before we start, a few things about me.
00:28
Hi, my name is Nar Kumar. I'm currently working at Breuninger. Breuninger is a traditional fashion house, mostly popular in South and West Germany. And currently
00:41
Breuninger is expanding in the online e-commerce space quite rapidly. I'm working at Breuninger with a data team, a data lake team, and we use Python and Luigi with Kubernetes running on Google Cloud. I was a web dev in a past life and
01:05
then companies came to me and wanted me to do data, so I moved to a more data engineering role. And you can find me on the
01:25
Can I ask one question? Do you work with data engineering? It's kind of a new term. How many of you are using data engineering? Okay, okay, slowly, yeah, slowly it's growing. Actually quite a lot, more than I expected.
01:45
How many of you are using Luigi? Whoa. How many of you are using Airflow? Okay, pretty decent. Okay, so today I'll just introduce Luigi because Luigi is
02:01
pretty small, it's really lightweight. I think if you just read the README of Luigi, you'll get the idea. Then I'll talk about Kubernetes and how you can run Luigi on Kubernetes, and I think that part is pretty interesting for this crowd. Okay, so just a few things about Luigi. It's a workflow pipeline
02:28
tool. If you are using Airflow, it's pretty similar to Airflow. It was open-sourced by Spotify. It's already pretty mature, actually pretty old, and you can
02:44
write basically your data flows, your pipelines, as normal Python code, so that's always a plus point. It's really lightweight and it comes with a basic web UI to see the jobs and stuff like that.
03:05
It has tons of packages. You can run Hadoop jobs using it as your orchestration tool. You can run BigQuery, you can run AWS stuff if you need to upload some file or run a query. For stuff like this, it
03:22
already has tons of contrib packages, so you can do almost everything with really few lines of code, right? And it doesn't have a scheduler, by the way; I'll come back to this point later in my talk. So just to demonstrate, let's
03:46
make a use case. Let's assume one use case. Let's say you work in a company where they have ice cream franchises. They have a lot of ice cream shops and the manager or your colleagues or data analysts want to see the daily
04:04
sales of yesterday. Every morning they come to work and they want to see what happened yesterday, how much the company sold, right? So we need to do a few things to achieve this. You have a prod database where all the
04:21
transactions happen, right? So you want to dump the prod database to somewhere else, because you don't want to do aggregation on the prod database, otherwise you can kill it. And then you want to ingest it into an analytics database. It could be Redshift, it could be BigQuery, or it could be anything, it
04:42
could be Postgres, whatever. And then you want to run aggregation on this analytics database, update the dashboard, and send it out to everyone, right? So since we are at a Python conference and I know all of you are really good Python developers, you wrote an awesome Python script and scheduled it using
05:07
cron. So: dump sales data, ingest into the analytics database, then aggregate the data, and then cool, maybe profit, right? So it looks pretty great, right? I mean,
05:23
it runs, it works. Maybe not; we have a couple of issues with this implementation. What happens when the first one fails? If you see here, we scheduled them hourly because we think maybe dumping the database takes a bit of
05:42
time and then ingestion takes a bit of time, so we arranged our cron to start them one hour apart, right? Assuming that the first one finishes within one hour, the second one finishes within one hour, and so on. And one hour because everybody is doing big data, right? I mean, we have big data. So yeah,
06:05
we have got a couple of problems here. What happens when the first one fails? What if the first one takes longer than one hour, or if you have to run this same reporting for the last five days or the last month or the last year,
06:23
which can happen? And how do you see whether these jobs all ran successfully? And if you somehow mistakenly run it multiple times, what happens to your dashboard? Is it broken, or what happens, right? So now that we know our use
06:48
case and we saw the Python implementation, I want to show you the Luigi implementation of this. And I have to open my Python. So yeah, this is my Luigi implementation. I put everything in one file. So yeah,
07:08
yeah, I already hear some laughs. I put everything in one file and you can see SQL queries, plain SQL queries. So don't worry about that. What I want to point out is how you can implement some kind of task or some kind of job in
07:22
Luigi. I can increase the font size. Is this good enough? So the first thing I
07:40
wrote is a dump-database task, and then I have a load-to-analytics-DB task, and then I have an aggregate task, right? So the one important thing you see here is that the load-to-analytics-DB task depends on the previous task. So the dump-database
08:01
task. And if you see the aggregate task, it depends on the previous one. And the last one depends on the previous one. So this is how you can chain a series of batch jobs together in Luigi. And running this is pretty easy. So I'm
08:28
currently using Pipenv, so I have to trigger it like this. So I have to go to that directory and then I can just run it. The output has quite a lot
08:41
of stuff in there, but you can see here we ran like four tasks and then we have the report, right? Yeah, so it looks pretty okay. And if you look at the UI, this is how it looks. So first I dump the database and then
09:08
load it into our analytics database, then do the aggregation, and the report is ready. So we kind of solved the problems that I mentioned earlier here.
09:23
I'm sorry, where was it? Here. So we solved some of the problems, like what happens when the first one fails: they are chained together, so the next one will just stay in the waiting state. And we also solved the second one, that if the first
09:40
one takes longer than an hour, the next will just keep waiting, right? I'll come to the other points later. So we saw that you can chain multiple batch scripts or batch jobs together and then run them with a simple
10:03
command-line invocation. And since this is a reporting thing that has to run in production, you have to somehow trigger it. So as I said earlier, Luigi doesn't have a scheduler, so you have to use cron. Usually cron is used, so you need
10:25
some way to trigger the task. So if I run it with Luigi, it would be... where is mine? So yeah, if I had to run it with cron, it would look like this. So I was
10:50
pretty surprised when I saw that Luigi doesn't have its own scheduler, because it's pretty common to have a scheduler in this kind of pipeline tool. But it turns out that it's actually not bad. It's a design
11:07
decision made by the Luigi team. Not having a scheduler means you are really flexible to do whatever you like. You're really flexible to run it from different places, and one of them is from Kubernetes. So that's what I'm going to
11:25
talk about next. So what is Kubernetes, or a CronJob in Kubernetes? You can run a lot of stuff in Kubernetes. You can run normal services like web apps, which run 24/7. You can run Jobs,
11:48
which do a particular thing, and then they are done. And a CronJob is basically Jobs on a schedule, like with cron. So Jobs are also
12:07
called run-to-completion. Kubernetes runs them, and if there is an error or failure, it reschedules them. So it always tries to complete the job. So we saw earlier that it was pretty easy to run locally, but
12:30
what about Kubernetes? So I have a simple Kubernetes setup on my local machine. I installed Minikube, so I have a Minikube cluster running. And if
12:42
you see here, this was my Luigi code. What I did is build a Docker image out of it, which is pretty easy: nine lines of code, and then you have a Docker image. I have uploaded this image to a Docker registry
13:00
and deployed it to Kubernetes, which is Minikube right now on my local machine, and the Kubernetes deployment looks like this. And then running on Kubernetes, you can see the command: run Python and Luigi, stuff like that. So this is the setup I have here. You can see here I have one CronJob, and then I
13:25
also deployed the Luigi daemon on Kubernetes. So you can see here the Luigi deployment, and this is the UI of the Luigi that I deployed. So all this setup I already did beforehand, because
13:43
we don't have much time, but if you are interested in the setup, I have a project on GitHub so you can easily follow it. So let's see, I have one CronJob here, and I want to run it. So this is scheduled from 7 to 16.
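The CronJob manifest itself is not shown in the transcript; a rough equivalent might look like the following. All names, the image reference, and the schedule are illustrative, and this uses the current `batch/v1` CronJob API (a 2019-era cluster would have used `batch/v1beta1`).

```yaml
# Hypothetical CronJob wrapping the containerized Luigi pipeline
apiVersion: batch/v1
kind: CronJob
metadata:
  name: sales-report
spec:
  schedule: "0 7-16 * * *"        # hourly between 07:00 and 16:00, as in the demo
  concurrencyPolicy: Forbid       # don't start a new run while one is active
  jobTemplate:
    spec:
      backoffLimit: 3             # Kubernetes retries failed pods (run-to-completion)
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: luigi
              image: registry.example.com/sales-report:latest
              command: ["python", "-m", "luigi",
                        "--module", "sales_report", "AggregateSales",
                        "--local-scheduler"]
```

A CronJob that is outside its schedule window can still be triggered by hand, as in the demo, with something like `kubectl create job --from=cronjob/sales-report manual-run-1`.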
14:00
So right now it's 16:19, so it will not run, but I can also manually trigger this task using the kubectl command-line tool. So let's see what happens. So I should see a pod which is in the running state, and you see here,
14:23
and then let's see in our Luigi UI. So you see there are a couple of tasks in pending state, some are already done, and there is one running, the aggregate task is in running state, and this is how you can track the progress of the
14:42
jobs. Some of them are done, and if there is an error you can see it here; others are just waiting. So it looks pretty cool. So we are able to run on Kubernetes, and Kubernetes has a CronJob,
15:03
which creates a Job, which creates a pod. So it sounds a bit like a Russian doll, but it's pretty easy once you get into this. So that's what Kubernetes did here: I had deployed the CronJob, and then it created a couple
15:23
of Jobs, actually one Job, and then that created a pod, which is still running. So yeah, we are able to run Luigi the same way I ran it locally, but this has quite a few benefits. The main benefit is the scalability
15:46
of your pipeline. So since it's on Kubernetes, you can scale endlessly. In our case, we use Google Kubernetes Engine; that means at midnight we have hundreds of jobs running, which are automatically scaled up,
16:05
Kubernetes automatically scales up, because the load is really high during that time, and after all the jobs are done, the Kubernetes nodes go down again. We only have two fixed nodes; the rest is dynamic scaling. So this is, I personally find,
16:23
really powerful, but apart from this, you get all the benefits of Kubernetes, from containerization to easier deployment to flexibility with the infrastructure and stuff like this. So that is it. It's a bit rough, but you can follow the example on my GitHub,
16:57
and then you can, if you are interested, just try this.
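For reference, the "nine lines of code" Dockerfile mentioned earlier could plausibly look like this. The file names are placeholders; the real one is in the speaker's GitHub project.

```dockerfile
# Hypothetical Dockerfile for the Luigi pipeline (paths are illustrative)
FROM python:3.7-slim
WORKDIR /app
# install Luigi plus any database client libraries the tasks need
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY sales_report.py .
# default command; the Kubernetes CronJob can reuse or override it
CMD ["python", "-m", "luigi", "--module", "sales_report", "AggregateSales", "--local-scheduler"]
```

Because the whole pipeline is baked into one image, deploying a new version is just building and pushing a new tag, which is the packaging advantage over Airflow discussed later in the Q&A.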
17:02
So I personally find it a very powerful setup: Luigi being lightweight, it's really easy to containerize and deploy on Kubernetes. As a result, you can build complex batch processes which are easy to scale and maintain,
17:30
and you get the benefit of both: the pipeline side from Luigi, and the infrastructure side from Kubernetes, like horizontal scaling and easy deployment, like I said earlier.
17:51
So in short, that was it. I have one hidden slide, which is: my team is hiring.
18:03
If you like Python, if you like Kubernetes, those kinds of things, please feel free to talk to me. We are based in Stuttgart, just two hours from here, and we are really looking
18:21
for developers. With that, any questions? Thanks, Lars and Eddie.
18:45
Thanks for the talk. Great, awesome. Question: if you have users or customers in multiple time zones, do you have any experience with what's the best substitute for crontab to manage tasks in multiple time zones? I personally didn't really have to deal with that, because
19:06
our developers and our users are all in the same time zone. I unfortunately have no good answer for that. So, I have a question over here. I'm here. Are you looking in the wrong direction?
19:26
Yeah. So does Luigi support resuming of jobs? Meaning, as a workflow management system, if you're in the middle of your job and something failed, can you go and fix it and just restart from where you failed? Yeah, Luigi handles that.
19:44
In my setup, the setup with Kubernetes I showed you, Kubernetes actually handles it, so we are not using the Luigi feature; whenever something fails, Kubernetes will run it again. So we are using that part, but Luigi can also do that. Hi, thank you for the talk.
20:05
I was wondering, have you encountered situations where you need different Kubernetes pods to have different rights, and how do you manage that? For example, you want one job to run with production rights, but maybe another job to run with dev rights, not the same rights.
20:23
Yeah, so usually we use service accounts to run jobs. So one job has one service account, the other job has another service account, and the service accounts have limited permissions. So if a job needs storage access, you only give storage access to that service account.
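As a sketch of that answer (all names here are hypothetical), the job's pod template simply points at a dedicated, minimally privileged service account:

```yaml
# Hypothetical: one minimally-privileged service account per job
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sales-report-sa           # grant this account only the storage access it needs
---
apiVersion: batch/v1
kind: Job
metadata:
  name: sales-report-manual
spec:
  template:
    spec:
      serviceAccountName: sales-report-sa   # the pod runs with this account's rights only
      restartPolicy: Never
      containers:
        - name: luigi
          image: registry.example.com/sales-report:latest
```

On GKE, the cloud-side permissions (e.g. which buckets the account may read) would be bound to this Kubernetes service account via IAM.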
20:42
That's how we do it. Thank you. Thank you for your talk. We are currently at the stage where we are evaluating Luigi versus Airflow, and I'm looking for recommendations. You know, what we saw is that it was really pretty hard to write code that expresses
21:12
nested steps and complex hierarchies of tasks. We found a way to make it work with Airflow, and I guess maybe there
21:28
is a way to do it with Luigi, and we just missed it. Do you know how to do that? Yeah, with Luigi it's a bit complicated. The branching model is supported, but it's a bit hard.
21:40
Airflow definitely has a better mechanism in this case. In Luigi, we actually don't do that kind of super complicated branching or nested jobs. Really quick: did you work on a use case where you have an NFS mount in your Kubernetes image? Did you mount a disk to your Docker image?
22:10
Mount the disk? Did you mount a disk? We have a problem, which is accessing a large quantity of data, and we want to move to Docker
22:24
instead of accessing the local disk. One barrier that we have is to mount NFS data to the image and access it. My question is, did you have a similar use case?
22:44
Not really. With Luigi, we usually only process smaller data sets, and Luigi is mostly the orchestration part, so all the heavy lifting we actually do with other tools like BigQuery
23:02
and stuff like that. I was wondering what your reason was for using Luigi versus Airflow. Luigi and Airflow are both Python things,
23:28
but they really have different use cases. Airflow is really big, and it comes with everything you need. However, the one main drawback was that the Kubernetes support is still
23:42
not so good. They recently added Kubernetes support, where each task can run in a Kubernetes pod, but this is not so stable yet. This is one of the main reasons that we prefer Luigi. Other than that, with Airflow, if you have your pipelines, you have to copy them into some
24:05
directory, so you cannot really build Docker images and deploy them like I did with Luigi. So packaging and deploying Luigi is a lot easier and more flexible. And with Kubernetes,
24:24
sorry, with Airflow, you are always in the Airflow way of doing things, so it's a really heavy thing. It gives you a lot, but it's also not that flexible. Hi, question. Thanks for the talk. How do you handle the logs? So when
24:45
integrating with Kubernetes, do you get the logs from the pod in Luigi? So since we use Google Kubernetes Engine, the logs are sent to Stackdriver by default.
25:01
So yeah, if you saw it running, Luigi basically writes its output to stderr, which is sent to Stackdriver. Okay, and then how do you handle the secrets? If secrets are pushed to the logs, do you hide them? We use Kubernetes secrets, so they are
25:23
not in the logs. Okay, thank you. Is it possible to configure Luigi in a way to run jobs, not just based on the crontab, but also, I don't know, some Redis message queues or a file being dropped in a directory
25:40
or something like that? And how does that work? Because you were starting a Docker container for every run, but I think something would have to be there all of the time. Yeah, that's actually a really interesting use case. We are also thinking about that. There is no built-in way to do it, so you have to build something yourself. Like if you
26:03
upload a file to S3 and then you want to trigger Luigi, then you have to build this part yourself. So like I said earlier, running Luigi is just a command line thing, right? So you would
26:22
run this command when the file is uploaded. So this part you have to implement yourself. There is no default way to do it. Thank you for your talk. When you showed your example with the code and the dependencies between the tasks, it looks like the code is
26:44
kind of strongly coupled to the code that is running, so the Luigi task definition part and the stuff that is running. Is this very flexible, or how strong is the vendor lock-in if someone
27:04
switched to something else? Is it complicated to do that? As far as that goes, Airflow also has a similar kind of contribs, and Luigi has really a lot of contribs. I mean, it's open source, so it's not really a vendor lock-in thing. That's
27:22
not really what I meant. It's about how deep the integration is between the managing of the tasks, the definition of the dependencies, and the code itself. Is it coupled a lot, or are the things that you want to do, the scripts that you're running,
27:42
separate things and you just define the tasks separately and can edit, for example, some command line options for what you want to run? I think I got your point. So what I do is we usually build one pipeline for certain stuff, like building one report is one pipeline,
28:05
so this is one image, one Docker image, for us. So we have like hundreds of images to handle all the workloads for the company. Does that answer your question? I was going in a little bit different direction, but maybe we take this offline.
28:23
Yeah, we can also talk later, yeah. Okay. We could probably have one more short and quick question. Yes, please. And probably the last one. Yes. [Question inaudible.] Actually, testing Luigi is really, really easy because it's really lightweight,
28:45
so we use pytest to test it. We have some extensions built to support Luigi, but generally it's really, really easy. It's like any other Python code. So thanks for joining me. I really liked it. There were a lot of questions. I was a bit afraid
29:05
when I submitted the talk, thinking that there might not be enough interest, but I'm really excited that there were a lot of questions and if you use Python in data engineering, I really want to talk to you. Please come to me. Thank you.