Nix for data pipeline configuration


Formal Metadata

Title
Nix for data pipeline configuration
Title of Series
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2018
Language
English

Content Metadata

Subject Area
Abstract
My team develops a data pipeline to generate music recommendations. It consists of many batch jobs that read data from somewhere and write their output somewhere else, with complex dependencies and parameter tuning. Historically, we have configured these batch jobs with hand-written bash configuration, or with dedicated Python-based tools such as Airflow. However, both lack flexibility, often forcing the developer to bypass them and run jobs manually during development. The tasks of data pipeline configuration and package definition share some requirements: both imply running many programs in a specific order and with specific parameters. Since Nix is a language dedicated to package definition, which allows expressing packages in a succinct and highly flexible way, we decided to try using it for data pipeline configuration. Nix-the-tool is too centered around package management for our use case, so we built our own tool around Nix-the-language. In this talk, we'll explore how to apply Nix to data pipeline configuration. This will give us the opportunity to look at Nix as a language, abstracted from its current ecosystem. We'll also explore how to structure a Nix codebase, encountering the same questions nixpkgs encountered a long time ago, but in a much smaller environment. The main goal of this talk is to share the different point of view on Nix that comes from applying it to a different problem and starting from scratch. We also hope to serve as an inspiration to explore other Nix-based DSLs. --- Bio: Georges is a Software Engineer at SoundCloud, in Berlin. He is part of the team that generates music recommendations. He loves exploring new ways to solve engineering problems, which led him to look into exciting technologies such as Haskell and NixOS. Some of his favorite hobbies are playing board games and learning German.
All right everyone, let's continue. I hope you enjoyed lunch and there was enough for everyone. Our next speaker is Georges, and if you have ever used the Travis Nix integration and enjoyed it, give Georges a hand afterwards, because he's the one responsible for it. But that is not the topic for today: today Georges is going to talk to us about Nix for data pipeline configuration. Enjoy, and give him a hand.

Hello there, can you hear me in the back? Okay. I'd like to start with a short show of hands: is there anybody here working with something around data science, machine learning, big data? Some of you. And are any of you using tools like Luigi or Airflow to schedule and run your batch jobs? A few of you. For those few, the short version is: I replaced that with Nix. Let's go for the long version. Hi, I'm Georges, and I
work at SoundCloud. It's a music streaming platform where artists upload their music themselves, and what I do there is recommendations. Because anybody can upload anything, it can be quite hard to actually find content you like, so my team and I build the tools that generate recommended playlists containing tracks you might like. It's a very interesting topic, come talk to me afterwards if you want to know more about it, but for the purposes of this
talk you only need to know that it's mostly batch jobs: long-running jobs that run on all of our users' data, take hours, read data from somewhere and write it somewhere else. And the reason
I'm here today is that I tried to use Nix for my batch jobs. So why did I do that? Let me give you a little bit more context about what that means.
As I said earlier, we have batch jobs. At the lowest level they are commands we run; we write them in various languages, some in Scala, some in C, but in the end it's just a command we run, giving it some input and some output. We've got a bunch of them, with dependencies between them: some of them expect some other one to have run before. Part of the job of having these batch jobs is to make sure they run in the right order, with the right inputs and outputs. I don't want to accidentally run today's run with yesterday's data, because I would get weird results, so I need some fine control over how this runs. Here is an example, a simplification of what we run in production: a bunch of different jobs, some sharing dependencies, some not, and it can be kind of a mess to configure, maintain and monitor. Just a bit more context: all the data we have is stored on something called HDFS, which stands for Hadoop Distributed File System. It's a distributed, clustered file system: we have a bunch of machines and every machine holds some of the data; the data is not centralized anywhere. That gives us an interesting property when we develop: we don't care about the individual machines' file systems, we only care about that one big central thing, the HDFS. There is only one file system for us when we work with it, which is a nice simplification. We run these batch jobs daily, because we compute new recommendations for users daily, but we also run them multiple times a day while we are developing. So we've got one production run a day, and then when I'm trying some new logic I'm going to run a job, and run it again, iterating multiple times. It's this tension
between the production run and the development run that is the source of the initial annoyance that led me to pursue this. For production, we want the batch jobs to be nice to express; we want them to be reliable, because they run every night and we don't want to come in the morning and realize they failed; and we want them to be maintainable, so I can come back in a few months and still figure out what's happening. For development, on the other hand, I care about flexibility: trying something new as fast as possible in order to get to the next insight quickly. These two things don't really go together, at least with the tools we currently have; they either have the property of being stable and good for production, or of being flexible and good for development, and I wanted to try something that would have both. To give you a better idea of what I mean by flexibility and tweaking, here is an example of one of our pipelines. It's a bunch of batch jobs that compute recommended playlists. We start with a set of candidates (candidates is what we call tracks that might end up in the user's playlist, but we are not sure yet), and multiple batch jobs enrich them, filter them, score them and finally build the final playlists. In production this runs every night. Now say I want to write some new filtering code, because we realized the old one is rubbish and we can do better. I've written the code, compiled it, it's on my laptop, and I'm ready to run a new job. What I want is to run the new filtering code and then re-run the stuff that depends on it, because what I'm actually interested in is the final result: what impact does my change to this job have on the final result of the whole pipeline? Ideally I also want to do that without re-running candidates and enrich candidates, because I know they will be the same; I did not change anything there. Sadly, with the current tooling, the only way to do that is to go to whatever tool I have in production, look at how the filter-candidates job was run, copy that command, edit the parts I want to change (the code I'm using, the output path on HDFS), then go to the next job, score candidates, do the same thing, copy the command, change the input and output paths, run it, and then the same for the last one. Which, first of all, is annoying, but more than annoying, it is unnecessarily hard. And the reason
I really find it a problem is that it is a bad incentive. We want the right thing to be easy. The right thing here is to test my job completely, to check the final result, but if it's hard I will be a bit less likely to do it, at some point I will just stop doing it, and the quality will go down. So it's really a matter of aligning incentives: the right thing should be the easy thing, and that's what I want to reach here. I want testing the whole thing to be easy, not hard and annoying. Another example, to show that this is not only a development-versus-production problem: now I've written my new filtering logic, but I'm not quite sure it's actually better than the original one. To make sure, we use a technique we rely on quite a lot, A/B testing. The idea is that I take my old logic that generates the old recommendations and the new one that generates the new recommendations, serve each to a different set of users, and then compare how they perform. For example, if I'm interested in the listening time in these playlists, to see if users like them, I compare the users of the old playlists with the users of the new playlists. But to be able to do that I need to compute both sets of recommendations, so I need to compute both branches of the pipeline. With the tools I currently use to define my pipelines, I have no choice but to duplicate the parts I want to run twice: I duplicate the filtered candidates, the scored candidates and the final playlists, copy-paste them, and then tweak the paths to make sure they write to two different places, otherwise it's just going to become one big mess that stops working, or worse. I find this annoying and hard, and it's going to be even worse to maintain, because if I have to change one of those jobs in the future I'll have to do it twice, or forget to, which is a maintenance nightmare. What I want to reach is that the code to express my pipeline should be as simple as the idea I want to express. If the idea is "run the whole thing, but with a different filtering logic", it should not be "copy-paste it and then tweak things here and there until it works"; it should be expressing the whole thing, with a different filtering logic. That's what I wanted to reach, and with that in mind I turned to Nix. First of all because Nix is a pretty nice language for package definitions: it is very pleasant to contribute to nixpkgs and change a package, because package definitions are actually very nice in Nix. But more than that,
the thing that really made me look into Nix for this is that it is a language that actually allows you to manipulate definitions. If you want to change a package in Nix, you don't have to copy-paste the definition and then change it; you can, in the language itself, make tweaks, make overrides. In the previous example I had the package definition for the less package; I can take that definition and say: actually, the ncurses dependency is going to be a different one, a different version. This looks very similar to the thing I actually want to reach, and this is why I wanted to try Nix to solve this problem of mine.
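As a rough sketch, that kind of tweak in nixpkgs might look like the following; the exact inputs of the less package are assumed here rather than taken from the talk:

```nix
# Illustrative only: reuse the existing definition of `less`, but swap its
# ncurses dependency for a tweaked one. `.override` is available on packages
# defined via callPackage in nixpkgs; the patch file is hypothetical.
let
  pkgs = import <nixpkgs> { };
  myNcurses = pkgs.ncurses.overrideAttrs (old: {
    patches = (old.patches or [ ]) ++ [ ./my-ncurses-fix.patch ];
  });
in
pkgs.less.override { ncurses = myNcurses; }
```

The point is that the original definition is reused as-is and only one input is swapped, which is exactly the shape of change the speaker wants for pipeline jobs.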
So let's talk about how I actually did it, and what the final result is. I'd like to introduce you to mix. I'm very bad at naming, so the thing is called mix, and it is an implementation of Nix dedicated to data pipelines. I say an implementation of Nix, but most of the actual implementation work happened in the hnix library I'm using, which does all the parsing and the evaluation of Nix. So I did not actually re-implement the whole of Nix; somebody else did that for me, thank you very much.
The reason we need this separate implementation is that the definition of a derivation in Nix is not quite what we need. Nix has a very strong idea of the Nix store: everything you build will end up in the store, most of the time /nix/store. You could point Nix at something else, but you will always have one single store, and that does not work very well in our case. First of all, the Nix store lives on your local file system, and we don't want to build stuff on our local file system; we want to build stuff on HDFS. Then, where I work, we have a strong set of conventions about how things are organized on HDFS: every team has its own subdirectory, and different projects have their own subdirectories too, so having everything built into one big store that contains everything is not really an option. As a result, the mix tool redefines derivations: derivations in mix are very similar to the ones in classical Nix, but their output can go pretty much anywhere in the file system, not just into the store.
We also implement some very simple building, at a proof-of-concept level. We have these derivations and we build them and their dependencies, but we don't do any parallel building and we don't do any sandboxing; it's the simplest thing to get a proof of concept working. We don't even serialize the derivations; we don't write them to disk or put them in a database. We could, and it would definitely be nice for building tooling around this, but this version does no such thing; we are mostly interested in how to build the derivations. The last piece is that I decided to use Docker to specify how to build the derivations, once again because of pre-existing conventions where I work: we build and distribute most of our stuff as Docker containers, so the code I need to run my pipelines is already available as Docker containers, with all the tooling it needs. So this derivation, on top of being able to have its output anywhere, also has one more attribute: the container in which the derivation needs to be built. It's mostly just passed through to the builder, which ends up calling Docker.
We end up with this new tool in which we have a derivation primitive. This is Nix the language, but it is not interpreted by Nix the tool; it is interpreted by mix. It is nearly the same derivation function you would expect to see in classical Nix, except it has two attributes you don't have in Nix: a prefix, which says where the output is actually supposed to go, and a container, which says how to build the derivation. When you run mix, it does what you would expect from the derivation function: it takes the prefix, the hash of the derivation and the name, and gives you the ability to build it. I said we don't serialize the derivations; what you see here is the pretty-printing of the in-memory representation of a derivation. Nothing very fancy: it is nearly the same as what you would find in a .drv file in the Nix store, except it has a container attribute. Like a usual derivation it has an output, and it tells you, through the builder, the args and the environment variables, what to run in order to get that output.
So we have this mix tool, very close to Nix, that is suitable for defining our batch jobs, because it knows about HDFS and works around the restrictions of Nix. Now we face the task of writing a Nix codebase, the codebase being the definition of the pipeline: all my batch jobs and their dependencies. We can start with the simplest way to do it: defining a new derivation with a very raw call to the derivation function. For my first job, the candidates job, I give the name, the container, the builder (which is going to be bash), the arguments I pass to bash, the prefix, and some environment variables I need to provide for the rest to work. And I can make my first derivation like that.
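A raw definition along these lines might look roughly like this; all concrete values (image name, paths, the spark-submit invocation) are illustrative, and only the prefix and container attributes are specific to mix as described above:

```nix
# candidates.nix - one batch job expressed as a raw derivation call.
# `prefix` (where on HDFS the output should live) and `container` (the Docker
# image to build in) are the mix-specific attributes; the rest mirrors the
# classical Nix derivation primitive. Assumes mix exposes the computed output
# path as $out, the way Nix does.
derivation {
  name = "candidates";
  container = "registry.example.com/discovery/spark:latest";
  prefix = "/user/discovery/recommendations";
  builder = "/bin/bash";
  args = [
    "-c"
    "spark-submit --class com.example.Candidates candidates.jar --output $out"
  ];
  SOME_ENV_VAR = "value";
}
```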
In memory it looks like this, nothing fancy; it's pretty much the same thing I had before. I can see that the output was filled in from the prefix, the hash and the name, and I can run it: it will build it, run the actual spark-submit command, which is what we need to run our Spark jobs, and put the output in the output path. So far so good, nothing super fancy; it's not really better than hand-written bash scripts. So we want to make
this better. Because I've looked at nixpkgs, I know we can make the expression of packages much nicer, and I know we can have this override, this tweaking thing we were talking about earlier. To see how to do that, I looked into pretty much the only Nix codebase I know, the definition of all the packages around Nix, which is nixpkgs, and I saw multiple patterns we could reuse to organize our own mix codebase, to make it nicer and easier to express these derivations, these batch jobs. So here we are, back with the very raw definition. The first thing we notice is that most of the jobs we want to build, we want to build with bash, and the thing we actually want to write is just the command: the difference between this version and the previous one is that I just provide the command to run in bash, and I don't bother writing how to run bash and with what flags. To define this bashDerivation function we use a pattern that is very common in nixpkgs: we define a function, bashDerivation, which takes a set of attributes and calls derivation with those attributes, but on top of that picks some things out of the attribute set and uses them to inject new arguments. In this case we are mostly interested in the command and the container that are passed in; we pass them on to derivation, but we also add the builder, which is always going to be bash, and the args, which are always going to be "-c" plus the command. This pattern, which is very common in nixpkgs, gives us this bashDerivation, so every derivation I want to build with bash can use it instead of invoking the raw builder.
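A minimal sketch of such a bashDerivation helper, with attribute names guessed from the talk rather than taken from the actual code:

```nix
# A wrapper in the style of nixpkgs' runCommand: forward the attribute set to
# `derivation`, but fix the builder to bash and turn `command` into
# `bash -c <command>`. Concrete values below are illustrative.
let
  bashDerivation = attrs:
    derivation (removeAttrs attrs [ "command" ] // {
      builder = "/bin/bash";
      args = [ "-c" attrs.command ];
    });
in
bashDerivation {
  name = "candidates";
  container = "registry.example.com/discovery/spark:latest";
  prefix = "/user/discovery/recommendations";
  command = "spark-submit --class com.example.Candidates candidates.jar --output $out";
}
```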
But we can go further, because I know most of my jobs are going to be Spark jobs, and if I look here there is a lot of boilerplate needed just to run Spark. So I want a sparkDerivation definition, and once again I use the same pattern: a function that takes an attribute set, extracts the interesting attributes out of it, and then calls another function, in this case bashDerivation rather than derivation directly, so I can layer these abstractions on top of each other. Here I take a bunch of different arguments, because there are a lot of things I might want to configure in my Spark jobs: the jar the code is in, the class (this is the Java world) that is the entry point of the code, and some of the arguments I might be interested in. And we call bashDerivation with a command built from all these arguments, all of which we can override.
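A sketch of how such a sparkDerivation could be layered on top of the bashDerivation helper above; the parameter names and the spark-submit command line are illustrative, not the speaker's exact ones:

```nix
# sparkDerivation: extract the Spark-specific arguments (jar, entry-point
# class, job arguments), build the spark-submit command line, and delegate
# everything else to bashDerivation. Assumes bashDerivation is in scope.
let
  sparkDerivation = { jar, class, jobArgs ? [ ], ... } @ attrs:
    bashDerivation (removeAttrs attrs [ "jar" "class" "jobArgs" ] // {
      command = ''
        spark-submit --class ${class} ${jar} ${builtins.concatStringsSep " " jobArgs} --output $out
      '';
    });
in
sparkDerivation {
  name = "candidates";
  container = "registry.example.com/discovery/spark:latest";
  prefix = "/user/discovery/recommendations";
  jar = "candidates.jar";
  class = "com.example.Candidates";
}
```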
So now we have a nice way to define a single batch job. To define all of them, I can use a recursive set in Nix. I define my first derivation, my first batch job, candidates, by calling sparkDerivation, and then I define the next one, enrich candidates, also with a call to sparkDerivation; it can refer to candidates because the set is recursive. When you refer to a derivation where a string is expected, you get the output of the derivation, the output path where it will actually be built, as a string with an attached context, a marker saying that this string comes from that derivation. That is what lets Nix know that enrich candidates actually depends on candidates. And it works: I get my set of jobs. This is actually my pipeline, the set of all my batch jobs that depend on each other, and I can run it.
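Assuming the sparkDerivation helper from the sketches above, the pipeline as a recursive set might look roughly like this (job names and parameters are illustrative):

```nix
# pipeline.nix - the whole pipeline as one recursive attribute set.
# Interpolating `candidates` yields its output path as a string with context,
# which is how the dependency between the two jobs is discovered.
rec {
  candidates = sparkDerivation {
    name = "candidates";
    jar = "candidates.jar";
    class = "com.example.Candidates";
    prefix = "/user/discovery/recommendations";
    container = "registry.example.com/discovery/spark:latest";
  };

  enrichCandidates = sparkDerivation {
    name = "enrich-candidates";
    jar = "enrich.jar";
    class = "com.example.EnrichCandidates";
    prefix = "/user/discovery/recommendations";
    container = "registry.example.com/discovery/spark:latest";
    jobArgs = [ "--input" "${candidates}" ];
  };
}
```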
That is already a pretty nice language to define my pipelines: it gives me abstraction, so I can refactor, reduce the code, and keep it clear and simple. It is actually better than what I already have in production. But I want more: I want to actually tweak jobs, because so far I have done nothing about the tweaking and the flexibility I was talking about earlier. What we want is this:
I want my weekly.nix file that contains all my production definitions, and I want to import it and override it. I want to take the candidates job and say: the important parameter should be set to 11 instead of 10, because it is very important that it is 11, or at least I want to try it set to 11.
For this we can take an abstraction that is present in nixpkgs: makeOverridable. It is basically a wrapper around a function: when you call it, you still get the original result of the function, but it also injects into that result an override function that lets you call the original function again, tweaking the arguments it was called with the first time. As an example of how to use it, I have a very simple function here, makePath, that takes a prefix and a name and concatenates them with a slash in between. I make it overridable and then call it: the prefix is /user and the name is discovery, which is the name of my team, and I get the path that contains the result. But the result also has an override function I can call with additional arguments, which replace the original arguments of the function. So here I can call override, passing just the name: it keeps the original prefix, and I get my new result, my new name with my original prefix.
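A minimal reconstruction of that makePath example; nixpkgs' lib.makeOverridable is more general, this is just the core idea, with the result wrapped in an attribute set so that `.override` has somewhere to live:

```nix
# A stripped-down makeOverridable for functions of one attribute set: return
# the original result plus an `override` attribute that re-calls the function
# with tweaked arguments.
let
  makeOverridable = f: args:
    f args // { override = newArgs: makeOverridable f (args // newArgs); };

  makePath = makeOverridable ({ prefix, name }: { path = "${prefix}/${name}"; });

  teamPath = makePath { prefix = "/user"; name = "discovery"; };
  otherPath = teamPath.override { name = "recommendations"; };
in
{
  inherit teamPath otherPath;
  # teamPath.path  == "/user/discovery"
  # otherPath.path == "/user/recommendations"
}
```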
We can use this technique on our definitions of derivations. First I have to change things a bit: I need to make the definition of my derivation a function that takes the arguments I want to override. So I change the important parameter: it is now an argument of that function, it defaults to 10, and it is used in the definition of the derivation. Then I can make candidates by calling makeOverridable on makeCandidates, and that lets me take my definition, call override on it to change the value of the important parameter, and get a new derivation that is exactly the same as before except that this specific parameter has been overridden. I can do that for parameters, but I can also do it for dependencies: here I make enrich candidates overridable by making candidates an argument of the function, the same thing, and that allows me to override what candidates is in the definition of enrich candidates. I could make it another derivation, or I could make it a path, a string: maybe I have precomputed data and I absolutely want to run on that precomputed value, and I can do it that way. This returns a new derivation that is exactly the same as enrich candidates, except the candidates input is this one instead of whatever I had before. That's already pretty nice: I am now able to take any single batch job, any single derivation I have, override it, and tweak one or multiple of its parameters.
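Putting that together for the jobs themselves, a sketch along the lines described (parameter names, flags and paths are illustrative, and makeOverridable and sparkDerivation are the hypothetical helpers from the earlier sketches):

```nix
# Each job becomes a function of its tweakable arguments, wrapped with
# makeOverridable so parameters and dependencies can be overridden later.
let
  makeCandidates = { importantParam ? 10 }:
    sparkDerivation {
      name = "candidates";
      jar = "candidates.jar";
      class = "com.example.Candidates";
      jobArgs = [ "--important-param" (toString importantParam) ];
      prefix = "/user/discovery/recommendations";
      container = "registry.example.com/discovery/spark:latest";
    };
  candidates = makeOverridable makeCandidates { };

  makeEnrichCandidates = { candidates }:
    sparkDerivation {
      name = "enrich-candidates";
      jar = "enrich.jar";
      class = "com.example.EnrichCandidates";
      jobArgs = [ "--input" "${candidates}" ];
      prefix = "/user/discovery/recommendations";
      container = "registry.example.com/discovery/spark:latest";
    };
  enrichCandidates = makeOverridable makeEnrichCandidates { inherit candidates; };
in
{
  # Same job, with one parameter tweaked:
  candidates11 = candidates.override { importantParam = 11; };
  # Same job, but reading from a precomputed path instead of the candidates job:
  enrichFromPath = enrichCandidates.override {
    candidates = "/user/discovery/precomputed-candidates";
  };
}
```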
But that's not yet what I want. Remember, I wanted to tweak the entire pipeline: take a derivation, change it, and then get the final result that depends on it. I don't have that yet, so let's get it. This is what I want to write: import my production definitions and extend them by changing what candidates is in that set; not just what candidates is on its own, but what it is within that specific pipeline. The issue I ran into is that I defined all my jobs with a recursive set, which means that once Nix evaluates it, I get the set back: the recursion is part of the syntax of Nix, so I get a set that contains candidates and enrich candidates, but there is nothing in that set that tells me that the candidates attribute was used in the definition of enrich candidates. It is the case, I did use the candidates attribute in that definition, but by the time I get the set back I have lost that piece of information, which I need if I want to actually do that override.
The way to work around that is to do the recursion ourselves. Instead of defining the pipeline as a recursive set, we define it as a function that takes a set and returns a set, and that uses its input set to look into "itself", because we are going to call it by passing its own output as its own input. This technique is called fixed-point recursion; it is pretty common in lazy functional programming. It lets us represent our set this way and get exactly the same result we had before. Once I turn the representation of my pipeline into that, I can define makeExtensible. This is also something you can find in nixpkgs, although this one is a bit tweaked; it returns the fixed point of the set, so the actual set of derivations, plus an extend function that lets me modify the recursive set before actually applying the recursion.
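A simplified sketch of that machinery, modelled on nixpkgs' lib.fix, lib.extends and lib.makeExtensible, and reusing the hypothetical helpers from the earlier sketches:

```nix
# fix ties the knot; extends layers an overlay (self: super: { ... }) on top
# of a pipeline function; makeExtensible returns the fixed point plus an
# `extend` attribute. Simplified relative to the nixpkgs versions.
let
  fix = f: let x = f x; in x;
  extends = overlay: f: self:
    let super = f self; in super // overlay self super;
  makeExtensible = f:
    fix f // { extend = overlay: makeExtensible (extends overlay f); };
in
# The pipeline as a function of its own final result (`self`), so that
# enrichCandidates keeps pointing at whatever `candidates` ends up being.
makeExtensible (self: {
  candidates = makeOverridable makeCandidates { };
  enrichCandidates = makeOverridable makeEnrichCandidates {
    candidates = self.candidates;
  };
})
```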
That is exactly what I wanted: I can say "I want the set, but with something modified in its definition", and all the things that are recursively defined in terms of it will change too. And the two mechanisms I just mentioned can be used together: I use extend to change the definition of the whole pipeline, and I use override to change one parameter. This is the thing I wanted in the first place. It lets me say: take the definition I have for my production pipeline, change candidates, change the important parameter of candidates to this value, and give me the final result that depends on it. And it gives it to me; I actually have it.
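In code, the combination might look roughly like this; weekly.nix and finalPlaylists are hypothetical names standing in for the production pipeline file and its last job:

```nix
# Hypothetical usage: ./weekly.nix is assumed to evaluate to a makeExtensible
# pipeline like the sketch above.
let
  production = import ./weekly.nix;
  experiment = production.extend (self: super: {
    candidates = super.candidates.override { importantParam = 11; };
  });
in
# Every job defined in terms of self.candidates (enrichment, scoring, the
# final playlists) transitively picks up the overridden candidates job.
experiment.finalPlaylists
```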
So, in conclusion: Nix is a pretty awesome DSL for data pipelines. Not only does it give me a very nice way to express the data pipeline, it also gives me the overriding feature I wanted, which, to be frank, I have not found in any other tool that is usually used to express data pipelines. The other thing I want to conclude is that data pipelines are a great laboratory for Nix: having this small set of packages, a small set of batch jobs and their derivations, allowed me to explore the abstractions we have in nixpkgs, and actually understand them, in a much smaller scope. And to finish on a bit of a teaser: it also allowed me to explore techniques that are not yet used in nixpkgs. I took some inspiration from design documents around configuration, around expressing derivations and packages as recursive sets, and I implemented another version of my pipeline definition based on that idea of recursive sets. It works quite well; it is actually nicer to use and to express than the original thing, and it lets me, in the end, write my final derivation by saying (this is a different definition of extend, not the same wording as before): take my whole pipeline, change the important parameter of candidates to 11, and change the number of executors of another candidates job to 500. This is pretty nice because, as Eelco mentioned earlier, it is a way to get around the restrictions around override and overrideAttrs, which I have not mentioned, but which is another way to override that you also have to learn if you want to express every single override you might want. And yeah, it's actually pretty nice. That's everything I have.
Thank you very much for your attention. [Applause]

Host: Thank you so much for your great talk. Do we have questions? Yes, hands up already.

Q: Thank you, that was very interesting. You write your own logic in mix; how do you test it? How can you prove that what you intend is actually happening?
A: Thanks, that's a very good question, and one I have not solved yet. It's actually a question you have with every data pipeline configuration tool. In our current configuration we have very strict tests that check that the commands we actually run are the ones we want to run, which is too much testing, because every time we want to change the pipeline we change the tests to match without really thinking about it. I don't have a good answer on how to test this in a way that isn't just checking that the output is exactly the thing you wanted to run.

Q: I know upstream hnix doesn't have string contexts yet; how did you implement that? It does have string contexts, just not attached to anything, and I made a quick patch, so I was wondering if you had a good solution there.
A: I implemented it in a previous iteration of hnix, before the recursive things were fixed, and then re-implemented it. There's a pull request that is closed because it doesn't go in the right direction, but it's good enough to work.

Q: Are you using this in production?
A: I am not using this in production; it's mostly in a proof-of-concept state for now. Not because it isn't good, but because the quality of the code I wrote to make it work is definitely not production grade.
Q (follow-up): Did you show this to your co-workers, and what did they say?
A: I showed it to my co-workers and they said: why is this not yet in production?

Q: Do you plan to open-source it at some point?
A: Yes. I still have to untangle the parts that are open-sourceable from the parts that are tied to the actual pipelines we run, which I cannot make public, but I definitely want to publish it in the coming weeks.

Q: This looks really great for pure data pipelines. Do you have any way to integrate with asynchronous triggers, say another team provides some data, or a human has to sign off on something?
A: No, that's a limitation, or at least something that is definitely not solved by this. In a previous version of my slides I had a list of limitations. The idea is that this could be the basis on which to build a very nice job scheduling tool, not the scheduling tool itself; there are still other problems to solve on top, such as the one you mention. For now, in my proof of concept, the way I solved it is that I have a bash script that resolves external dependencies and passes them as arguments to the Nix code. You would definitely want something better for production.

Host: Alright, that's all the time we have, so thank you so much for your wonderful talk. [Applause]