We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Nix for data pipeline configuration

Formal Metadata

Title
Nix for data pipeline configuration
Title of Series
Number of Parts
27
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
My team develops a data pipeline to generate music recommendations. It consists of many batch jobs that read data from somewhere and write their output somewhere else, with complex dependencies and parameter tuning. Historically, we have configured these batch jobs with hand-written bash configuration, or with dedicated python-based tools such as Airflow. However, both lack flexibility, often forcing the developer to bypass them and to run jobs manually during development. The tasks of data pipeline configuration and package definition share some requirements: both imply running many programs in a specific order and with specific parameters. Since nix is a language dedicated to packages definition, which allows expressing packages in a succinct and highly flexible way, we decided to try to use it for data pipeline configuration. Nix-the-tool is too centered around package management for our use case, so we built our own tool around nix-the-language. It this talk, we’ll explore how to apply nix to data pipeline configuration. This will give us the opportunity to look at nix as a language, abstracted from its current ecosystem. We’ll also explore how to structure a nix codebase, encountering the same questions nixpkgs encountered a long time ago, but in a much smaller environment. The main goal of this talk is to share the different point of view of nix that comes from applying it to a different problem and starting from scratch. We also hope to serve as an inspiration to explore other nix-based DSLs. --- Bio: Georges is a Software Engineer at SoundCloud, in Berlin. He is part of the team that generates music recommendations. He loves exploring new ways to solve engineering problems, which led him to look into exciting technologies such as Haskell and NixOS. Some of his favorites hobbies are playing board games and learning German.