Nix roadmap

Video in TIB AV-Portal: Nix roadmap

Formal Metadata

Title
Nix roadmap
Title of Series
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2018
Language
English

Content Metadata

Subject Area
Abstract
Our BDFL will be talking about the Nix roadmap --- Bio: Eelco is a senior software engineer at Tweag I/O. He obtained a PhD in Computer Science from Utrecht University in 2006 and was a postdoctoral researcher at Utrecht University and Delft University. As part of his PhD research project at Utrecht University, he developed Nix, the purely functional package manager, which forms the basis of the NixOS Linux distribution. He previously worked at LogicBlox and Infor.
[Host] All right, so first up is the man who doesn't need an introduction but gets one anyway: first up we've got Eelco. For those who maybe don't know yet, because I did see a couple of hands of people attending NixCon for the first time, Eelco is the person we have to thank for the initiation of Nix, basically, because he worked on that during his PhD research. Today is, however, not the time to look back; today Eelco is going to tell us about what lies ahead for Nix, the roadmap of the future of Nix, basically. So please give him a hand. [Eelco] Thank you. So I guess this works; can you hear me? OK, great. So first of all, thanks to the organizers. Apart from that, let me say that the talk about external monitors being hard to configure in NixOS is fake news: I just plugged it in and it works, it's magic. So this talk was originally called "Nix roadmap", but there is no Nix roadmap; it's "Towards a Nix roadmap", because this should really be a community effort, so this is sort of a starting point towards a roadmap. People have been saying for years that we should have a roadmap. Lately I've been doing a lot of Rust programming, I've been really drinking the Rust Kool-Aid, so whenever I have a problem now I ask myself: what would Rust do? It turns out that in Rust they have an RFC for everything, they have a beautiful process for everything. So why should you have a roadmap to begin with? Well, they answer that. The main thing is that it allows the world to see what the long-term plans and the strategic priorities are, and it allows all the developers to hopefully get behind that, so that hopefully everybody is kind of pointing in the same direction. Another thing they mention is that they have a rapid release cycle, but it turns out that that kind of meant that bigger features, sort of long-term projects, were falling by the
wayside because they don't really fit into that rapid release cycle. So establishing annual goals, such as "in 2018 we're going to make the compiler fast", is a way to really get people behind it and ensure that they spend time on it, so that at the end of the year they don't have to say "we failed to reach our goals". The process that they have, and I'm not necessarily saying that we should follow it, is that they have an annual roadmap and a process for creating that roadmap. They write an RFC where they gather problems that the community has, and then from that they extract a bunch of goals. That sounds like a reasonable thing to do. And then they have a whole plan for the year: in, say, February they start planning how to reach those goals, and then they start implementing them. I'm not saying that that necessarily makes a lot of sense for us, but at least having a roadmap for, say, 2019, saying "these are the goals that we want for Nix", sounds like a very valuable thing to have. So in the rest of the talk I'm just going to do a brain dump of some things that I think are problems with Nix, and from that it follows that there are some goals that we should implement. But these are just my ideas, so I would like to do this kind of RFC process and get everybody's input, and from that hopefully we can get a roadmap for 2019. OK, all right.
So, some problem statements. These are just some things that are currently problematic with Nix, or things that I would like to do but can't at the moment. One is that I would like to use Nix as a Make replacement or a Bazel replacement. Nix has all these nice features: a purely functional language, reproducible builds, isolation. We do this for large things like packages and for very small things like configuration files in NixOS, so it seems like it should be a perfect build tool as well, for building things like C source files or whatever other language you want to build. But there are a bunch of reasons why you can't really do that at the moment; I'll come back to that. So that's one problem. Another: Nixpkgs options are not easily discoverable or configurable. Right now Nix packages have all sorts of options; for example, a package function might have an enableFoo argument, but this is completely undiscoverable except by reading the Nixpkgs source code, and it's also not configurable via nix-env or any other tool. So this is not very good UX. Another problem, and this is an increasingly big problem, is that Nixpkgs and NixOS evaluation is slow, and it's getting slower all the time because the level of abstraction used in Nixpkgs is increasing, and it uses a lot of RAM. Actually, it turns out that this problem is kind of related to the previous problem. Another one that comes up a lot is that Nix currently has no way to handle secrets. Things like passwords or keys you don't want to store in the Nix store, because then they're world-readable, which is bad, so you need some way to deal with them. Another problem is binary caches as an unprivileged user: right now, if you want to pull something from an arbitrary binary cache, you cannot do that as an unprivileged user. Things need to be signed by
a key configured by the administrator, so if you just want to pull something from some arbitrary cache, you cannot do that as an arbitrary user. Another one is closure bloat. This is a fairly big issue: in Nix it's very easy to end up with a package that has way more runtime dependencies than it actually needs. I'll give an example of that later.
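The signing requirement mentioned in the binary cache problem above comes from how store paths are named. The following is a toy Python sketch, not Nix's actual path computation: the function names, the truncated SHA-256 hex digests and the example strings are all assumptions made for illustration. It contrasts a path named after the build recipe with a path named after the build result, which anyone holding the contents can recheck.

```python
import hashlib


def recipe_addressed_path(derivation_text: str, name: str) -> str:
    """Toy path named after the recipe (the derivation), not the result."""
    digest = hashlib.sha256(derivation_text.encode()).hexdigest()[:32]
    return f"/nix/store/{digest}-{name}"


def content_addressed_path(contents: bytes, name: str) -> str:
    """Toy path named after a hash of the contents themselves."""
    digest = hashlib.sha256(contents).hexdigest()[:32]
    return f"/nix/store/{digest}-{name}"


def verify(path: str, contents: bytes, name: str) -> bool:
    """Check downloaded contents against a content-addressed path."""
    return path == content_addressed_path(contents, name)


# Hypothetical data: what the user expects and what a cache might serve.
expected = b"genuine build output"
tampered = b"something else entirely"

recipe_path = recipe_addressed_path("hypothetical derivation text", "example")
content_path = content_addressed_path(expected, "example")

# The recipe-addressed name is the same whatever bytes the cache returns,
# so by itself it cannot expose tampering; that is what signatures are for.
# The content-addressed name can be rechecked locally by any user.
print(verify(content_path, expected, "example"))
print(verify(content_path, tampered, "example"))
```

In this sketch the first check succeeds and the second fails, without any key being involved. Two builds that produce identical bytes would also map to the same toy path, which is the deduplication effect.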
These are just some random problems; probably many of you have other problems, so we're interested to hear them.
Here are some goals that you can extract from those problem statements. These are more at a technical level. By the way, if you look at the Rust goals, there are also a lot of non-technical goals there, like improve documentation, evangelize in certain communities, make the community more diverse. That's all great as well, but here I'm focusing more on technical stuff; we should definitely not restrict ourselves to that in the roadmap. So: make Nix a compelling build tool, a compelling replacement for Make or Bazel, or something that can actually complement those tools. Make the Nix store content-addressable; that's a very technical goal, but it's kind of related to all the others. Make Nixpkgs discoverable and configurable. Improve evaluation efficiency. Provide mechanisms to prevent closure bloat. Provide a way to store secrets in the Nix store. These are just some goals that I would like to work on in the next year, and to some extent I have been working on them. The rest of this talk is just a random brain dump on how these goals could be achieved. So first, the goal of Nix as a build tool: what do we need to get to that? In a way you can already do this; in fact, you could do it ten years ago. There is a nix-make repository somewhere which has a bunch of functions for building C or C++ projects, and that works fine. But the problem is this: now you have your project and you're using these nix-make functions, so you can run nix-build to have incremental builds for your project, and that's great. Now you want to package this thing and put it in Nixpkgs. So you make a tarball containing your source code and your Nix expression, and now you want to write a Nix expression in Nixpkgs that extracts this tarball and builds it. And there you run into the problem that you need to be able to call Nix
from inside a Nix build, because you're using a Nix expression to build your project instead of a Makefile. Previously you would call make, right? But now you have a Nix expression that builds your project, so you need to be able to call nix-build, but you're inside a Nix build already. And a Nix derivation doesn't actually have arbitrary write access to the Nix store; in fact, it only has write access to its outputs. So this doesn't work. Now you have a very embarrassing situation: you have a package that has a build system written in Nix, but you can't actually put it in Nixpkgs. You could put it in Debian, probably, but you can't put it in Nixpkgs. So this is not good; you need recursive Nix, so that's kind of a required feature. Other, not essential but very nice-to-have, features are content addressability, which I'll come back to, and caching of copying files to the store. If you have your project, which might consist of thousands of source files, then every time you run nix-build it has to read all those source files and copy them to the Nix store, or at least check whether they are already in the Nix store. That's a lot of I/O and it's slow. A tool like Make avoids that by only checking timestamps, and even that can get slow for very large projects, but Nix needs to hash all these files, so it actually needs to read all of them. So you want to have some kind of caching for that, and maybe something like an inotify daemon to efficiently notice when files change. But this is in the nice-to-have category. Right, so content addressability: this has been, you could say, a Holy Grail for many years. This is the property that... I should step back for a second. If you remember, a Nix store path contains a cryptographic hash, but that cryptographic hash is a hash of the derivation that built that path; it's not actually a hash of the
content of that path. And this is why you need signatures on binary caches: you need to trust that some store path was actually produced by the derivation that it claims to be built by, and that could be a lie. Somebody could set up a binary cache where you have a legitimate Nix expression that, for example, builds Firefox, and then you pull a binary from the cache that actually contains something completely different, like a trojaned version of Firefox. So that's why you need signatures. In a content-addressable store, the
hash in the store path is actually a hash of the contents of that path, so you no longer need to trust anything: you can just verify it. For example, for a path like this one under /nix/store, you just check that the cryptographic hash of the contents of this path is the hash in its name. So a path basically contains its own proof of integrity. If you have this, then unprivileged users can install things from arbitrary binary caches. Another very big advantage is that you get deduplication. For example, say you make an irrelevant change to something in the dependency graph, like a whitespace change to glibc. Currently that causes the entire system to be rebuilt, which is bad, and actually not just rebuilt but duplicated in the Nix store, so you need twice the storage space. Whereas with a content-addressable store, because this change is irrelevant and doesn't actually change anything in the output of a build, it ends up being stored in the same location. That's much nicer. And in fact it does prevent rebuilds: if you make that change to glibc, you still need to recompile glibc to discover that the change doesn't matter, but after that you don't need to rebuild anything that depends on it, because you've already discovered that this glibc is actually the same. So it sort of acts as a barrier in the dependency graph. This is why content addressability would be a great feature to have. There are a few interesting things about content addressability. To make this work properly, it really needs perfect binary reproducibility, and that's currently not the case with Nix packages: if you build a package twice, you might actually end up with slightly different results, for example if a binary stores a timestamp somewhere. But a lot of people are at work to improve that; for example, there's the whole reproducible-builds.org project that's basically improving all sorts of packages and
build systems to eliminate sources of binary impurity. OK, so for the other thing: preventing closure bloat. This is something of an obsession of mine, sorry about that. Because of the way Nix finds dependencies, it's very easy to have an accidental dependency. This is not the case in, say, Debian, where you specify the dependencies, so you don't end up with an accidental runtime dependency on, say, the C compiler. Here, for example, there was a situation where Thunderbird was storing its build configuration: you can type about:buildconfig in the URL bar and it will show you the path to the C compiler used to compile it, which is of course kind of a useless thing, but it does add twelve hundred megabytes of bloat to the closure. This can happen very accidentally; you don't get any errors if you do that. So we need better tools to detect when this happens. We already have some attributes that you can specify in Nix expressions: for example, you can say disallowedRequisites to say that something should not have a runtime reference to, say, the C compiler. But this is very limited. For example, you cannot do any pattern matching, and you would like to say "this thing should not have any references to dev outputs". And it should be per output, because for example your dev output probably should be allowed to have references to other dev outputs. And you might want to have a size check: for example, if the NixOS ISO suddenly gets a gigabyte bigger, then we would like to get some error. So what I recently implemented, partially, not all of this works yet, is that you can specify per-output checks. For example, you can say the out output should not have a closure bigger than 256 megabytes and should not reference the C compiler or any dev output, but the dev output itself can reference anything, though it should not be larger than 128 kilobytes, as a random example. So we can start putting these
things in Nixpkgs. It could even be a generic thing: for example, the rule that things should not be allowed to reference dev outputs is something you could actually put in stdenv as a general policy. That would be very nice. How am I doing on time? Actually, I can just
check. OK, wow. Right, so now I come to the really wild and vaporware part of the talk. A really big issue is discoverability and efficiency, like I mentioned. Nix packages have basically no discoverability. Well, I mean, you can discover that they exist, sometimes; nix-env doesn't necessarily recurse into everything. So you can see that packages exist, but you can't see what options they have. And customizing packages is also very ad hoc; it sort of evolved, it's not really a properly designed thing. So there are these things like .override, .overrideDerivation, config. And in fact this whole .override thing is kind of disastrous for performance; it's one of the two main reasons why Nix evaluation takes a lot of memory. .override basically destroys the ability of the garbage collector at runtime to actually collect any garbage, because you call a function and pass it some arguments, which can be very big because they are arbitrary dependencies, a large graph, and then the output just contains the inputs, so the inputs can never be garbage collected. So this was kind of a bad idea, but we don't really have anything better. There's a kind of meta issue behind all these things: somewhere along the way we forgot that Nix is intended as a domain-specific language for specifying build graphs and configurations, like NixOS systems. But as a DSL it's not really doing a great job. For example, it has no concept of a package, or an option, or a configuration, or things like plugins: any of the features or concepts you might expect in a DSL intended for doing these things. So maybe we need to get back to what we need to improve Nix as a DSL. One thing that I've been thinking about is essentially turning the NixOS module system, or an improved version of it, into a language feature, into something called a configuration, which you can really think of as an attribute set, an
extensible attribute set, which is really what a NixOS configuration is: it's a bunch of attributes that you can change, and if you change one attribute it can trigger other attributes to change. So a configuration is an attribute set which contains attributes, called options, that can be set and can be overridden later. But unlike attribute sets, and like NixOS options, they can have types and documentation and merge functions, and that's the thing that gives you discoverability. Things like package options can be expressed in this way, and because they have things like a description and a type, tools can discover them and present them to the user, and then allow them to be changed programmatically.
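The idea of a configuration as an extensible attribute set with declared, typed and documented options can be modelled roughly in Python. This is only a sketch under assumptions: the class names `Option` and `Configuration`, the option names `enable_gui` and `build_inputs`, and the choice to treat callables as values computed from the final configuration are all hypothetical, and merge functions are left out. It is not the proposed Nix language feature itself.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Option:
    value_type: type    # expected Python type of the final value
    description: str    # documentation a tool could show to a user
    default: Any        # a plain value, or a function of the configuration


class Configuration:
    def __init__(self, options: Dict[str, Option],
                 definitions: Optional[Dict[str, Any]] = None):
        self.options = dict(options)
        self.definitions = dict(definitions or {})

    def extend(self, **definitions: Any) -> "Configuration":
        """Return a new configuration with some options overridden."""
        for name in definitions:
            if name not in self.options:
                raise KeyError(f"option {name!r} has not been declared")
        return Configuration(self.options, {**self.definitions, **definitions})

    def get(self, name: str) -> Any:
        """Evaluate one option lazily against this (final) configuration."""
        option = self.options[name]
        value = self.definitions.get(name, option.default)
        if callable(value):  # computed from other options
            value = value(self)
        if not isinstance(value, option.value_type):
            raise TypeError(f"option {name!r} expects {option.value_type.__name__}")
        return value

    def describe(self) -> Dict[str, str]:
        """List declared options with their type and description."""
        return {name: f"{opt.value_type.__name__}: {opt.description}"
                for name, opt in self.options.items()}


# Hypothetical options: one depends on the other.
base = Configuration({
    "enable_gui": Option(bool, "Build with a graphical interface", False),
    "build_inputs": Option(list, "Dependencies needed at build time",
                           lambda cfg: ["gtk"] if cfg.get("enable_gui") else []),
})
with_gui = base.extend(enable_gui=True)

print(base.get("build_inputs"))
print(with_gui.get("build_inputs"))
print(base.describe())
```

Because `build_inputs` is evaluated against whichever configuration it is read from, overriding `enable_gui` changes it without touching its definition. Setting an undeclared option raises an error instead of being silently ignored.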
So the sketchy design for this language feature is a bit like this. A configuration looks a lot like an attribute set, only it uses angle brackets; that might change, so don't get too angry or enthusiastic about the syntax. You could think of it as an attribute set: we have an attribute foo, an attribute bar, and an attribute abc that actually refers to foo and bar, so it's recursive, like a rec attribute set. If from this thing you select the abc attribute, you get the value 123, because bar is true: if true then 123, right? But what you can now do is take that configuration module and apply a new module to it that sets bar to false. If you now select abc from this module, then because bar is false it goes to 123 times 2, so it will return 246. This is pretty much exactly the behavior of the NixOS module system. And then the idea is that you can have some sort of syntax, which I'm not sure about, to attach fields or annotations to those options, like documentation, a type, a default value, merge functions, priorities, all that sort of thing; basically all the features of the NixOS module system. Right, but now the idea is that we can apply this to building packages. Rather than having packages as functions, which have the problem that there's no override mechanism, I mean no good override mechanism, there's no documentation and so on, we can basically treat packages in the same way as the NixOS module system treats system configurations. You build a package in a modular way by combining a bunch of modules. For example, at the bottom you could have a module that captures the concept of a derivation. What is a derivation? Well, a derivation has a name and a version, and it has a builder, and it has arguments, and it has an environment, and if you set those things then you can evaluate a derivation attribute, which produces a low-level derivation. So this is very low
level, but now you can build higher-level modules on top of that. For example, this thing basically expresses the basic standard environment: the concept of phases and dependencies on other packages. This thing adds an option called buildInputs and an option called phases, and it implements this on top of the lower-level derivation module by setting builder, args and env, and that causes a derivation to be computed that uses these things. Just to continue this a bit: you could have a module that captures the concept of a package. A package has a description and a home page and so on, and all these things have descriptions and types, so they're discoverable, and you get error messages. Currently with derivations, if for example you misspell buildInputs, you're not going to get an error, because a Nix derivation is basically just a bunch of environment variable bindings, so there's no checking whatsoever. But here, if you set an option that hasn't been declared, then you get an error message, just like in the module system. So you can build higher and higher level abstractions on this. For example, you can extend the generic standard environment with the concept of a Unix package, which for example has a configure phase, because it runs a configure script. And then finally you can define a package: a package is something that extends the Unix package module with something that sets a name, description and source, but it also has its own option, namely enableGUI. In this fictional example, hello world has GTK support, so now you can say that buildInputs is: if enableGUI, then use GTK. And this thing is now discoverable, so you could have, say, a "nix query-package" command which will show that this thing has an enableGUI option, and you could have a "nix install" command that
allows you to set that option. And you could override things using the exact same module system, just like in NixOS. So that's about it. There are lots of other things you could imagine for the roadmap; I'll skip that. What I should do is create a sort of roadmap issue where everybody can go wild with ideas and suggestions, and then we should try to synthesize something workable from that. Yeah, that's it. [Applause] [Audience] For the configurations idea, using it for doing overrides: how does that solve the memory issue? Don't you still need to hold on to the references to all the inputs? [Eelco] No, because as soon as you evaluate the .drv attribute you don't need anything else anymore; after that you can just discard everything that went into it. That's like in NixOS: if you evaluate system.build.toplevel you get a derivation out of that, and at that point you can garbage-collect all the inputs to that thing. [Audience] So when you're passing something as an input to something else, you're passing in the .drv, not the composable thing? [Eelco] Right, presumably. This is all fiction. So here this gtk thing would trigger an evaluation of gtk.drv implicitly. [Audience] On the slide about Nix as a build tool, you were talking about avoiding rebuilds and similar properties. Have you looked into the recent paper "Build Systems à la Carte" by Neil Mitchell, Simon Peyton Jones and Andrey Mokhov? They're analyzing various build systems there, and Nix is one of them, and Nix ticks off almost all features of the ultimate build system. [Eelco] Was that the ICFP paper? [Audience] Yes. Please take a look at it; it gives very good names, a good glossary to talk about properties. It's a great paper. [Eelco] Thank you. [Audience] The configuration options excite the type-system person in me. The NixOS module system also supports the notion of overlays; is there a version of this also in this, or is this an extension? [Eelco] Yeah, probably. That's sort of a high-level thing that I haven't
figured out yet: how you actually put these things together. But you need some way to do that, so I don't know yet. [Host] OK, I think we've got to move on; maybe one last one. [Audience] It's more of a policy question. I've seen a lot of commits like the Thunderbird fix you showed, and there have been commits for every separate package. Why don't you enforce this kind of thing, like "do not reference GCC", for all packages by default, and then if some package really needs GCC you could enable it? [Eelco] Right, well, that's kind of the point: we don't really have a way to enforce that yet. Right now it's really only if somebody notices that a closure has suddenly become much bigger. You can use these disallowedRequisites attributes, but very few packages do that. [Audience] But you could use that by default, like in stdenv.mkDerivation, and then by default just not reference GCC; there is no reason for most packages to do so. [Eelco] Right, and probably for GCC that would work, but you really want to say things like "it should not reference any dev output", and that doesn't work at the moment, because dev outputs should be allowed to reference other dev outputs, so you can't use the existing attributes for that. But I would definitely like to have it in NixOS, for example, so that all the NixOS VM tests could just check that their closures don't have any dev outputs in them, and no GCC or Clang. That would already help quite a bit. [Host] OK, all right, thank you again for your talk. [Applause]
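The kind of per-output closure check discussed in the talk and in this last question (no references matching a pattern, plus a size limit) can be illustrated with a small Python model. Everything here is a hypothetical stand-in: the path names, sizes, reference graph and the `check_output` function are invented for the sketch and do not reflect how Nix stores references or spells these checks.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Sequence, Set


def closure(path: str, references: Dict[str, Sequence[str]]) -> Set[str]:
    """All paths reachable from `path` through runtime references."""
    seen: Set[str] = set()
    stack = [path]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(references.get(current, ()))
    return seen


def check_output(path: str,
                 references: Dict[str, Sequence[str]],
                 sizes: Dict[str, int],
                 disallowed: Sequence[str] = (),
                 max_closure_size: Optional[int] = None) -> List[str]:
    """Return a list of violations for one output; empty means it passes."""
    problems: List[str] = []
    paths = closure(path, references)
    # Pattern check on everything in the closure except the output itself.
    for other in sorted(paths - {path}):
        for pattern in disallowed:
            if fnmatchcase(other, pattern):
                problems.append(f"{path} references {other} (matches {pattern!r})")
    # Size check on the whole closure, including the output itself.
    if max_closure_size is not None:
        total = sum(sizes[p] for p in paths)
        if total > max_closure_size:
            problems.append(f"closure of {path} is {total} bytes, "
                            f"limit is {max_closure_size}")
    return problems


MiB = 1024 * 1024

# Hypothetical store: app-out accidentally keeps a reference to the compiler.
references = {
    "app-out": ["libc-out", "gcc-out"],
    "app-dev": ["libc-dev"],
    "gcc-out": ["libc-out"],
}
sizes = {"app-out": 2 * MiB, "app-dev": MiB // 16, "libc-out": 30 * MiB,
         "libc-dev": MiB // 8, "gcc-out": 200 * MiB}

# Different rules per output: strict for "out", lenient for "dev".
out_problems = check_output("app-out", references, sizes,
                            disallowed=["gcc-*", "*-dev"],
                            max_closure_size=64 * MiB)
dev_problems = check_output("app-dev", references, sizes,
                            max_closure_size=MiB)

for problem in out_problems:
    print(problem)
print(dev_problems)
```

Here the out output fails both on the compiler reference and on closure size, while the dev output is allowed to reference another dev output and passes. Making such rules a default for every package is the policy question raised by the audience.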