A typesystem for Nix
Formal Metadata
Title 
A typesystem for Nix

Title of Series  
Author 

License 
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 
Identifiers 

Publisher 

Release Date 
2017

Language 
English

Production Year 
2017

Content Metadata
Subject Area  
Abstract 
We present here the new tix tool, a typechecker for the Nix language whose goal is to make nix code easier to write and maintain. Nix's strength comes from the ideas it borrows from functional programming: immutability, powerful abstraction capabilities, etc.., yet it misses one of the most emblematic and powerful feature of most functional programming languages: a strong static typesystem, which becomes a problem given the increasing size of the Nix package collection. This proposal (which is the result of an internship funded by the Nix community and tweag I/O and supervised by Guiseppe Castagna) tries to fix this. The resulting typesystem and typechecker have been designed with two very important constraints in mind: The typesystem must be able to offer as much safety as possible for the programmer. One must be able to reuse existing nix code with as few modifications as possible. This means that: Most of Nix idioms have to be supported by the typesystem − in particular the presence of types at runtime and the dynamic fields in the records, There must be ways to bypass − up to some extent − the typesystem, using e.g. gradual typing or by locally disabling certain errors. In this talk, we will present the tool from the practical point of view of the nix user and discuss further improvements.

00:00
Functional programming
Code
Online help
Parameter (computer programming)
Field (computer science)
Computer programming
Software maintenance
Element (mathematics)
Pauli exclusion principle
String (computer science)
Functional programming
Extension (kinesiology)
Physical system
Programming language
Validity (statistics)
Electronic mailing list
Plastikkarte
Line (geometry)
Type theory
Data management
Film editing
Error message
String (computer science)
Right angle
Summierbarkeit
Resultant
03:11
Code
Online help
Parameter (computer programming)
Theory
Power (physics)
Type theory
Bit rate
Causality
Integer
Functional programming
Extension (kinesiology)
Booting
Physical system
Boolean algebra
Source code
Programming language
Validity (statistics)
Computer file
Code
Plastikkarte
Special unitary group
Line (geometry)
Type theory
Number
Film editing
Rewriting
Compilation album
Block (periodic table)
Extension (kinesiology)
06:07
Regulärer Ausdruck <Textverarbeitung>
Computer file
Interior (topology)
Multiplication sign
Syntaxbaum
Branch (computer science)
Parameter (computer programming)
Shape (magazine)
Mereology
Field (computer science)
Element (mathematics)
2 (number)
Programmer (hardware)
Inference
Type theory
Singleprecision floatingpoint format
String (computer science)
Negative number
Integer
Arrow of time
Functional programming
Lambda calculus
Physical system
Condition number
Boolean algebra
Algorithm
Constraint (mathematics)
Theory of relativity
Expression
Line (geometry)
Cartesian coordinate system
Variable (mathematics)
System call
Symbol table
Abstract syntax tree
Type theory
Subject indexing
Integrated development environment
Personal digital assistant
Network topology
Inference
Typprüfung
Video game
Musical ensemble
Freeware
Resultant
Reading (process)
Row (database)
16:15
Interior (topology)
Code
State of matter
Mountain pass
Equaliser (mathematics)
Multiplication sign
Range (statistics)
Set (mathematics)
Parameter (computer programming)
Shape (magazine)
Function (mathematics)
Mereology
Array data structure
Casting (performing arts)
Type theory
Computer configuration
Different (Kate Ryan album)
Set (mathematics)
Arrow of time
Aerodynamics
Functional programming
Error message
Lambda calculus
Identity management
Physical system
Intelligent Network
Programming language
Software developer
Electronic mailing list
Sound effect
Attribute grammar
Bit
Instance (computer science)
Regulärer Ausdruck <Textverarbeitung>
Sequence
Flow separation
Parsing
Type theory
Message passing
Typprüfung
Configuration space
Right angle
Physical system
Resultant
Writing
Row (database)
Asynchronous Transfer Mode
Asynchronous Transfer Mode
Implementation
Online help
Electronic mailing list
Regular graph
Coprocessor
Field (computer science)
Element (mathematics)
2 (number)
Attribute grammar
Revision control
String (computer science)
Integer
Data structure
Normal (geometry)
Implementation
Default (computer science)
Boolean algebra
Module (mathematics)
Default (computer science)
Information
Tesselation
Expression
Code
Plastikkarte
Cartesian coordinate system
Performance appraisal
Film editing
Error message
Personal digital assistant
String (computer science)
Function (mathematics)
Mixed reality
Table (information)
Library (computing)
33:07
Polymorphism (materials science)
Ocean current
Computer file
Code
Direction (geometry)
Equaliser (mathematics)
Multiplication sign
Range (statistics)
Coprocessor
Field (computer science)
2 (number)
Mathematics
Term (mathematics)
Different (Kate Ryan album)
String (computer science)
Integer
Randomized algorithm
Functional programming
Immersion (album)
Identity management
Physical system
Programming language
Expression
Bound state
Infinity
Basis <Mathematik>
Cartesian coordinate system
Measurement
Abstract syntax tree
Performance appraisal
Type theory
Film editing
Personal digital assistant
Moment <Mathematik>
Right angle
Data type
Lambda calculus
Row (database)
00:00
all right guys last talk for today this one is by French wits who contacted me just nine months ago versus me that and he wanted to do an internship at the Untied because he wants to work on next and then we organized a fundraiser or really surprised by the generosity of the community and now he's gonna talk to you about what he really wanted to do which is type in Nick's okay thank you so this is indeed the result of the internship I did with the help of Jonas and IO so the goal of this internship was to design a type system phonics so the first obvious question is why do we want to do that so there are two reasons for that the first one is that well before being that awesome  that everyone here uses Nick's was a nice experiment well the idea was where there are really cool stuff in programming languages what if we use it somewhere else like package management and so Nick's imported stuff like functional programming notability and so on and it worked very well but there's something that Nick didn't brought in which is quite pervasive in programming which is static typing so it should be a natural extension to try to add typing to this and see where it leads and there's also an or pragmatic reason which is that when you expect edgy stuff to be quite yeah so I'm usually around 1 million lines of code and when you have a cut bait that big you got Arab and they can be rather hard to spot and that's where typing can help a lot and so if typing is so useful the second question is why didn't anybody design a type distemper onyx the reason the main reason I think is that it's not totally obvious to do if you look for example as dysfunction so it's totally valid NYX card you take a function we take an argument you look at the first elements of the arguments which probably is supposed to be a list if the first one is a string you try to access to the field of the second one which is the value of this string otherwise you do the sum of birds well that's my pep example I hope nobody's writing things like that but you can do that in Nicks and if you try to write a type of this function even thy hand use stop becoming mad and now that well so that's not totally trivial to do and the second reason it hasn't been done yet well I quickly mention it Nick's
03:14
packages are being quite big so I measured this last week you got 1 million lines of code probably 90 90 percent of it on Nick's actual Nick's code and so probably not up it is going to typecheck at once probably almost nothing at it it's going to type check so that means that the type system or the type checker must be really flexible if you want to be able to run it unmix packages without it to crash at the first line of credit rates so we try to do it the requirement for this where the
03:57
first one requirement is that we didn't want to write a new language which was well tight and which could compile to Nick's code I mean that's a valid ID but if you do that you can't use it and actually Nick's card and you that's kind of sad because well you don't want to rewrite Nick's packages in a new language no and as a consequence of it we didn't want to extend Nick's and tax in particular we if we want if we add Nick's annotation it valid Knicks guard and the two last requirements are first well obviously that we want to type as much code as possible while still being safe and the other one which is kind of weird is that even the cut that's not well type must still be acts somehow accepted by the type checker cause well as I said you probably won't get the whole of expect edges to type check whatever you type system is or maybe you can if your type system is just okay everybody type checked but that's not really useful but anyway we want to be able to say that ill typed code must be accepted by the type checker so that were our requirements and hopefully what we came up with nor let satisfy this so to satisfy the special issues showed requirements we need quite powerful type system so this is another annex function more reasonable that the first one I showed whether does it do it takes two arguments if the first one is an integer remake the Sun of birth otherwise it makes the boolean and so if you wanted IB Sri gonna be equal chuckles well it's like it has types intro intro in yeah but it's also as type Borobudur reboot and you can't really unify the books I mean how do we express the type of this well we can ask for help  back mathematic theory which is set theory if we look at set theory things
06:11
are actually quite easy you want something to pick an element which is in the set a and in the set B where life is simple you just take an element which is in the intersection of both sorry
06:26
actually well we can do almost the same
06:30
thing with types while just we type the resist we like to give new names for other things so we can rename the symbols instead of Union intersection we're going to type the bull and and for instead of set containment we're going to take subtyping relation but essentially this is the same thing so and you doing said Tori we get for free singleton types so types with only one element for example a singleton type one which has only the constant one as an element and others which are going to be quite handful and with this notion of types we can type our F function it will simply have the intersection of both type we want index to express so we're happy we can type a lot of stuff with that but that's not going to be enough because still something that we can't type I mean every single day of that shape that would be simple but that's not the case so we had another element to our type system which is a gradual typing so gradual typing is a way to to express the fact that you can type something so you add a new type which is generally a question mark which represents you the type that you don't know and we can which can be used to type expression that you're not supposed to type so in this example the value of x can be any string because it's the value of the environment variable X so when we try to access to the field X of this record well it's supposed to be to fail or except in the very special case where X is equal to Y so this is not supposed to type check but if we want this to type check because it's already there and we don't want to change it we can say the tie to the type checker okay X will have type question mark and so don't bother what it is whenever you have two musics just assume that it has the right type and it works so we're happy we have a very expressive definition for all types which can be used to represent more or less everything in expect edges because when we can't express the type we want we just say it's gradual and we don't care but of course the problem with it is that well we can't always infer the types before example we try to run out type checker on the first example well obviously you want it one sir oh it's easy it's a function from integers to integers but actually it J which will just say no it's a function that takes something I don't know what but which returns an integer if you take the second one it will also say that it accepts something but don't know why I don't know what and it will return either an int is zero but that's the cool part at least that's precise so basically what the type checker does when he sees that he says okay yeah it's a function it takes an argument that I don't know anything about this argument so I'm going to assume it's gradual and then we try to type check the body of the function and in fairytype for it so we need to help him and we can't help him we can annotate the types so this is the syntax for type annotation is just wrapped in two comments so that nick's doesn't doesn't see them and can continue to evaluate the file and here we say okay X is of type int and the type checker type check the function as it should we can do the same with the second function and it says its function that takes an integer or horribly and and returns an integer or a boolean but in fact that's not what we want if there occurs this function where this types mean is that if this function takes an integer it will return either an integer either a boolean which is true but not really useful in reality if this function takes an integer it will return an integer and it takes a boolean it will read boolean so that's not the most precise type we we are going to give to this function and we don't want to have ten precise types cause otherwise the type system is just useless so we need to help him further and to do that there's a wonderful technique called be directional typing which can give helpers giving a more precise type to this function to explain what it is I'll just quickly explain how most traditional type checking algorithm raj so let's assume you get an expression this is a syntactic tree for an expression so it's a function which takes two arguments x and y and return the express while of the plus function applied to x and y traditional type checking algorithms will type check this bottomup so first trying to infer the type of the leaves and then go up in the trees until you have the type of the whole expression so for example the algorithm first we'll check the type of plus he will say okay I know plus it's a function it's in my environment that ice type int arrow int int then we child it will check X it will check okay X it hasn't I know it's in line and viral none because it has been defined before I don't know it's typed yet let's call it TX and then he'll go one step above and you will say okay we can have this type check if we say that TX is equal to int in which case we can apply plus 2x and the result will add int or interval in so it will give us the type int arrow int and it will add somewhere is a constraint that G is equal to a then he will do the same with out with white saying oh it has typed gy then it will do the application saying okay attach X is G 1 is equal to int and the result as type int that's good then one step above okay this is the function so it will up tight pinch or in because if y is an int it will return an int and still one step above if it's going to be a function of type int int the routine okay that's cool so now let's look at the function we had before so this is its expression tree it's a function which actually it's not the function we had before it's a simpler one so it's a function which takes an argument speaks if X isn't well if isn't examine it will return minus minus X tells it will return not fix and if we try to type check this using the same method we're going to have a problem because the algorithm will say okay I got many supplied to X so X is an integer okay I got not applied to X so X is a boolean so X is at the same time a boolean and an integer no that doesn't work but let's assume that the type checker already knows the type of this expression because for example the programmer give him a type annotation he will say ok I know that this is supposed to have type int arrow int intersection blue arrow bool how can I check that well to do that first we get an intersection of arrows that mean that we're in Brussels in both rows so let's try the first one this is a function so that means that if X has type int then the body has type int okay how do I check that this has the right type this is an evening else so I'll try to infer the type of the condition what it's the type of this so well we just do like we did before our classic inference we know the type of the easy int function it's a function that returns the singleton type true if it's argument is of type int and which returns a singleton type false if it's argument is of type not it's a little stuff like is boolean negation we also know the type of things because we decided that it will be an int so we know the type of this its subtype true the single turn type true okay we can continue we're going to check that Magnus sticks as type int well that's trivial we know that X has type int  X has type int and now we're going to check that no if you're not going to check that not success type in because we know that this the condition is always going to be true so we don't have a check for the second branch because it's never going to be reached so we are done we've checked that this function
16:16
has the right type now we do it the same with the second with the second type in the intersection we support that this is a blah blah blah blah it's exactly the same thing and at the end we're happy with type check this function and we can indeed run it and an example and the type checker will be happy and say yeah it's interest intersection bull our bull so we are really happy we now got powerful system which care whose types can express a lot of things and which can in fair of check a lot of things too in fact the be directional typing can give us even more because if you look at this expression though so what do you have here we got a function that takes an argument X of type int and that returns the identity function applied to X now if we try to annotate it saying that the argument is supposed to be of type int and the return is supposed to be of type bool well you would expect the type checker to complain that it's false because when you apply the identity function to an inch you want to get an integer result but in fact it's gonna pass the relevant produce is that the y : y function which is the identity function will be tight as question mark over question mark called well it's not annotated the type checker can guess its type so by default it's going to type it a jewelry so X is an inch it can be passed as an argument for to a function which expects anything and the result of this application is going to be of type gradual and gradual can be passed where you expect a bull so it's fine now if we rewrite this you're doing the directional typing so we give the type of the function to the type checker then it's going to fail because the type checker so we will do at the beginning the same thing assume that X is an int trying to give a type to the body but inside of inferring it is going to try to check that the body has type bool and to do that well he knows that X has type int so is going to check that the identity function has type int arrow bool which means checking that if Y is an int then Y is a bull and well but he's able to get that it's wrong so thanks to this bidirectional typing stuff we get more precision and well that may not seem like a big deal but the really nice thing is that if your types if you do not annotate your code well the type checker will be very relaxed and he will say okay this is not annotated let's let's say that everything is gradually typed and more or less every cut you could imagine writings going to pass but if you somewhere put an annotation then the type checker will have some some type information that we we have that he will be able to propagate down in the in the expressions and if you have an actual type here or there it will be have enough information to catch it so that makes a natural way of separating code that you want to type from cut that you don't want to type and it's probably very practical in practice so hopefully we satisfied our requirements for simplified mixed language now anacs has mastered in particular or lists and arrays so we'll have to type these two for lists well the problem with Nik style lists is that they can have elements of different types in it which is which can be useful in practice that's why it is that which not that easy to type well in fact when you want to represent to describe some checks of text there's a powerful tool which is regular expressions well Nicholas our regular expressions can be wonderful and in fact you can have the same with types so you can type a list using your regular expressions that represents its element so for example the first list here when to true can have the type instance or any sequence of int followed by the singleton type true followed by the optional the shop here is what's usually a question mark in reg X is that question mark is already taken by the gradual type so followed by the optional bar string so this first list will have this type the second list is another element of this type also it's quite different indeed it has a sequence of entering this case 0 ins followed by true followed by the optional bar which is in this case present this general from probably not useful or not often useful because you don't want to write to have a function that accept any Lisa has this shape I don't know what you well maybe you have I did what you would want to do with this but I don't but there are so cool stuff is that first you can express the regular monomorphic list using this the list of whose elements will be applied a with just be a list of type a star and or two NYX has no tepals you can express step assuming this for example the list of type int bull the type list in bull will be the type of or list with two elements the first being an intent the second a bull so this is almost the same as the table int will which can be handy if you want to use step L and have them well typed okay so a lists raw fairly well no let's look at attribute set and there are several ways attribute sets are used in undies to that I know of the first one is what I call static attribute set so that when you know everything about your library so in this case for example you know that this attribute set will have three fields x y&z we know the names we know everything so this has a trivial type which is itself but as a type so that's really easy to do you'll be happy no we can be a little bit more fancy let's look at this function so it takes an argument and attribute set which must have an element X may have an element Y may have anything else well we also can touch it how we would we type it well we will say that it's argument must be a record with the first element X which is an int the second element Y MIT which may be present or not such what Z equal question mark means and if it's present it must be an int and which may have anything else after that so we can express this kind of thing now in mixed we can do some much rarer stuff so for example let's look at this attribute set it has probably two fields but we don't when their labels are we we know that are going to be strings otherwise it will then be valid but we don't really know so all we can sell you that it's going to be a recall where if you try to
24:41
access a field well maybe it's going to be absent or maybe it's going to be present in which case it will be either 1 or true so that's all you can say about this record and we're quite happy because we can express this in this type system so we can extend our type system to manage these data structures and we are really happy about this the problem is that well I showed you the example with X : X function which is in fact an inside cast you can cast any value to any type without any problem the type checker will say okay I okay no problem I close my eyes do it and well sometimes it's way cool it's practical when you cut the when you don't want to bother about the typing but most of the time you don't want that so you want to be able to tell the type checker ok you could just a little bit more strict so that this doesn't pass it will be very nice so for this we added a strict mode to the type checker so this is basically a simpler version of the example I showed it's going to type check now we can save to the type checker ok this is going to be type checking strict mode and in this case it's not going to pass the type checker is going to say ok a your the function x : x I don't know anything about it argument let's say it's of type any so any is the type I don't know anything about it so the result is going to be of type any that's any other type I don't know anything about so in particular I don't know if it's a bool or not so I'm just going to fail and we're happy because that's what we wanted and if you want and that means that if we want that's kind of thing to typecheck well no not that not that kind of thing but if we want to be able to use function in an effective way we'll have to annotate them and we can just let unsafe checks passed without noticing another feature of the street mode is that well if you remember the examples that I showed you about the record of this one which is very much the same well this type checks which will say that it's a record which we don't know about the label and so on but if x and y in fact are the same thing well evaluate to the same values this is not going to be accepted by the evaluator he's going to say oh no I got a record with where the same field is defined twice this is not valid and so on so in strict mode we also want to disable this and in fact if the type checker is not able to infer that all the fields will be distinct his nut is just going to yell at you and output an arrow so like why that not something we want by default because you make an encounter base and sometimes you want to write this that if you in a play that you control you may want to avoid this now we can even go a little bit further so this is already the same example a really cool example actually I can use it everywhere it's really handy so these type checks now we can say to the type checker okay I want to disable the special effect of the gradual type here so it's going to raise an error because cast will still be of type question mark or a question mark but now when we get this no gradual annotation has a question mark is going to lose its special features and just become a regular type like int like boo like string and so in particular an int is not going to be a gradual type the gradual type is not going to be a boolean and so this is going to fail and of course as we want to use even if we disable the gradual type we still have to use functions from the outside world which sometimes are gradually typing we will had couple of functions which allow you to explicitly insert the implicit casts that were added by the gradual type so if you want to tile check this you still can well it's not I wouldn't advise you to do that but it's false but you can do it and you can use functions from the untied world that are coming to use separate ID module with no gradual type so the nice thing about it is that you can add a write right range of strictness going from the default mode where almost every card is going to type check then you can add some annotations to restrict the immune there is the amount of cards that type checks and you can add some strict or on the gradual annotations to get some really strict cards that well if it type checks you're quite confident that it's going to run without zeros so currently that's more or less all there's a and implementation that I wrote from my internship in a KML almost all the type system is implemented but well I wrote the processor very quickly it's probably really big not probably I know it's bagillion it works on example but that's all and I've started a new rewrite in Haskell to use a Shh sneaks to avoid having to write my own parser because that's not the kind of stuff that I enjoy and but now that my internship is over I don't have the possibility to work on this fulltime anymore so I'm gonna need some help and that's where you could help by participating to the development of this and by testing it while it's going to be in as soon as it's in a state where we stay stable because the development is just at the beginning so you're all Romney welcome to come and help for the development of the implementation thank you well first of all I'd like to say awesome work I see some very complicated challenges in indeed in the next type system have you thought about the actual error messages themselves do you have examples or is that not written yet well there are error messages that they are not really interesting no no this is some part I have been really time to work on because I have to focus on the purely typing parts but that's a really important part I agree and yeah we also have a lot of fraud especially since the type system is somewhat complicated if we got badly wrong messages it's going to be young users great work I were when experiments of using new towel language with Knicks like heavy hurdles I think its language for configurations with dependent chimes by Gabriel Gonzalez yeah doll well it's it's an interesting experiment too but quite different because well one of my first requirements was explicitly not to write in your language that will compile tunics because well nobody would like to rewrite the whole mix packages in valve especially since is that possible for
33:08
technical reason but that is also interesting if you want to write your own private stuff they just don't have the same application field I think so if I if I write next code so the primary thing that I'm actually missing in terms of type systems is to make differences between things that are otherwise come considered equal something like new type and Haskell so you took the gradual typing approach here do you do you have any vision on like how depth could ever be extended with something like user defined data types the problem with user defined data types on you type if that it's quite hard to do without extending the language itself so that's why I didn't work on it at all it would be a great improvement that probably require changes in the language that we're out of scrap probably did you work on any polymorphic types yet hey that's the question I don't want to hear there's nobody immersive em yet especially because it doesn't works well with the gradual typing stuff actually the set theoretic types which serve as the basis for all the rest have been recently extended with polymorphism so hopefully we'll be able to work it out and add polymer freedom to this but that's not done yet and I I don't know if it's gonna be possible anytime soon well you still have some bounded polymorphism coil you can say that your identity function as type into range intersection bull horrible intersection string or string but you can have infinite freedom or freedom just more or less not in the general case so we could have some hacks or but we could check it in some cases that after that it's used it's not easy to use if you have a polymorphic function you don't always know how you're going to instantiate your types when you try to apply it so well it's still a research topic what does your type checker say about a very awesome function lambda X if X less than 5 then X else lambda Y X plus y actually if you gave him the right annotations he's gonna say really cool things called one night things with set theoretic types that I didn't mention if that you have singleton types but you can also have arbitrary in all types so you can have the type for all integers that are greater than tithe for example so I don't remember exactly what your function is that yes so it's going to type it as they set the intervals of integer that are greater than 5 per row in I remember correctly intersection all the other integers or or something or or something but I think it can type check it quite effectively but I also think you don't want to write that function in practice alright last question maybe Elko wants to say all right I think the last very simple question what is the random complexity of the current typing system that you have is it just linear it's mostly linear or because we're just training through the ice to the syntactic tree and we are not trying to reconstruct anything so it can be well it's not exactly linear because if you have intersection types you can have to check many time the same cut path and probably if you write complex enough types it can blow up but well at most I'd say it's quadratic I didn't exactly check it but yeah and just in the same direction do you have did you already made it and did you already make any kind of performance evaluations whether it can very quickly type some an expression that is as large as next packages or is this something that would still have to be done and I tried to do some some time measurements the problem is that my processor that I wrote probably with my feet or I don't know how but it was itself so slow that only passing the file took several seconds and was longer than the typechecking itself so no and I forgot the most important part of this torque which was to say thank you to everyone who donated to the crowdfunding campaign because it's really helped a lot thank you [Applause]