Building pnpm2nix: yet another npm to nix tool

Video in TIB AV-Portal: Building pnpm2nix: yet another npm to nix tool

Formal Metadata

Building pnpm2nix: yet another npm to nix tool
Title of Series
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
There are already plenty of node package managers, so why yet another one? This talk aims to give you an overview of the current npm-to-nix ecosystem: what tools there are, what the strengths and shortcomings of each one are, what pnpm is all about and how it pertains to nix. After that I will be talking you through what goes into building pnpm2nix, what problems I had along the way and how nix/nixpkgs could improve to help such efforts along. --- Bio: I love building software
Okay, so next up is Adam, and Adam is going to tell us about yet another npm to Nix tool, and I'm sure that by the end of the talk we're going to be convinced that we need yet another one. So give it up for Adam.

So I'd like to just start by introducing myself. I joined NixOS back in 2017, after I had been contributing a lot for a while. I've been doing a lot of KDE packaging, security fixes, and also, sadly, taking care of Node.js for a while. So I
have some goals with this talk: to introduce the Node.js packaging ecosystem and teach you a little bit more about the module system, to go through the different package managers and the one I think is the best, which has a lot in common with Nix, and I would like to inspire you to make your own lang2nix tooling in pure Nix. Oh, this looked different on my laptop
before. Anyway, a short introduction to what a Node.js module is. There are a bunch of different module types, but the one I'm going to be talking about is the one where you do require without a path. One nice thing, I think, about Node.js in particular is how require is just another function. It's not a language construct like in Python, where you have import. It's very predictable: here it's just a normal variable assignment and a function returning a module.

So what happens when you do require of a module? Well, Node.js first checks your local node_modules path, which everyone has seen before. I guess it's not so well known that it also walks recursively upwards in the directory tree until it finds another node_modules, and thirdly it falls back to the NODE_PATH environment variable. We can use this to some interesting effect, which we will see later.

The different node package managers do different things in terms of flattening. Flattening means flattening the dependency graph into a single level of dependencies. This tries to solve the issue of deep nesting, which doesn't work well on Windows, and also some other aspects which make deep nesting not work very well. The sad thing about flattening is that it results in impurities: if you have a flattened dependency tree, anything can require anything, despite what is specified in the dependency specification. They never went with a non-flattened structure in npm or Yarn; that's not entirely true, but mostly true.

Node.js was released back in 2009. Back then it did not have a package manager and was seen as a bit of a joke. npm entered the scene in 2010, and it has resulted in Node.js becoming extremely popular; I guess most of you have at some level had to deal with that. npm is pretty bad in almost every way imaginable: it's slow as a dog, it keeps redownloading
dependencies all the time, packages are a big flattened mess, anything can require anything, and you don't have any idea what's actually going to end up in your dependency tree when you require a dependency without looking in advance. They did not have integrity checking for the longest time, so you had no way of making sure that what you thought you were going to download was actually what you ended up with.

Sander van der Burg has done an amazing job on the node2nix tool, which is great. It's extremely compatible with Node.js, but sadly it only has code generation as an option, which I think is pretty sad, considering that you now have to go through another step of generating the code before putting it into production.
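The lookup order described earlier in the talk (the local node_modules, then every parent directory's node_modules, then NODE_PATH) and the impurity caused by flattening can be illustrated with a small model. This is only a sketch: it works on an in-memory set of installed paths rather than a real file system, the package names `a`, `b`, `c` and the `/app` and `/opt/extra` paths are made up, and it ignores the file and package.json handling that the real Node.js resolver also performs.

```python
from pathlib import PurePosixPath


def candidate_dirs(start_dir, node_path=""):
    """List the node_modules directories searched for a bare require(), in order."""
    start = PurePosixPath(start_dir)
    dirs = []
    for directory in [start, *start.parents]:
        # Node skips directories already named node_modules, so it never
        # probes node_modules/node_modules.
        if directory.name != "node_modules":
            dirs.append(directory / "node_modules")
    # Last resort: entries of NODE_PATH (colon-separated on POSIX systems).
    dirs.extend(PurePosixPath(p) for p in node_path.split(":") if p)
    return dirs


def resolve(name, start_dir, installed, node_path=""):
    """Return the first installed location of `name`, or None if the lookup fails."""
    for directory in candidate_dirs(start_dir, node_path):
        candidate = str(directory / name)
        if candidate in installed:
            return candidate
    return None


# Hypothetical project: the app declares only "a", and "a" declares "b".
NESTED = {"/app/node_modules/a", "/app/node_modules/a/node_modules/b"}
FLAT = {"/app/node_modules/a", "/app/node_modules/b"}

if __name__ == "__main__":
    # Nested layout: "a" can see "b", the app cannot.
    print(resolve("b", "/app/node_modules/a", NESTED))
    print(resolve("b", "/app", NESTED))
    # Flattened layout: the app can require "b" although it never declared it.
    print(resolve("b", "/app", FLAT))
    # NODE_PATH is consulted only after the upward walk has failed.
    print(resolve("c", "/app", FLAT | {"/opt/extra/c"}, node_path="/opt/extra"))
```

In the nested layout the undeclared `require("b")` from the application fails, while in the flattened layout it succeeds, which is the "anything can require anything" problem the talk refers to.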
Then, a few years ago, Yarn entered the scene and got extremely popular for some reason. It tries to solve some of the performance issues of npm by using aggressive caching. It does less flattening, but still quite a lot. They added lock files from the start to try to reach reproducible builds in some aspects: deterministic dependency resolution, unlike npm, where that changed every once in a while. It also flattens dependencies. It has pretty great Nix tooling made by moretea, and I think zimbatm has also worked quite a lot on that, which can either do code generation or runtime generation, which means ingesting the lock and integrity files in pure Nix. It was a great source of inspiration for me for how things could be and how I think most tooling for Nix should work: we should not have to go through this massive code generation step. It still flattens node_modules, sadly, so it's not ideal, and it does pretty complex things, which leads to performance not being ideal.

pnpm is the new kid on the block. Actually, it's not very new; it was released about the same time as Yarn but sadly has not seen the same level of adoption. It takes inspiration from ied, which I think a lot of you have heard about, and which explicitly said "we take a lot of inspiration from Nix" in terms of purity. The main selling point that they advertise is performance: it's incredibly fast. If you try to install some dependency which doesn't pull in native stuff, it's in the order of seconds. It has quite a few things in common with Nix. It has a centralized store, which means packages are shared: if you install the same version of a package in multiple projects, it keeps reusing the same files on disk. It does this by hard linking and symlinking in combination, depending on which file system you use, which operating system, etc. So you can only require what you actually actively depend upon. This breaks a bunch of Node packages, but it's not as common as you would think; it's
pretty common that they include all their dependencies, despite none of the tooling really thinking about that case. That's not entirely true, because, as I learned during this, anything can always require whatever is depended on by the top-level application. That was to resolve the issues of tooling like linters having plugins: if you install a plugin into your top-level application, it needs to be available despite the application, like ESLint, not depending directly on it.

So the store structure of pnpm is abusing the fact that Node.js walks recursively upwards in the tree. What happens is that when something has symlinked yargs, for example, into its node_modules, Node.js walks upwards until it finds the node_modules which includes yargs plus symlinks to all of its dependencies. That means it kind of short-circuits at that point in the graph, not in your directory structure, and does not go looking for any other dependencies. This means that we can get purity at runtime when we do require. We have
something akin to overrides, but they can only do graph rewriting. So let's say a package called ms-rest-azure is missing a dependency on a package; then we can rewrite the graph and include that dependency. This happens more rarely than I thought it would, so even really crazy projects do not have to use overrides much. And the nice thing is that it goes into the dependency lock file, so when I made my own Nix tooling I did not have to think about supporting graph rewriting or overrides, because that was already implicit.

There were a lot of hard points in making this tooling. One of the hardest is that the specifications in Node.js are all wrong. They say that you can't combine two attributes like bin and directories.bin in your package.json files, but in reality there are a lot of packages depending on this kind of behavior. The specifications being wrong is pretty much the rule rather than the exception, so you do have to look at how applications actually behave rather than reading specifications when it comes to Node.js.

Circular dependencies are way more common than I thought. Before I started doing this project I had no clue that circular dependencies were even a thing in Node.js. I would imagine that that would lead to infinite recursion, but it does not; they have some kind of fake fixed-point stuff going on.

Another thing is that there are a large number of dependency types and a large number of variants of those. Take the npm registry: you can't assume that anyone is using the upstream npm registry; that is customizable. We have locally linked files, which inject symlinks to other locally linked projects. There are git dependencies, and there are a few other ones. The nice thing about pnpm is that it resolves GitHub dependencies to their tarballs rather than using git directly, so you do get a nice performance boost out of that.

pnpm has pretty rich metadata, so all dependencies are pre-resolved in the dependency lock file, which
is pretty nice, so you can just have a look at that file and see exactly what your directory structure is going to look like on disk and what is going to end up in your variables when you require something. That is not always consistent, though, when it comes to things like peer dependencies, which are meant for plugins. So again, let's say you depend on some linting feature: that code typically goes into peer dependencies rather than dependencies, because you can't mix and match linters, you can't mix and match loggers, etc.

Nix makes traversing these graphs and working on these kinds of graph problems really, really easy. It's truly been a breeze working with Nix on these kinds of problems. There are some pretty ugly aspects of what I had to do to get this working in the first place, like using import from derivation just to do the decoding of YAML files in Nix. This is somewhere I think we need to improve a lot, in terms of being able to write plugins that do decoding of unsupported file formats; I don't think it's a scalable approach to make everything we want to decode a builtin.

So pnpm2nix in itself is quite a complicated beast, but using it is not. I took a lot of inspiration from what zimbatm had done on yarn2nix here. All you need to do to use it is what you see on the screen. All other necessary meta attributes are derived from package.json and shrinkwrap.yaml, and if these two files are local, so if you reference the source as a path rather than a fetchgit or something, you don't even have to specify package.json and shrinkwrap.yaml.
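The store layout described earlier, where each package sits in its own node_modules directory next to symlinks to exactly its declared dependencies, can be tried out in a temporary directory. The following is a rough sketch under several assumptions: the `lock` dictionary is a made-up stand-in for the pre-resolved lock file, the store path naming is simplified, the dependency list given for `yargs` is illustrative, and the `resolve` helper only imitates the upward walk (starting from the real path, as Node.js does by default) and stops at the sandbox root. It needs a platform where creating symlinks is permitted.

```python
import os
import tempfile


def build_store(root, lock):
    """Create store/<pkg>/node_modules/<pkg> plus sibling symlinks to its declared deps."""
    store = os.path.join(root, "store")
    for name in lock:
        os.makedirs(os.path.join(store, name, "node_modules", name))
    for name, deps in lock.items():
        for dep in deps:
            target = os.path.join(store, dep, "node_modules", dep)
            os.symlink(target, os.path.join(store, name, "node_modules", dep))
    return store


def link_project(root, store, direct_deps):
    """Symlink only the direct dependencies into the project's node_modules."""
    modules = os.path.join(root, "project", "node_modules")
    os.makedirs(modules)
    for dep in direct_deps:
        os.symlink(os.path.join(store, dep, "node_modules", dep),
                   os.path.join(modules, dep))
    return os.path.join(root, "project")


def resolve(name, start_dir, root):
    """Walk upward from the real path of start_dir, never leaving the sandbox root."""
    boundary = os.path.realpath(root)
    current = os.path.realpath(start_dir)
    while True:
        if os.path.basename(current) != "node_modules":
            candidate = os.path.join(current, "node_modules", name)
            if os.path.exists(candidate):
                return os.path.realpath(candidate)
        parent = os.path.dirname(current)
        if current == boundary or parent == current:
            return None
        current = parent


def demo():
    """Return (yargs can see cliui, project can see cliui)."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical, simplified lock data: package -> declared dependencies.
        lock = {"yargs": ["cliui", "y18n"], "cliui": [], "y18n": []}
        store = build_store(root, lock)
        project = link_project(root, store, ["yargs"])
        yargs_dir = resolve("yargs", project, root)
        return (resolve("cliui", yargs_dir, root) is not None,
                resolve("cliui", project, root) is not None)


if __name__ == "__main__":
    print(demo())
```

Because the walk starts from the real location of `yargs` inside the store, it finds the sibling symlinks there, while the project itself only sees what was linked into its own node_modules.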
One of the more recent features I added is the ability to use it to manage your development dependencies. What you see on the screen will result in an environment where you get the same version of node: you get a wrapped node interpreter with all the dependencies exposed to you, exactly as if they were installed with pnpm. I find this to be pretty great, because I hate using any Node package tooling.
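One conceivable way to expose dependencies to a wrapped interpreter is a small launcher script that sets NODE_PATH before starting node. The sketch below only generates the text of such a script; the function name `make_node_wrapper` and all paths are hypothetical, and nothing here is taken from how pnpm2nix actually builds its wrapper.

```python
import shlex


def make_node_wrapper(node_bin, module_dirs):
    """Return the text of a POSIX shell wrapper that exposes module_dirs to node."""
    node_path = ":".join(module_dirs)
    return "\n".join([
        "#!/bin/sh",
        "# Put the project's module directories first, keeping any existing NODE_PATH.",
        f"export NODE_PATH={shlex.quote(node_path)}${{NODE_PATH:+:$NODE_PATH}}",
        # Replace the shell with the real interpreter, passing all arguments through.
        f'exec {shlex.quote(node_bin)} "$@"',
        "",
    ])


if __name__ == "__main__":
    print(make_node_wrapper("/usr/bin/node", ["/opt/example/node_modules"]))
```

Since NODE_PATH is only consulted after the normal upward walk fails, a wrapper like this adds packages without shadowing anything that is installed next to the code being run.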
I did have to create my own ad-hoc override mechanism for you to be able to hook into and override any package in the dependency graph. It is pretty awkward, but I didn't find a better way to do this than to provide an attribute set which takes a function of your derivation that you can override. If anyone has any better ideas, that would be pretty great. This package in particular is a good example of why Node.js is an insane ecosystem, because the override bit down here is something I had to do because the package actually tries to download shared objects for your platform from the internet to be able to do image manipulation. Another big problem dealing
with Node.js packages is this: pnpm2nix is not very fast, because it faithfully makes a Nix package out of every single npm package that you pull in, and in some projects this can be pretty hairy. I will show you a customer
project I've been working on, where the dependency graph is the most insane I've ever seen. Just a moment. If we zoom in on this, we can see, and this is not an unusual thing I found, that about 850 packages ended up in this dependency graph. So the performance of
pnpm2nix is not always great, because of the large number of packages and the amount of import from derivation that has to happen. I'm hoping to improve upon this. So the state of the tooling now is
mostly correct: it mostly does the right thing, what it needs to do. I have not found any new bugs in months. It's mostly feature complete; it supports almost every feature that pnpm supports. It has a pretty comprehensive test suite, which is extracted from real-world applications. Interfaces like the override interfaces are mostly stable, but I do want to make a big change before I can say it's 1.0.

In the future I would like to abstract out some generic bits, like the work I had to do to resolve circular dependencies, where I have to walk the dependency graph, identify cycles and merge the cycles into a single Nix package, so that from the point of view of Nix no recursion is going on at all. Closure size reductions: as you could see, that closure was pretty hairy. All the native dependency tooling in Node.js is built with Python, and as it is now Python gets pulled into the runtime of every single closure that pulls in any native dependency. That is something I would quite like to avoid; it's very big. Tarball resolution, that is, matching GitHub refs to tarballs, is not entirely pure, because pnpm upstream currently does not have checksums for those. And I would like to get rid of IFD completely; the big-ticket item there is getting YAML decoding natively into Nix somehow.

I would like to thank my previous employer for giving me a bunch of time to work on this on company time, and the customer which is using this in production, and thanks to yarn2nix for being an inspiration for how I think all this 2nix tooling should be.

The Node.js module system, I think, is actually pretty solid. It's one of the nicer interpreted languages out there in terms of module system, but the ecosystem is completely insane and near unusable. I think if you do have to deal with it, which you probably do at some point, you should use pnpm. It's something that is truly appealing to me as a Nix person. So, any questions?

All right, thank you, thank you very much.
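The cycle handling mentioned in the talk (walk the dependency graph, identify cycles, merge each cycle into a single package) can be sketched with a strongly connected components pass. The talk does not say which algorithm pnpm2nix uses; Tarjan's algorithm is swapped in here as one standard choice, and the package names are invented. The recursive form is fine for small graphs but could hit Python's recursion limit on very deep ones.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm: return a list of components, each a sorted list of nodes."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for dep in graph.get(node, ()):
            if dep not in index:
                visit(dep)
                lowlink[node] = min(lowlink[node], lowlink[dep])
            elif dep in on_stack:
                lowlink[node] = min(lowlink[node], index[dep])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop its members off the stack.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(sorted(component))

    for node in graph:
        if node not in index:
            visit(node)
    return components


def merge_cycles(graph):
    """Collapse every cycle into one merged node, so the resulting graph is acyclic."""
    components = strongly_connected_components(graph)
    owner = {member: "+".join(c) for c in components for member in c}
    merged = {}
    for node in owner:
        group = merged.setdefault(owner[node], set())
        for dep in graph.get(node, ()):
            if owner[dep] != owner[node]:
                group.add(owner[dep])
    return {name: sorted(deps) for name, deps in merged.items()}


if __name__ == "__main__":
    # Hypothetical graph: "a" and "b" depend on each other.
    example = {"app": ["a"], "a": ["b"], "b": ["a", "c"], "c": []}
    print(merge_cycles(example))
```

In the example graph, `a` and `b` collapse into a single merged node `a+b` that depends on `c`, so a build system that cannot express recursion only ever sees an acyclic graph.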
[Applause]

And since you finished early, we've got plenty of time for questions. People having questions, do me a favor and keep your arm raised while I see where we have to go next.

How are you handling pre- and post-install scripts?

I'm shelling out to npm in a bunch of places, in the pre-configure phase and in the post-install phase.

So you're actually running the scripts the modules have with them?

Yes.

Because [unclear, possibly yarn2nix], for example, takes the approach of just ignoring the scripts and then patching in some of them again for the cases where you actually need them, because most of them are like Electron trying to download binaries or something like that. It could actually be easier if we tried to always ignore them and then, for example where node-sass needs some foreign function interface building, patch stuff like that in again, because there is probably a far larger number of scripts that we normally don't want to run.

Okay. Yeah, I always run the scripts faithfully.

I basically have two questions. The first question is about the graph you showed earlier in your presentation: did you generate that from the runtime dependencies or the build-time dependencies?

Can you say that again?

The graph that you showed earlier in your...

That was the runtime dependencies.

Okay, because I think 800 dependencies is not really that much, I would say. But I think if you would compare that to the build-time dependency graph, then you would probably see a lot more packages that are involved, so perhaps you should also try that.

I did; it looks even more crazy.

And then my second question is basically: you also created a generator that basically produces Nix expressions. I'm actually maintaining two generators myself, and there are many more. What would you recommend to the other people that develop generators? I think one of your recommendations is that you preferably don't want to run a
generator, right? Everything should work from Nix expressions.

Yeah.

But I think the difficulty is that you currently rely on doing things in the instantiation phase, right? You probably import generated Nix expressions, I think, or import derivations?

No, the only thing I use import from derivation for is doing YAML to JSON conversion, so I can import the shrinkwrap.yaml into Nix.

Okay, that's clear. But are there any other things you would recommend, that you would like to see in the other generators?

No, I would like to see fewer generators and more of these kinds of pure Nix tools.

Okay.

Do plugins work?

Yes, nothing special to be done, it just works. I do wrap a bunch of binaries, setting the NODE_PATH appropriately to get plugins to work properly, to propagate them implicitly throughout the tree.

More questions?

So on the repository you say that pnpm doesn't include checksums for tarballs?

Yeah, that was those GitHub tarballs.

But do you think they can be included?

Yes, I plan to look at that next week.

More questions? No? Well, in that case, thank you very much for your interesting talk.

[Applause]