
What Python can learn from Haskell packaging


Formal Metadata

Title
What Python can learn from Haskell packaging
Title of Series
Part Number
109
Number of Parts
169
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Domen Kožar - What Python can learn from Haskell packaging

The Haskell community made lots of small but important improvements to packaging in 2015. What can the Python community learn from them, and how are we different?

-----

The Haskell community has been living in "Cabal hell" for decades, but the Stack tool and the Nix language were a great game changer for Haskell in 2015. Python packaging has evolved since the very beginning of distutils in 1999. We'll take a look at what the Haskell community has been doing in their playground and what they've done better or worse. The talk is inspired by Peter Simons' talk given at the Nix conference: [Peter Simons: Inside of the Nixpkgs Haskell Infrastructure]

Outline:
- Cabal (packaging) interesting features overview
  - Cabal file specification overview
  - Interesting Cabal features not seen in Python packaging
  - Lack of features (introduction into next section)
- Cabal hell
  - Quick overview of Haskell community frustration over Cabal tooling
- Stack tool overview
  - What problem Stack solves
  - How Stack works
  - Comparing Stack to pip requirements
- Using Nix language to automate packaging
  - how packaging is automated for Haskell
  - how it could be done for Python
Transcript: English (auto-generated)
Okay, good morning. Thank you for coming. Presenting today is engineer Domen Kožar.
Welcome to EuroPython. I'm really excited to be here for yet another year. Just before I start my talk, I'd like to say a little bit about myself, so you'll better understand the context.
I've been interested in software distribution basically since I was a student. I was using Gentoo back then, developing a Google Summer of Code project to package Python automatically for the Gentoo platform. For the last three years I've been working on NixOS. It's a Linux distribution, you've probably heard of it. I'm tackling the problem of how to distribute all those packages to people and make it easy to use, and it turns out it's not that easy.
So I'll talk about how Haskell does it, how that compares to Python, what we can learn, and what things we already know but just can't get to because of our legacy.
Currently I'm working for a company called Snabb. We're doing open source networking software, and I'm an infrastructure engineer. I'm setting up the whole pipeline for testing and benchmarking it.
So, mypy, right? We got types in Python, so clearly we are improving Python even though it's more than 25 years old, and Haskell is definitely an inspiration here. So clearly there are things to improve and to learn from.
So let's start with how Haskell does packaging. Their tool is called Cabal, and you would have a file like this. It's a special kind of syntax. At the top you'll see just some metadata about the package, and at the bottom you can say: okay, my software is a library, but there is also an executable, it has these dependencies, it lives in this source directory, and so on.
One thing you will notice, compared to Python, is that this is just a file that you can parse. In Python we have this script that you have to run to actually do something. I'll dive into that a bit later: why that's a big difference, and how that affects pretty much everyone.
If you think about the API, in Haskell you would parse this file and get the metadata back. In Python the API is the setup() function, which does everything, literally everything.
So the format is more approachable, and we'll see that a bit later.
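To make the "just a file that you can parse" point concrete, here is a minimal Python sketch. It is not a real Cabal parser: it only collects top-level `field: value` lines and ignores sections, and the example file contents (package name `demo`, its fields) are made up for illustration.

```python
def parse_top_level_fields(cabal_text):
    """Collect top-level `field: value` pairs from a Cabal-like file.

    Simplified on purpose: section bodies (indented lines) are skipped,
    and no Cabal conditionals or multi-line values are handled.
    """
    fields = {}
    for line in cabal_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and `--` comments.
        if not stripped or stripped.startswith("--"):
            continue
        # Indented lines belong to a section (library, executable, ...).
        if line[0].isspace():
            continue
        key, sep, value = line.partition(":")
        if sep:  # section headers such as "library" have no colon
            fields[key.strip().lower()] = value.strip()
    return fields


# Hypothetical file contents, loosely modelled on a .cabal file.
EXAMPLE = """\
name:          demo
version:       0.1.0
build-type:    Simple

library
  build-depends: base, text
  hs-source-dirs: src

executable demo
  main-is: Main.hs
"""

metadata = parse_top_level_fields(EXAMPLE)
print(metadata["name"], metadata["version"], metadata["build-type"])
```

No package code runs here: the metadata comes out of plain text, which is the property the talk contrasts with executing `setup.py`.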
One thing you may have noticed in that file is this build-type line. If it's specified as Simple, that means you parse that file and you have all the information you need to install that package in Haskell.
But you can also say build-type Make or build-type Custom. In the case of Make, it will run the makefiles and skip the Haskell build process. In the case of Custom, it will run a Haskell program with specific hooks where you can plug in code.
So you have the power to go from very simple to overriding everything. Unfortunately, the Custom method is not used much because it's fairly poorly documented, but that's also a good thing, because then people fall back to Simple.
In Python we have PEP 518, which I think is not accepted yet. It talks about basically how to hijack the setuptools build process so you can define your own build process. This is in progress, and you'll be able to not even touch the setuptools machinery and do whatever you want. You'll have the freedom to, for example, write a makefile backend for Python packaging. And of course this will be integrated into pip and other tools, which is really nice, because finally we'll be able to move forward from the legacy that we've been stuck with.
Just a little bit about advanced features in Cabal. In Haskell you can say: okay, I want to have this flag that you can toggle. For example, if we have a flag debug, we can describe it and provide a default, and then throughout the file we can write conditionals like: if this flag is enabled, then this option is configured, and so on. So it's a very simple language with just if statements and nothing else.
This gives you the flexibility of saying, for example, if you have a library: do we want HTTPS support or not?
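The flag mechanism just described can be modelled in a few lines of Python. This is only a sketch of the idea (flags with defaults, plus dependencies that apply when a flag is on); the flag names and package names are illustrative assumptions, and real Cabal conditionals are richer than this.

```python
def resolve_build_depends(base_deps, conditional_deps, flags):
    """Return the dependency list after applying flag conditionals.

    `conditional_deps` maps a flag name to the extra dependencies that
    apply when that flag is enabled.
    """
    deps = list(base_deps)
    for flag_name, extra in conditional_deps.items():
        if flags.get(flag_name, False):
            deps.extend(extra)
    return deps


# Hypothetical package description: two flags with defaults.
FLAG_DEFAULTS = {"https": True, "debug": False}
BASE = ["base", "bytestring"]
CONDITIONAL = {"https": ["tls"], "debug": ["pretty-show"]}

default_deps = resolve_build_depends(BASE, CONDITIONAL, FLAG_DEFAULTS)
no_https = resolve_build_depends(
    BASE, CONDITIONAL, {**FLAG_DEFAULTS, "https": False}
)
print(default_deps)
print(no_https)
```

Because the conditionals are data rather than code, any tool can compute the dependency set for a given flag assignment without building anything.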
But there are also downsides in Haskell. For example, at runtime, once the package is compiled, there is no way to know which flags were used. You just don't know that. Also, you can say: if the HTTPS flag is enabled, then add these dependencies. But it also works the other way around: if for some reason those dependencies are in the environment, that flag will be enabled by default. So there is some magic, and they also have problems. One thing you learn in packaging is that features are really problematic. Once you start introducing them, you have to support them, and these kinds of things are really painful in the long run.
In Python we have PEP 508, which is environment markers. For example, for a dependency you can say: this dependency applies only on Python 3 and Windows, and so on. This is already supported in pip, but not many people are using it because they don't know about it. The idea is that you don't write imperative code in Python: if we're on Windows, blah blah. You just say: this is the dependency and the marker is Windows, and you're done. This gives everyone else the possibility to also get this information, to parse this marker and do something with that information. I'll talk later about what we are doing with that.
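As a rough illustration of why a marker is easier for tools than imperative code, the following Python sketch splits a requirement string into name and marker and evaluates a toy subset of the marker syntax (only `==` and `!=` comparisons joined by `and`). It is an assumption-laden simplification, not an implementation of PEP 508; a real tool would use a full parser.

```python
import re

# One clause of the toy grammar: name, operator, double-quoted value.
CLAUSE = re.compile(r'^\s*(\w+)\s*(==|!=)\s*"([^"]*)"\s*$')


def split_requirement(requirement):
    """Split 'name; marker' into (name, marker); marker may be empty."""
    name, _, marker = requirement.partition(";")
    return name.strip(), marker.strip()


def marker_matches(marker, environment):
    """Evaluate the toy marker subset against an environment dict."""
    if not marker:
        return True
    for clause in marker.split(" and "):
        match = CLAUSE.match(clause)
        if match is None:
            raise ValueError("unsupported marker clause: %r" % clause)
        key, op, value = match.groups()
        is_equal = environment.get(key) == value
        if is_equal != (op == "=="):
            return False
    return True


requirement = 'pywin32; sys_platform == "win32" and python_version == "3.5"'
name, marker = split_requirement(requirement)

windows = {"sys_platform": "win32", "python_version": "3.5"}
linux = {"sys_platform": "linux", "python_version": "3.5"}
print(name, marker_matches(marker, windows), marker_matches(marker, linux))
```

The environment is passed in as data, so a packaging tool can answer "is this dependency needed on Windows?" while running on Linux, which an `if sys.platform == ...` inside `setup.py` cannot offer.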
Hackage is the Haskell equivalent of the Python Package Index. You publish your packages there and people can download them. Just as an example of a feature that is really painful to support in the long run: in Hackage you can edit the Cabal files in place through the website. That means if you release version 0.1, for example, somebody can go and edit that Cabal file and remove a dependency, and then it's not really 0.1 anymore.
It's a whole new thing. It's only slightly modified, but it's still not the same thing. In that case Hackage will add this revision line to the Cabal file. And when you start to think about it: okay, now I have this local process where I release software, and then I can also edit it online. But then what happens if I bump this revision and push it to Hackage, and so on? So there are suddenly a lot of stateful things going on.
And while this might be a good idea, and maybe 1% of people want it, for everyone else using Hackage to download packages and figure out the state, this is really problematic, especially if you want to have reproducible builds. Once you edit that file, the hash of your tarball changes. So all the people that say "download this file, and this is the hash" will suddenly get a mismatch. We really don't want to encourage a culture where you just say "okay, it's a new hash, whatever", because then there is really no point, right?
So these kinds of features are present in Haskell and they're also present in Python, and they give us headaches every day.
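The hash mismatch described above is easy to reproduce. The sketch below uses Python's `hashlib` on made-up file contents standing in for a released package; the byte strings are illustrative assumptions, not real Hackage data.

```python
import hashlib


def sha256_hex(data):
    """Hex-encoded SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def verify(data, expected_hash):
    """True if the downloaded bytes match the hash that was pinned."""
    return sha256_hex(data) == expected_hash


# Hypothetical release contents and the hash a downstream user pinned.
released = b"name: demo\nversion: 0.1\nbuild-depends: base, text\n"
pinned_hash = sha256_hex(released)

# Someone edits the metadata in place without changing the version.
edited = released.replace(b", text", b"")

print(verify(released, pinned_hash))  # the original still verifies
print(verify(edited, pinned_hash))    # same version number, different bytes
```

Both inputs claim to be version 0.1, yet only one matches the pinned hash, which is why in-place edits and reproducible builds do not mix.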
The API in Hackage is then that you can say revision and then 2.cabal, and you can get these revisions. But you basically have two versions: first you have a version and then a revision, and handling those just becomes a nightmare.
Haskell is one year older than Python, and they've also been on this path of improving the packaging ecosystem. Until about two years ago they had this problem: in your Cabal file you had to specify the dependencies, and we all know that not all software packages work well together. In the case of Haskell, because there are types, you would get a new version of a package in which the types changed, and suddenly your usage of that package wouldn't work.
So Haskell wouldn't compile, and the biggest problem they had is called Cabal hell. When a package got a new version, things wouldn't compile, so you would start putting in these constraints and so on. Every developer would do this for himself or herself, and it's just a big waste of time trying to figure out which packages really compile together.
I'll talk about how Haskell solved this, but first just an interesting thought on how Elm, which is another functional language, solved it. They basically said: in your dependencies you always have to specify the limits of the major version. So if you say "I depend on package HTTP", it has to be between version 5 and 6. Then, if you upload that package and the API changed, it won't allow you to upload it unless you bump the major version.
So it's basically enforcing semantic versioning in the package manager. The package manager forces you not to change the types, the signatures, unless you bump the major version, and that's really nice. We cannot do that in Python, unfortunately, because there is no way to really detect if an API changed.
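The upload rule attributed to Elm above can be sketched as a small check. This is a hypothetical model, not Elm's actual algorithm: an "API" here is just a mapping from exported name to a signature string, and any removed or changed signature counts as a breaking change.

```python
def api_is_compatible(old_api, new_api):
    """True if every previously exported name keeps its signature.

    New names may be added freely; removed or changed ones break users.
    """
    return all(new_api.get(name) == sig for name, sig in old_api.items())


def upload_allowed(old_version, new_version, old_api, new_api):
    """Reject a breaking upload unless the major version was bumped."""
    old_major = int(old_version.split(".")[0])
    new_major = int(new_version.split(".")[0])
    if api_is_compatible(old_api, new_api):
        return True
    return new_major > old_major


# Hypothetical exported signatures of two releases.
old_api = {"get": "String -> Task Error String"}
changed = {"get": "Url -> Task Error String"}
extended = {**old_api, "post": "String -> Body -> Task Error String"}

print(upload_allowed("5.0.0", "5.1.0", old_api, extended))  # addition only
print(upload_allowed("5.0.0", "5.1.0", old_api, changed))   # breaking, no bump
print(upload_allowed("5.0.0", "6.0.0", old_api, changed))   # breaking, bumped
```

The check depends on signatures being available as data, which is the piece the talk says Python is missing.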
Well, of course we could parse the APIs and so on, but that's a gray area. It's not something we can do today, but hopefully something we will be able to do one day. So, okay, how did Haskell solve that?
They solved it with something called Stackage. It was actually released in 2015, so just one year ago. Stackage is a stable source of Haskell packages: it guarantees that packages build consistently and pass tests before generating nightly and long-term support releases. So what does that mean? They built a site where you as a maintainer can log in, specify some of your information, and say: okay, I'm a maintainer of these packages on Hackage. Then they pick the dependency tree of your package, build it, and see if all the tests and everything pass. And then they say: okay, we used these versions and these versions compiled.
And then they provide an API for that, so you can get those versions. If you think about it, in Python we have requirements.txt, but everyone has their own set of versions. In Haskell they pretty much crowdsource that. They have a website where all those versions are tested and compiled, and people use that as a community effort, not as something you commit to your repository and hope for the best.
So if you want, for example, backwards compatibility, you depend on Stackage LTS 6. Then all the minor versions, 6.7, 6.8, guarantee that the API didn't change, but they still ship security updates and so on. A new major version usually means a new GHC, which is their main compiler. When you're ready, you go and fix those compiler errors and move to the next version.
I think that's very interesting, because they're doing all the work together in one place instead of everyone in their own garden. I'm not really sure if we could do something like this in Python, because it's way more complicated than just compiling the package and saying it works. But I still think it would be worth the effort to at least have the major software that we use in Python with these versions community managed, instead of having this work done by each individual or company.
So yeah, our solution is requirements.txt.
Together with Stackage they also released a tool called Stack, which is like a wrapper around Cabal, so it can do more things than just Cabal. You specify a configuration file like this and you say: okay, I'm going to use these flags, which will be passed to Cabal when compiling software, and I'm going to use these packages. So you say the package is in the current directory, there's the Cabal file, and that's the one we'll use to build this project. And you can have multiple of those.
If you think about how Python does that, you have to say pip install -e . or something like that. That's imperative: you have to actually run that, and if you develop on two packages, you have to run it for both of them. In this case it's declarative. You open that file and you know what packages are being added. There are no imperative steps; instead you just say stack build and that will execute the whole thing. So it's way more declarative.
At the bottom you see the resolver. This is where you get this big set of pinned versions. You say LTS 6.7 and there you go: you have most of the Hackage packages pinned down and you're sure that those work.
There is also a field for extra dependencies (extra-deps), and those are the dependencies that are not in the LTS. Not everything is pinned down. It's a community effort, so of course if people don't do it, then it's not there.
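The resolver and extra-dependency fields can be modelled as a lookup against a shared snapshot. The Python sketch below is an assumption-heavy illustration: the snapshot contents, package names, and the `dependencies` key are invented (Stack reads dependencies from the Cabal file), and only the `name-version` form of extra dependency entries is handled.

```python
# Hypothetical curated snapshot: one tested version per package.
SNAPSHOTS = {
    "lts-6.7": {"pkg-a": "1.2.0", "pkg-b": "0.9.1"},
}


def resolve(config, snapshots):
    """Pin every dependency from the snapshot plus explicit extra deps.

    Raises LookupError for dependencies that neither source pins.
    """
    pins = dict(snapshots[config["resolver"]])
    for entry in config.get("extra-deps", []):
        # Entries look like "name-version"; the name may contain hyphens.
        name, _, version = entry.rpartition("-")
        pins[name] = version
    missing = [dep for dep in config["dependencies"] if dep not in pins]
    if missing:
        raise LookupError("not in snapshot or extra-deps: " + ", ".join(missing))
    return {dep: pins[dep] for dep in config["dependencies"]}


project = {
    "resolver": "lts-6.7",
    "extra-deps": ["pkg-c-0.3.1"],
    "dependencies": ["pkg-a", "pkg-c"],
}
print(resolve(project, SNAPSHOTS))
```

Unlike a per-project requirements file, the snapshot is shared: every project naming the same resolver gets the same tested versions, and only the leftovers are pinned locally.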
So for all the packages that are not part of the LTS, you can specify them there, and Stack will complain if you don't do that.
It has a bunch of simple commands. stack setup is something like a virtual environment for us: it will download a compiler and set it up for you based on the resolver that you're using. stack init will generate the files; it's like mini templating for starting Haskell packages.
So that's what Stack does, and the community was really happy when this happened. A lot of problems went away.
So now that we have Hackage with all of the packages and Stackage as a set of versions, my job is: okay, how do we distribute all this software to the users so that they can really get it seamlessly and it works on whatever platform?
We're doing this with Nix. It's a functional language based on the PhD thesis by Eelco Dolstra. It's a very short and nice thesis, and I recommend it to anyone who cares about packaging and about how functional language concepts can change the thinking dramatically and improve a lot of things that we have problems with today.
So this is for Haskell. This is kind of the stack that we have.
Nix packages (nixpkgs) is then a collection of Nix expressions that specify how some software should be built, similar to apt or something else in the distributions, except that we're not tied to a Linux distribution: we support Darwin and Linux.
So why would you need this layer on top of upstream Hackage or PyPI? Because we can take care of system dependencies, we have a build system that will compile these packages and provide binaries for you, and we have a really powerful API, which you'll see later, so that you can actually go there and change those packages and tweak them the way you want: apply some patches, bump versions, or whatever you want to do.
So we're not an upstream where you just have to say "either I use what you have or nothing". You have the power to change that. And most importantly, in Nix packages we have all the Haskell packages. We don't compile all of them, because that takes a lot of computing power and disk space.
So we take only one GHC version, the latest stable one, and for that compiler we compile all the packages, or most of them. But theoretically we could distribute all the binaries.
The user can then say: okay, I have this project, I need these packages, I want binaries. The Nix package manager will download them and there you go: you didn't compile anything except your own package. That's really nice, especially because you can share it between Darwin and Linux.
Okay, so how does that work in Haskell? How do we get that done, and why is it so hard for Python to accomplish this? This is the simple infrastructure that we have, so let me explain what's really going on here.
In the upper left corner you see Hackage. That's the API that has all the packages.
Then there is a script that goes and downloads all of them, calculates the SHA hashes and everything, and commits that to a repository. So you have a Git repository called all-cabal-hashes, and you have all the Cabal files there. You can go through all of them and parse them and generate dependency trees or whatever you want to do.
Then those hashes and Cabal files are taken and built in Stackage Nightly. That gives you a view of what currently builds and what doesn't, and that's a continuous process, of course. Then, based on Stackage Nightly, when things kind of settle down, they make this LTS Haskell, which you've seen before. That's like: okay, this all compiles together now, let's take those versions. So this is Stackage and the upstream that Haskell provides.
Then we have hackage2nix, which parses the all-cabal-hashes repository and the Stackage repository and generates hackage-packages.nix and configuration-lts.nix. In hackage-packages.nix, every version of every package is specified with how you should build it, and this is all generated from the Cabal files.
It's a one-to-one mapping. Some features in Cabal we don't support, some features we do. There is room for improvement, but in general it works. The configuration-lts file basically just says: based on the LTS version and the long list of version constraints, pick those versions to be the default ones when you use these Haskell packages.
So it's basically just pinning in Nix: okay, take these versions. Because in hackage-packages it will always use the latest version, which, as I've said before, does not always mean that things will work.
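The relationship between the generated package set and the LTS configuration can be pictured as "latest by default, pinned where the LTS says so". The following Python sketch is only an analogy for that selection step; the package names and version lists are invented, and the real files are Nix expressions, not dictionaries.

```python
def version_key(version):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def default_versions(available, lts_pins):
    """Choose the LTS-pinned version if present, else the latest one."""
    return {
        name: lts_pins.get(name, max(versions, key=version_key))
        for name, versions in available.items()
    }


# Hypothetical index contents and LTS pins.
AVAILABLE = {
    "pkg-a": ["1.2.0", "1.10.0", "1.9.3"],
    "pkg-b": ["0.4", "0.5"],
}
LTS_PINS = {"pkg-a": "1.9.3"}

print(default_versions(AVAILABLE, {}))        # always the latest
print(default_versions(AVAILABLE, LTS_PINS))  # pins win where they exist
```

Comparing versions as integer tuples matters here: a plain string comparison would rank "1.9.3" above "1.10.0".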
Then there are two more files: configuration-common.nix and configuration-ghc-x.y.nix. Those are the files that have to be manually crafted and maintained. If the Cabal file, for example, doesn't specify system dependencies, we will override it there and say: okay, for the package HTTP, also take this system dependency, and so on. So basically everything that's not in the upstream Cabal file we override there. In configuration-ghc we do the same, but based on the GHC version.
Some GHC versions might need different flags, or disabled tests because they don't work, and so on. Those are the two files that we maintain, and everything else is provided upstream by the Haskell community.
Then you have cabal2nix in the middle, and this is what the user gets. When you have your project with your Cabal file, you run cabal2nix and it will automatically generate a Nix expression out of it, specifying all the dependencies. In there you can say "I want a specific LTS version" or "I want the latest packages" or whatever.
So as a user you just run cabal2nix and you get basically the whole set of dependencies that you know are going to work. The generated Nix file has this function called package overrides, where you can basically override anything from upstream. You can say: take this package but a different version, take this package but apply this patch, or whatever you want. Then you install this software, and we have a binary-distributed Haskell pipeline.
All right, I hope that was not too fast and that it's clear enough.
So this is probably the hardest slide.
But I would really like to say a few words about the infrastructure in Nix and how these files all work together. It all fits on one slide; it's just not that easy to explain. Basically, what we want is some kind of inheritance. We have different files and we want those files to override each other, right? We want this powerful overriding mechanism.
At the top you see a function called fix, and that's a fixed point. That's how you do recursion in a functional language. It's basically a recursive function that just calls itself, and how it works is that it takes the output and feeds it into the input. Because the language is lazy, it will do that only when you reference something.
For example, in the middle I define something you would call a dictionary. It's called an attribute set in Nix, but it's pretty much the same. You can say: okay, I have an attribute foo that has the value "foo", and bar with "bar", but foobar is actually self.foo plus self.bar. That self is really just the input of this function. It's a lambda function; it gets self as a parameter, but that self is actually the output of the function itself.
So it will reference self.foo, and self will be the same actual thing, so it will look up foo and get it back. It's just recursion and a function, nothing really fancy. When you call fix on this function, on this dictionary, and you ask for foobar, you will get the value "foobar" back, and it will basically just call it twice.
This is how we do dependencies and how you can reference different things.
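The fixed-point trick can be imitated in Python, with one caveat: Python is strict, so laziness has to be emulated. In this sketch every attribute is a zero-argument function (a thunk) that is only called on first lookup. The names `LazyAttrs`, `fix` and `attrs` are illustrative; this is an analogy for the Nix slide, not the Nix code itself.

```python
class LazyAttrs:
    """Attribute set whose values are computed on first access."""

    def __init__(self, thunks=None):
        self._thunks = thunks if thunks is not None else {}
        self._cache = {}

    def __getitem__(self, name):
        if name not in self._cache:
            self._cache[name] = self._thunks[name]()
        return self._cache[name]


def fix(f):
    """Feed the output of `f` back in as its own input.

    `f` receives the final attribute set (`self`) and returns a dict of
    thunks; nothing is evaluated until an attribute is looked up.
    """
    result = LazyAttrs()
    result._thunks = f(result)
    return result


def attrs(self):
    return {
        "foo": lambda: "foo",
        "bar": lambda: "bar",
        # References the *final* set, which is the point of `fix`.
        "foobar": lambda: self["foo"] + self["bar"],
    }


print(fix(attrs)["foobar"])
```

The thunks are what make this safe: `attrs` can mention `self["foo"]` before `self` is fully populated, because the lookup only happens later.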
Okay, so now that we have that, we want a little bit more flexibility, and we define a function called extend. I won't go into how it's defined, but if you look at the override, that's the API you get. This override function accepts two things, self and super. Self is the input and super is the output of this dictionary. So you have the power to take the previous configuration file and reference either its inputs or its outputs. You have both.
In this case I say: okay, take the output, super.foo, and reverse it. If I then call fix on extend of d and the override, which means extend the dictionary d and override it with this function, you will see that the foobar value is different, because we have reversed foo. That gives us the power to override the dictionary at the top either by inputs or by outputs. And if you apply it twice (it's not shown here), you will get foobar back.
So that gives you all the power to override these files. Okay, how do we use that? This is then all you need to combine all these files. You say: first I have a fixed point, which takes care of the recursion, and then I take all the Haskell packages, the common configuration file you've seen before, the compiler-specific config, the package set config, and then at the bottom all of the overrides, where you can hook in and change everything from upstream about how it's built.
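The extend/override mechanism and the stacking of several configuration layers can be sketched in Python along the same lines. As before, this is an analogy under assumptions: thunks emulate Nix laziness, and `LazyAttrs`, `fix`, `extend`, `base` and `reverse_foo` are names chosen for illustration rather than the actual nixpkgs definitions.

```python
from functools import reduce


class LazyAttrs:
    """Attribute set whose values are computed on first access."""

    def __init__(self, thunks=None):
        self._thunks = thunks if thunks is not None else {}
        self._cache = {}

    def __getitem__(self, name):
        if name not in self._cache:
            self._cache[name] = self._thunks[name]()
        return self._cache[name]


def fix(f):
    """Tie the knot: `f` receives the set it is itself producing."""
    result = LazyAttrs()
    result._thunks = f(result)
    return result


def extend(base, override):
    """Layer `override(self, super_)` on top of `base`.

    `self` is the final set after all layers; `super_` is the set as
    defined by the layers below this one.
    """
    def extended(self):
        super_thunks = base(self)
        super_ = LazyAttrs(super_thunks)
        return {**super_thunks, **override(self, super_)}
    return extended


def base(self):
    return {
        "foo": lambda: "foo",
        "bar": lambda: "bar",
        "foobar": lambda: self["foo"] + self["bar"],
    }


def reverse_foo(self, super_):
    # Replace foo with the reverse of whatever the lower layers defined.
    return {"foo": lambda: super_["foo"][::-1]}


once = fix(extend(base, reverse_foo))
# Several layers are folded left to right, like stacking config files.
twice = fix(reduce(extend, [reverse_foo, reverse_foo], base))
print(once["foobar"], twice["foobar"])
```

Because `foobar` reads `foo` through `self`, overriding `foo` in a later layer changes `foobar` without touching its definition, and folding a list of overrides mirrors the way the package set, common config, compiler config and user overrides are combined.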
In Python, in Nix, we currently edit files manually. Why? Because of this problem: we have a setup.py script, and you have to run all of those scripts to actually figure out what's going on.
So someone would need to take that and, for everything in the Python Package Index, generate some JSON file or something with all this meta information that we could then use to generate and automate all of this. And we would need to maintain a global requirements file for the full Python Package Index. These are the two big projects that one would need to tackle in order to have the same infrastructure. Then we would be able to build basically the whole Python Package Index and distribute it to people.
The first problem is kind of being solved. The community is trying to get there, but we still don't have a way to do it today. The infrastructure is improving: we got wheels, and we're getting a new Python Package Index called Warehouse, which is going to be tested and easily changeable and so on. So everything around it is changing, but this is still not doable today. With the build system hook that I talked about before, we'll be able to have different tools than just setuptools to build Python packages, and hopefully one day we'll have a standard one that is statically defined instead of a script that you have to run.
As for the second problem, I don't know if anyone is currently solving that, crowdsourcing the versions, but it's definitely something that we'll have to solve ourselves, or someone will have to do it for us.
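For the first of the two problems, a static metadata dump, a rough Python sketch is shown below. It assumes the metadata is already available as a PKG-INFO style text (the header-based format used for Python package metadata) and converts it to a JSON record using only the standard library; the package contents are invented and the record layout is an assumption, not an existing index format.

```python
import json
from email.parser import Parser


def metadata_to_record(pkg_info_text):
    """Turn PKG-INFO style headers into a plain dict, without running code."""
    message = Parser().parsestr(pkg_info_text)
    return {
        "name": message["Name"],
        "version": message["Version"],
        # A header may repeat; get_all returns every occurrence.
        "requires_dist": message.get_all("Requires-Dist") or [],
    }


# Hypothetical metadata for an invented package.
PKG_INFO = """\
Metadata-Version: 2.1
Name: demo
Version: 0.1
Requires-Dist: requests
Requires-Dist: pywin32; sys_platform == "win32"
"""

record = metadata_to_record(PKG_INFO)
print(json.dumps(record, indent=2, sort_keys=True))
```

A file of such records for every release would play the role that the repository of Cabal files plays for Haskell: input that a generator can read without executing any `setup.py`.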
So Python is actually doing quite well, in the sense that all of these things are being worked on. But one thing that's really missing, if you think about it, is that it's still not declarative enough. We have so many files that you have to touch: setup.py, setup.cfg, requirements, MANIFEST, now pyproject.toml is coming, tox.ini. It's just a lot of different things you have to set up. In Haskell there are just two files, the Cabal file and the Stack file. It's really hard to get rid of these because this is our legacy, but it's a lot of information people have to know to actually use it.
This is improving, but it's still an ongoing process. All right. This talk was based on Peter Simons' "Inside of the Nixpkgs Haskell Infrastructure". If you want to see that talk, it goes a little bit more into the details of how it all works.
I hope that you've seen what the current limitations are. At the same time, I would still like to thank the Python Packaging Authority and everyone who's working on improving the ecosystem. It's really hard to have 25 years of legacy and just replace all of this and say: okay, we have this new shiny thing, it's going to work out. It's going slowly, but there's progress. So, thank you.
So we have time for questions, right? Thank you very much. Does anyone want to ask a question now? Any questions?
Okay. Thank you for coming.