Incremental package builds
Formal Metadata
Number of Parts: 14
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/39622 (DOI)
Production Year: 2017
NixCon 2017, 8 / 14
Transcript: English (auto-generated)
00:03
Next speaker is Guillaume Maudoux. You also know him as layus on GitHub. He started using Nix at Mozilla a few years ago. Now he's working on a PhD on incremental builds. Yes, thank you. Give it up for Guillaume.
00:26
Okay, so let's start with a small background on me. So I do my PhD in Louvain-la-Neuve near Brussels. And just for the story, Louvain-la-Neuve has a huge biking event. Which is also the second biggest drinking event in Europe.
00:45
After the one in Munich. So now you know everything. About my username: okay, it's a pun with a French word. But yeah, it's not a very good introduction to this presentation.
01:01
So please mind there is one little difference and that's very important. Right? I contribute to Nix packages, of course. And my GitHub picture is a stone like that. This stone is a stone from that castle which I also contributed to build.
01:21
So now you already know that I like building stuff. So I started my PhD on incremental builds. And I want your feedback on this presentation. So this is very important to me. Do not hesitate to interrupt the presentation if you want to ask a question at any time. So let's start the presentation.
01:42
Incremental package builds. What is incremental? I went to the dictionary which is always a good way to start. Except that in this case it doesn't mean anything. Incremental is something that occurs in increments. Okay, come on.
02:00
So what is an increment? It's the amount or degree by which something changes. Okay, we cannot do anything with this definition so I just came up with mine. An incremental build system is something that works with very small steps. We can do each of them separately and work with them.
02:21
We can reuse all the build products. And we use them so we do not have to redo things that are already done. And we also of course need the ability to detect what needs to be done. So this is the plan of this presentation. I will take three examples from real life.
02:42
One with Firefox, one with i3, the window manager, and one with the store. The Firefox example shows why it's important to have small steps. If you ever try to build Firefox, then the build phase takes one hour.
03:00
And everything else is basically negligible compared to the huge build time. I was playing with that package and I had to change the fixup phase. And of course I made a typo in the bash script. And then, whoop, everything was lost. Okay, so we really need checkpoints.
03:20
I mean, it's just like when you are doing rock climbing. You are climbing, okay, and then at some point, there is a problem. And then if there is no checkpoint, then bah, you are back to point zero, right? So we really need that kind of stuff. Just not to lose everything all the time. So how can we do that?
03:41
A very simple idea is just, okay, we already have parts in every build. We have different phases. So we could split them into different packages so that once one phase is finished, I mean, once it's done, we can work on the next one.
04:01
Yeah, but you know, it doesn't always work. For Firefox, it's good, but if the failure occurs in the build phase, then basically you still have the same problem, that you lose one hour of build phase. It's not good for the store because you will have every phase in the store. And what does that mean? Can that even work? But this idea has already been implemented in some places with wrappers.
04:24
Firefox, for example, has an external wrapper, which is another package that just writes the wrapper. And that way, you can change the wrapper easily without changing Firefox all the time.
04:40
Okay, so incremental builds are about small steps. And in fact, Nix is incremental in some way. It's just that it's incremental at the level of package, okay? And the small step for Nix is a package. On the other hand, we have build systems. They already manage incremental builds of package,
05:03
like Firefox has its own build system, and it's also incremental, but we are not able to use that now in Nix because we want things to be very pure, so there is no interaction with the external world. So the idea here is to use existing build systems and to make them interoperate with Nix,
05:24
so that we can achieve incremental builds within Nix. The first build system that everyone knows is Make. And, well, I will not speak too much about Make because it works, but there is nothing you can do that's correct or in the sense that we like it.
05:42
Make will never be pure. That's not in the design. Basically, it uses timestamps, and we don't keep timestamps in the store, for example. It requires having the previous build in the same tree, or as files in the tree, so you need to add that as an input to the package, and everything gets ugly.
06:02
But there are other tools that are quite interesting, like Ccache. Who knows Ccache? Not that many people. So Ccache memoizes compiler invocations, so you can wrap a compiler call with it, and if the same compiler call has been made before, it will fetch the result from the cache.
06:24
Even better, Mozilla developed sccache, a shared Ccache, where the cache of previously built stuff is shared over the network, so you can reuse a build that someone else did. And, well, this is a very good and robust design,
06:43
because, I mean, it's the design behind Nix, right? There is a shared cache, a binary cache that you can use, and that you are sharing, and this is used to replace building of some packages if they were already built before.
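The idea Ccache implements can be sketched in a few lines of Python: key a cache on everything that determines the output (compiler, flags, source text) and replay the stored result on a hit. This is only an illustration of the principle, not how Ccache is actually implemented; `fake_compile` is a made-up stand-in for a real compiler call.

```python
import hashlib

# Cache mapping invocation-hash -> build result. In Ccache this lives
# on disk; in sccache it can live on a shared network store.
cache = {}

def cache_key(compiler, flags, source):
    # The key covers everything that can influence the output.
    h = hashlib.sha256()
    for part in (compiler, " ".join(flags), source):
        h.update(part.encode())
        h.update(b"\0")
    return h.hexdigest()

def compile_with_cache(compiler, flags, source, real_compile):
    key = cache_key(compiler, flags, source)
    if key in cache:
        return cache[key], True    # cache hit: no compiler call
    result = real_compile(source)
    cache[key] = result
    return result, False           # cache miss: compiled for real

# A fake "compiler" standing in for gcc: it just records invocations.
calls = []
def fake_compile(source):
    calls.append(source)
    return f"object({source})"

out1, hit1 = compile_with_cache("gcc", ["-O2"], "int main(){}", fake_compile)
out2, hit2 = compile_with_cache("gcc", ["-O2"], "int main(){}", fake_compile)
```

The second invocation never reaches the compiler, which is exactly the "never compile the same thing twice" property the talk comes back to later.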
07:00
So caching is a very good idea that we will reuse later, but I want to show a small project by Eelco, which is nix-make. And nix-make basically allows you to... It's a build system. At the moment, it only works with very simple stuff, like that. You see that it's 15 commits, okay, so it's not very active,
07:23
but you can still do things like that. There is one derivation, compileC produces the derivation, and that derivation just generates a .o file, and then there is the link derivation that just takes all the .o files and produces the binary.
07:41
So we can use Nix to do incremental compilation at a finer level than just packages, but, okay, you can do very small steps. It's compatible with Nix, of course, but, I mean, it's even worse than before. Every single .o file ends up in the store.
08:02
And you also need to port every project to nix-make. Okay, so if you're using whatever before, like Bazel, then you need to write another layer to port it to nix-make. This is not feasible. Speaking about Bazel, this is an amazing, a tremendous project from Google.
08:20
Looks like they are very proud of this stuff. Fast or correct, pick two. And it's a project that got most things right. They manage to build every command. They can basically cache every command and put that in a shared cache.
08:42
They can even ask another machine to compile that step and then get the result. So this is something that we would like to have. It's also really, really correct. There is sandboxing going in there. But, you know, it's so good that it does exactly the same stuff as Nix.
09:04
So when you try to make both of them work together, they fight to take control, each wanting to be the one that manages the sandbox, and not letting the other do its stuff. So it's quite difficult to use it. In fact, it's used at Google, and at Google, everything uses Bazel.
09:21
So it's quite easy. But with Nix and Nix packages, we have something that's much more heterogeneous. I mean, there are packages with Make, packages with SBT, packages with CMake, and we cannot use just one build system to rule them all. So is it possible to do something intermediate? And my solution is, OK, we want caching.
09:42
We want to be able to cache smaller steps than just packages. Can we somehow allow the build system to access an external cache while remaining correct? And to do that, we may need the help of Nix Build to provide the same interface.
10:03
But in the first attempt, if you do not care too much about correctness, then it's very easy to implement. Basically, you make the build not so pure, allow access to an external store, and allow, for example, Ccache to access that store. It looks like that. You call Make.
10:20
I mean, Nix build starts a sandbox. Make works in there. It calls Ccache, for example. And then Ccache grabs from the environment a socket to talk with the external store. As simple as that. If you trust Ccache to do things correctly, then you will have correct builds. Of course, you cannot always trust Ccache,
10:40
so we will have to implement some checking on top of it. But this is the basic idea. And if you start with something simple like that, then you can port one package at a time, try to make it work, start maybe with big packages like Firefox. And yes, this is my master plan. This is something that I want to do for my PhD,
11:01
the ability to use incremental builds from the build system with Nix and to be able to go both directions to make them cooperate. So at the end, we should be able to rule over the world with this system. But still a work in progress.
11:25
Okay. Any questions about that, maybe, before the next part? I mean, you can still ask questions afterwards. Yes?
12:00
Okay, I can say two things. Yes, the intensional store would help. I will speak about that later. And what can I say? Bazel already does something like that. Much more advanced than what Ccache does, except that it does a lot of other stuff that we do not want.
12:23
So we may need to import that part of the code, and then if it works for them, it should work for us too, right? If you want more correctness than what they achieved, which is already quite high, then we may still work on it, implement it. The interface here may not be just raw access to files.
12:45
It may be, okay, the build wants to call that command with these files, with this environment, and then we let Nix build run the command and produce a cached result. So if everything is in the hands of Nix build, then it should be clean, okay?
13:01
But this is more strict, and this needs more, I mean, it needs to change a bit how Ccache works at the moment. So the more correctness you want, the more work you have to do, of course.
13:33
Okay, so the idea about this stuff is that this is not defined. This may be the Nix store, this may not be.
13:41
It depends on what you want to do. This is not the first time that I hear something like that. There are a lot of you that want to have hierarchical Nix stores, where you have one Nix store for the packages, then one Nix store for other stuff that is less important, that can be garbage collected more often, and maybe another Nix store for custom projects.
14:02
I'm not sure that this idea of Nix store is the perfect way of doing caching. The idea of Nix store is to have, I mean, you need a directory because you need to access the files, but it may be more efficient to cache compressed versions of the files, for example. So you may not want exactly a Nix store,
14:21
but you need some way to cache stuff. Okay, let's continue anyway, you can ask questions later. So another use case I had, I had a problem with i3, which is my window manager, and the window manager is something that you need all the time.
14:40
But to reproduce the bug, I had to use it on my main laptop because basically the bug happened when I undocked it and docked it again. So, I mean, this was the only way I found to reproduce that bug, and so to test new versions of i3, I had to propagate them to my NixOS configuration.
15:04
I mean, you all already had something like that, where you want to test something and it's very complex because it needs to go into the NixOS configuration, and how do I do that? My typical debug session looks like that. I use Nix shell, I try to patch, see if it compiles, etc.
15:23
If I can do that, then I extract a diff, and then I generate an overlay to build a new package, and I check that it builds with Nix build. Then I insert the package in NixOS, and I just rebuild NixOS, and then I restart, I test,
15:41
and if it works, I mean, that's fine. If I can still reproduce the bug, it's not that good. And if it crashes everything, then I need to roll back, of course, and then go back again because it's a loop. I mean, when you are trying to debug something, you have to insert debug statements, then maybe start GDB, stuff like that, so it's quite complex.
16:01
I mean, it would be so simple if I could just compile i3 and say, okay, put it in the store and use that. I know it's not clean, but then it would be easy, right? And there are solutions to do that. I should maybe not say that. Maybe we'll need to cut the video at some point. But it is possible to mount the Nix store read-write.
16:25
But yeah, okay, you should not do that. A more clean but very hacky solution is to insert a symlink into the Nix store, make a fake derivation that's just a symlink to somewhere else, your own directory or your project directory.
16:40
Then there are some technicalities with self-loops and stuff like that, but it's possible. And yeah, you can go straight to the solution and say, okay, this service uses this package. Okay, I can override the service, the NixOS service, and say, okay, use the package in my own environment. That works too. But then, I mean, it's not what we want. We are used to something that's pure, that's clean,
17:01
and then we are trying to hack everything. So this is indeed very hacky. So I went to look for tools that allow to do that, and one of them is git rebase interactive. When you do git rebase interactive, it tries to apply all the commits in turn, and when there is a conflict, you are dropped in an environment that looks like it's a normal git repo,
17:23
but it's not really. If you try to push that on a branch, it will be strange, et cetera. But you can commit, behave like you are in a normal commit environment, and when you are done, you say just continue, and it just continues its job, integrating your modification. That's something that we would like with Nix Shell.
17:42
Basically, Nix Shell set up the right environment to build the stuff. You can build it, you can go to the end, but when you are doing make install, it doesn't install because you cannot access the Nix store. Why? It would be interesting to be able to write in the Nix store at that point
18:00
and produce a package. And if you are annoyed about purity, then just make Nix shell accept a hack option, and the hack option produces a derivation with a random input parameter, so it doesn't conflict with the normal one, the pure one, and this random input says, okay, it represents the fact that a human can just tamper with the build,
18:22
so the human can add random stuff into the build. But that would be very, very nice because then we could insert basically anything in the store in a way that's quite clean and corresponds to the Nix model. To explain that differently, I have defined the Nix OSI model,
18:42
which is not to be taken too seriously, but we have layers, and each of these layers is related to different tools. So if you use Nix build, then you know nothing about NixOS, basically. If you use NixOS, then you know nothing about deploying to different servers like NixOps does.
19:01
And if you use Make, then you know nothing about everything that's in Nix. Nowadays, it's nearly impossible to just run Make on a project. You need a Nix Shell to get the right environment, so basically what you do is you start at the Nix Shell level, you open the zipper, and then you enter an environment
19:22
where you can just play with i3 and your project, and when you are done, you cannot use that package, but it would be nice if you could just close the zipper and say, okay, I have a package now. I enter it, and I can exit it. Of course, you can go further and say,
19:43
okay, let's add a hack command to Nix rebuild, I mean, to NixOS rebuild. And then you would start higher in the hierarchy, zip inside it, and then go back out of stuff. So it would drop me, for example, in a shell where I can just edit i3,
20:00
and then it would use that version of i3 only for that build of Nixos. It's not that clean, but in the end it's correct because it's a version with a random input which corresponds to the fact that I've changed something, and I can easily enter and get back off the tree.
20:20
Okay, now for the extra step, if you remember caching, then you can still use caching in that. If what I did in the Nix shell corresponds to what would be done by the Nix build, then there is no reason the Nix build could not reuse it. If I was using the API to the cache correctly, and Nix build cached my build, then it could be reused.
20:42
So this is very efficient because we never, never compile the same thing twice. If you compile it twice, then what? What happened? Oh no, crap. If you compile the same thing twice, then there is a bug.
21:04
This should never happen, right? Okay, so this is the part about, you know, having a nicer interface. Strict is good, but sometimes too strict is just, I mean, it gets in the way.
21:21
Okay, so I think we can go a bit further and improve the interface of Nix to play nicely with users. Yes, and this is the second part of my presentation, so if you have questions on that part precisely, I would be happy to take them.
21:44
Then I have a question for you. Would you use a tool like that? And I have another question. Do you think it's easy to implement? Eelco, any idea?
22:01
Okay. I just wanted to point out that in some way, there are already some hacky ways to do roughly what you already mentioned. So for example, some larger Haskell builds have the problem that building them from scratch every time, with a couple hundred packages, takes very long. So one way to work around it, for example,
22:26
is that usually you're supposed to add build inputs, like src, right? You're supposed to actually add only your sources, but a standard way to get this kind of caching concept, as you have shown, is that instead of just adding the sources,
22:44
you also add the build directory as well, right? Because if you have a correct build system for your language, for example, and the build directory is present, then it will resume from where it left off last time. You're basically hacking your way in by saying your sources are not just the sources,
23:03
but actually the sources plus the build directory. That already works, but of course, it's pretty hacky. The build directory can be rather large. Every time you do this kind of build, you create lots of stuff in the Nix store, so it would be really nice to have an official way to do that, as you suggest,
23:20
so that you have the same benefits, but without the drawbacks of that kind of idea. This concept of extra data that is like source input, but also like cache input, assuming that the build system behaves correctly, might be a much cleaner solution. Yeah, you didn't speak about that, but if you want to build a local project,
23:43
then more often than not, every time you call a Nix build, it will fetch the sources and put them in the store. And depending on your disk, this may be quite a long operation already, and then you still need to build it from scratch, and this is stupid. So one solution is to add the extra input, like you say, but then when you do it, Nix build will never acknowledge that it was the same thing that you built before.
24:06
So there is no way to make this work efficiently for building, and you will never be able to share what you did. But, okay, I think we're good in that.
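The resume behavior being discussed here, where a correct build system only redoes the steps whose outputs are missing, can be illustrated with a toy Python sketch. The dict stands in for the carried-over build directory, and the step names are made up; real build systems would additionally check that inputs have not changed.

```python
# A stand-in "build directory": maps output name -> built artifact.
def run_step(build_dir, name, action, log):
    # A correct build system skips a step whose output already exists.
    # (Real systems also check that the step's inputs are unchanged.)
    if name not in build_dir:
        build_dir[name] = action()
        log.append(name)
    return build_dir[name]

def build(build_dir, log):
    a = run_step(build_dir, "a.o", lambda: "obj(a)", log)
    b = run_step(build_dir, "b.o", lambda: "obj(b)", log)
    return run_step(build_dir, "prog", lambda: f"link({a},{b})", log)

# First build from scratch: every step runs.
build_dir, log1 = {}, []
build(build_dir, log1)

# Second build reuses the carried-over build directory: nothing reruns.
log2 = []
build(build_dir, log2)
```

Passing the build directory along as an extra source input gives exactly this second, empty run, which is why the hack works even though Nix itself sees a brand-new derivation each time.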
24:27
Yes, yes, exactly. Okay, I will add that to my do-not-do-that slide. Come on. Yeah, there is also a question about that environment.
24:41
Would you be allowed to access the network or not? And I guess we could have different versions of pure and correct stuff. Okay, so next user story, and this may be a bit more technical. I have a problem with the Nix store. It's not a big problem, but it annoys me most of the time.
25:03
In my Nix store, there are so many packages that are just basically the same. If you take, for example, two poppler-data derivations and you diff them, the only difference is in the pkg-config file, okay, and in the pkg-config file, there is only the poppler-data path, here in the Cflags.
25:25
And why are they different? Because they reference the derivation itself, okay? So this is the out hash of the derivation. It's not something, no, it's not an input that changed, okay? It's only the out hash that has changed, and that was written in the derivation.
25:47
So what does that mean? It means that because we gave two different out hash to this derivation, they end up being different. But if they had the same out hash, they would be the same. These are exactly the same derivation, except that this one is stored in a different location than this one,
26:02
and this must be written into the files there. Okay, so this is what it looks like when there is no optimization, right? Oh yeah, sorry, it's i3, but it could be any package. You have this derivation file, and then there is a very small difference,
26:25
maybe an input change, but it's not important to the derivation. It doesn't change what gets built in the end. And so you end up with two different packages in the binary cage. You will need to download this package two times if you use both, and then you end up with two packages in your store,
26:40
and they are nearly identical. So the first optimization, who is using an optimized store? Okay, quite a lot of people. There is this nifty tool that's called du, disk usage, and for a long time I did not understand why its results look like that,
27:03
but in fact the tool is very smart. So it only counts the space that it has not counted yet. Okay, so what happens here? I ask for the space of three poppler-data derivations, and the first one took 12 megabytes, and the others are only 60K.
27:23
This is because the optimization of the store hard-links together all the files that are identical. So all the files that were already counted for this derivation are not counted anymore for these ones. This means that basically there is nothing in here.
27:40
It corresponds to our intuition that there is only one file that changes, and in these derivations, these three derivations, you have folders which cannot be hard linked, and hard links to the same files. So these are only counted in the first one. What does it look like on my picture? It looks like this.
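This du behavior is easy to reproduce outside the Nix store with plain hard links: identical files linked together share one inode, and a du-style scan that remembers visited inodes counts the space only once. A small self-contained sketch, where the `pkg-1`/`pkg-2` directories are made-up stand-ins for store paths:

```python
import os
import tempfile

tmp = tempfile.mkdtemp()
# Two "store paths" whose file contents are identical.
os.makedirs(os.path.join(tmp, "pkg-1"))
os.makedirs(os.path.join(tmp, "pkg-2"))
src = os.path.join(tmp, "pkg-1", "data")
with open(src, "w") as f:
    f.write("x" * 4096)
# Store optimisation replaces duplicate files with hard links, like this:
os.link(src, os.path.join(tmp, "pkg-2", "data"))

def du(path, seen):
    # Like du: count each inode only the first time we see it.
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            st = os.stat(os.path.join(root, name))
            if st.st_ino not in seen:
                seen.add(st.st_ino)
                total += st.st_size
    return total

seen = set()
first = du(os.path.join(tmp, "pkg-1"), seen)   # counts the file
second = du(os.path.join(tmp, "pkg-2"), seen)  # already counted: 0
```

The second directory reports essentially no space of its own, which matches the 12 MB / 60K pattern described in the talk.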
28:03
Right, you are able to optimize the store usage, but you only know that at the end, when you have downloaded it twice, built it twice, so it's not very efficient except for disk space on your machine.
28:20
And yes, network usage is important. Every week or so, we merge the staging branch into master, and staging is the branch where you have mass-rebuild stuff. Mass rebuild means that mostly at least half of the packages are impacted. There is a huge impact.
28:40
If you have three changes like that, you can be sure that you have to download everything. So if you are like me and you update once a month, then every time you update, you need to download the full new distribution. You need to download all the packages. And I think we can do better than that. I mean, we do not need to download everything.
29:02
Most probably, there is not a big change, like a small change like we saw before. To do that, it's an idea that I would not implement, but apparently it existed at some point. We could invent this idea of a binary diff, right? Okay, this derivation, it looks like the previous one.
29:22
Okay, it's a bit different, but let's just only ship the diff and keep the diff into the binary cache. Hydra could do that. It's just a different substitute that will be provided, and it needs to be understood by Nix, but it looks quite easy to do. One question is, okay, what do you diff with?
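The idea of shipping a delta instead of a full substitute can be sketched with any generic diff/restore pair; here Python's difflib stands in for a real binary-diff tool such as bsdiff, and the "package" contents are made-up text where only the embedded self-path differs.

```python
import difflib

# Two near-identical "packages": only the embedded store path differs.
old = ["bin/i3\n", "prefix=/nix/store/aaaa-i3\n", "Cflags: -I${prefix}\n"]
new = ["bin/i3\n", "prefix=/nix/store/bbbb-i3\n", "Cflags: -I${prefix}\n"]

# The binary cache would ship only this delta instead of the whole package.
delta = list(difflib.ndiff(old, new))

# The client reconstructs the new package from the old one it already
# has plus the delta (restore(…, 2) picks the "new" side of the diff).
restored = list(difflib.restore(delta, 2))

# Only the changed lines need to travel over the network.
changes = [line for line in delta if line.startswith(("+ ", "- "))]
```

The open question from the talk remains visible in the sketch: the client must know which `old` to diff against before the delta is of any use.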
29:41
Which is the old package? But this would help a lot to save bandwidth and space on Hydra. But there is another option that I prefer, and that I will call content-addressed storage. Okay, because we all call it the intensional store,
30:01
and it's exactly not about intention. It's exactly the opposite of that. So let's stick with content-addressed storage, okay? Wout Mertens made an RFC for that, and he's not here, I think. So I'm a bit sad because I wanted to work on that with him, but anyway. The idea is that if you can detect
30:23
that you build exactly the same package, up to the self-links, then you can just use the same package, and you're done. It's very nice because when you query hydra about, okay, what's the substitute for that? Hydra will say, it is that one,
30:41
and then you will check in your store and say, okay, I already have it. That's done. There is nearly no network, no bandwidth, and there is a lot of space that is saved on hydra. So, yes. What was the situation before?
31:02
If you have this standard environment, and then, okay, from that you can build X packages, and then you build I3, okay? You end up with three different packages in the end, and then there is a very small change to the standard environment. Maybe it's not significant, or you just added an S in a comment, in a bash script.
31:25
Nobody does that because we know that we will break a lot of stuff, but it's not that important. So I do a small change, and I end up with a different set of packages. With the content address storage, I could do something like that, okay?
31:41
I realize that, okay, the stdenv output is exactly the same, so I can deduce that all the packages will be the same because they will be built from the same package. There is no way they can change. The actual input to this package is the same.
32:01
You build it, and you look at the bytes, and you say, okay, these are the same bytes. It's the same package, yeah?
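The rebuild-cutoff argument can be sketched like this (a toy model of the idea, not Nix's real machinery — the chain stdenv → xorg → i3 and the comment-stripping build are hypothetical): each package's output is a pure function of its own source plus the *output* hashes of its dependencies, so a stdenv change that is invisible in the output leaves the whole chain's hashes identical.

```python
import hashlib

def out_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def strip_comments(src: bytes) -> bytes:
    # Stand-in for a build discarding irrelevant input, e.g. an extra
    # character in a bash comment that never reaches the output.
    return b"\n".join(l for l in src.split(b"\n") if not l.startswith(b"#"))

def chain_hashes(stdenv_src: bytes) -> list:
    # Hypothetical dependency chain: stdenv -> xorg -> i3.  Each
    # downstream output depends only on its source and the *output*
    # hash of its dependency, not on the dependency's derivation.
    stdenv = out_hash(strip_comments(stdenv_src))
    xorg = out_hash(b"xorg-src|" + stdenv.encode())
    i3 = out_hash(b"i3-src|" + xorg.encode())
    return [stdenv, xorg, i3]
```

Two stdenv sources that differ only in a comment yield byte-identical outputs, hence identical hashes all the way down the chain, so a content-addressed store can reuse every downstream package without rebuilding it.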
32:20
It needs to build it. So there is no way around that: you need to build the package. But once it's built, yeah, I'm not changing the derivation; I'm, you know, basically changing the build command or something like that.
32:41
But you may be surprised: there are a lot of cases where this happens. For example, if you change GCC, then GCC will rebuild your project, but it may happen that the exact same output is built. Okay, it's a new version of GCC, but there is not much change in the code it produces, right?
33:01
So you may have that kind of behavior more often than you think, but I must admit that I do not know now what's the actual impact of that, so it would be nice to do some kind of impact analysis. Yes?
33:21
Mm-hmm. Mm-hmm.
33:44
I'm happy that you asked the question, because I went too fast over that. So the idea is that for the output self-references, you need to strip them from the produced binaries before hashing them, and that is the hash under which you will store your derivation's output.
34:02
Okay, then you can re-insert the output path. Yeah, yes, of course, you need to do that. And it's more complex than it looks, because often the output path ends up in very strange locations, for example in a man page, and that man page may be gzipped, so it may not be discovered by Nix
34:22
when it strips paths and re-inserts them. So there are technicalities there that we need to think about. Yes, it's linked to that, of course. So, I mean, it's interesting to see
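The self-reference stripping can be sketched as follows (hypothetical paths; real Nix also has to handle the compressed and scattered cases just mentioned): replace the output's own store path with a fixed placeholder of the same length before hashing, so two builds that differ only in where they were stored hash identically.

```python
import hashlib

def normalized_hash(artifact: bytes, self_path: bytes) -> str:
    # A same-length placeholder keeps byte offsets stable inside
    # binaries while erasing the store-path self-reference.
    placeholder = b"\0" * len(self_path)
    return hashlib.sha256(artifact.replace(self_path, placeholder)).hexdigest()
```

Note that this only works when the path is visible as plain bytes; a self-reference inside a gzipped man page would survive the `replace` unnoticed and defeat the comparison, which is exactly the technicality mentioned above.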
34:40
that, okay, let me change the derivation in some way that I'm sure cannot change the build. So, for example, I could add a random variable to the environment that nobody should use, right? And if I'm not able to produce the exact same build, there is a problem either with the content-addressed storage detection
35:01
or with reproducibility. So it would maybe help to highlight, yeah, unstable packages. Okay, so the main advantage is that you do not propagate changes to other packages.
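That reproducibility probe could be sketched like this (a toy harness, not an existing Nix tool; the variable name is made up): inject an environment variable no build should read, build twice with different values, and flag the package as unstable if the output hashes diverge.

```python
import hashlib

def is_stable(build) -> bool:
    # `build` maps an environment dict to output bytes.  We vary a
    # variable nobody should use between two runs; a reproducible
    # build must give byte-identical output both times.
    out_a = build({"NIX_PROBE_NOISE": "aaaa"})
    out_b = build({"NIX_PROBE_NOISE": "bbbb"})
    return hashlib.sha256(out_a).digest() == hashlib.sha256(out_b).digest()
```

A failure here points either at the content-addressed hashing itself or at a genuinely non-reproducible build, which are exactly the two suspects named above.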
35:22
You are able to better detect when a package is really the same as before. So it means that you can do less compiling: you do not need to compile all the dependent packages. You get faster updates: you don't need to download from Hydra. And it is not too difficult to implement.
35:42
Yeah, not too difficult. But, I mean, everything is in there. We could do that with existing tools. The real difficulty is with, you know, forward and backward compatibility. If we do that, then, yes,
36:02
of course we'll change Nix in a major way. So we need to figure out whether we can do that without breaking everything. Or, if we have to break everything, maybe we need to say: okay, this is very important, it has huge advantages, so let's break Nix and start over. But I don't know yet. So this is really the problem now with this idea:
36:21
that it's quite difficult to know what the real gain would be. I would like to set up my own Hydra and run tests with that. But, I mean, that takes huge computing power, which I do not have yet. So, maybe in two years, for the next NixCon.
36:44
Yes, the reason why I was really interested in this is that I have this pull request that is now completely bit-rotten. But never mind. I wanted to make a standard environment that just strips everything, everywhere.
37:02
Everything that looks like an ELF binary: strip it. Okay? Because we had issues where, for example, PHP was storing libraries in very strange locations. We did not see that, and so PHP was basically depending on GCC, and, I mean, this was blowing up the closure size.
37:21
For the story: this was there for so long that at some point Eelco fixed the PHP stuff before I even merged this one. It was easier to fix by hand than to fix it globally. The interesting part about this pull request is that for most packages, it changes nothing.
37:42
And if a package changes, then there was something that was not stripped before and is stripped now, so it's interesting to look into that package. And even just the ability to detect when packages are exactly the same would help with refactorings. You refactor something, you build everything,
38:01
and you see what has changed. This gives you some information: did I do something wrong? Is this a change that I really intended in the refactoring? Given that the refactoring should leave behavior as is. Okay, so now, if we want to bridge the gap between all these ideas:
38:21
if we do both content-addressed storage and caching inside the build system, then it becomes quite complex. If you cache builds, it may become less stable, or it may become more stable.
38:41
So you have a lot of interactions in there that need to be investigated, and I think it would be fun. That's maybe a bit complex, so let's do one step at a time. And it would help, as we said before, to catch unstable builds: if the content hash changes too often,
39:01
then there is something that's not stable in the build. Okay, so I presented these three ideas for improving Nix. When I started, I had the feeling that you could not change Nix, but that's not true. There are a lot of things that we can improve, and we can make it more user-friendly. About caching builds:
39:20
that's something that I'm working on for my PhD. For content-addressed storage there is an RFC, so if you want to work on that, you're welcome to contact Wout Mertens and we can do it together. The nix-shell hack with the zipper is not implemented at all, but I would be happy to start on that with someone.
39:45
And that's it.
40:01
Behind the pillar. Over there, over there. The funny thing is that I've been compiling build systems to other build systems for two months now,
40:22
but not to Nix. I guess what I want to do is basically the same, right? Because I ask the build system to just say what it wants to build, and I pass that to Nix. So it's basically exposing everything it wants to do. So I would like the build system to not compile to Nix,
40:43
but to compile to a standard protocol that could be used for interoperation with any other tool. I would like something that's more global than just: okay, this to Nix, this to that, et cetera. I want just one common protocol to build stuff.
41:04
Okay, okay.