Path-agnostic binaries, co-installable libraries, and How To Have Nice Things
Formal metadata
Number of parts | 50
License | CC Attribution 3.0 Unported: You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.
Identifiers | 10.5446/43117 (DOI)
All Systems Go! 2018 (part 21 of 50)
Transcript: English (automatically generated)
00:07
Hi, I'm Eric, and I'm here today to talk to you about path-agnostic binaries, co-installable libraries and how to have nice things. If those words don't mean anything to you yet, that's okay, it's because I just made
00:22
them up. This talk is going to be about introducing some terminology and more generally it's about how software is packaged and how that could be easier. So the first thing I'm going to do is talk a little bit about how packaging is not a solved problem, just in case anybody thinks it is, then in the middle introduce
00:41
some terminology to help talk about things we could improve. Then I'm going to talk about a bunch of existing systems and package managers using that vocabulary. Last, and sort of scattered throughout, there will be some techniques (legitimately, GCC flags) that might make your life better. Some of this is line-broken a little strangely because of the resolution.
01:03
So packaging is not a solved problem. We really need the installation of packages to be easier, and we really need almost everything about the way we interact with packaging to be easier. There are a couple of big things that point this out to us. One of them I think is the rise of containers in the last couple of years in our industry.
01:24
This is something I talked about a bit more last year actually, and there's a whole talk about that that was wonderfully recorded that you should go see, but one of the big points from that talk was containers gave us the ability to install more than one version of a thing, and people liked this.
01:41
Surprise. So this is something that we can do now with containers, and it's made them hugely popular. This release of energy and enthusiasm, I think, indicates that we have room to improve the way we package things. I also want to ask whether containers are necessary to do that, and I think the answer is no. Containers are something that made it easier for us to install multiple versions of a thing
02:03
on a machine, and that's good, but this form of easier might not be the only form of easier in the world. Containers have some other baggage with them. The other thing I want to talk about that makes me think packaging is not a solved problem
02:21
is something I've been meditating on a lot over the last year or so: there are a lot of distros in the world, and despite the fact that we're all people trying to work together, all open source nerds trying to make the world a better place in some way, we have great difficulty sharing things, especially the binaries that we produce. For many of
02:40
our distros, they're basically completely nonportable in any sense of the word. If I have an Ubuntu machine and a Debian machine, these have almost the exact same tooling, almost the exact same packages, but can I reliably copy a binary from one to the other? Maybe, yeah. Would I bet on it? No. If I have Fedora and CentOS, it's the same thing, they're mostly RPM and Yum and some
03:04
new acronym lately. Can I copy a binary? No. Even Nix and Guix, which are two relatively recent Linux distributions and are extremely similar, using all of the same build tools and all of the same linking conventions: can I copy a binary? Absolutely not. This is weird, because we're all trying to work together, so there's some quiet force
03:27
which is causing us to become balkanized, split into very small communities that are unable to work together, and it's happening without our intention. I think we need to ask a lot of questions about this. And I think these problems actually, strangely enough, have some shared root causes.
03:47
Standards for composable installation are something we don't really have. I don't think we have enough language to talk about what we mean by portability and composability, and I think we should work on that.
04:02
So here's an attempt, and here are some definitions. I think we should talk about the ability to co-install things, and the definition that I would offer for that is, any time I install one thing, installing a second version of the same thing should not be any harder than installing the first thing was.
04:22
And that includes using it, not just having it on disk, but being able to use it. This sounds trivial, but of course, nothing's easy. The other term that I want to introduce, again from the title of the talk, is path agnostic, and that means a user of a system, the person who is installing the thing,
04:41
not the packager, not the builder, should be able to decide where it goes. Any binary I have, I should be able to take the folder that that binary is in, use the mv command, and then keep using the binary. This should not be hard. Path agnosticism is also a really nice property, because it quite trivially gives you co-installability.
05:02
If I have a binary that I can move, then I can take other versions of that binary and put them in any path prefix that I want, and of course, it's trivial to install more than one version, right? And if we could do this, I think this would fix a huge source of that tendency towards balkanization that Linux distributions often find themselves in.
05:24
So there are many ways that you might try to implement path agnosticism, and something that I want to introduce early is that things you can do and things you should do are not necessarily the same. So for example, we already talked about containers earlier,
05:42
and containers, broadly speaking, are a form of cheating. They're a form of chroot, and this is something that works. But it's something that has a lot of additional baggage with it as well. If we use chroots as a form of packaging, well,
06:03
chroots don't compose very well, right? I can package precisely one thing in a chroot, and then that's kind of it. I have to package an entire Linux file system, the whole thing, all the libraries, in one big old monolith. And this is problematic for a lot of reasons. It's quite opaque. The tools that I use to do this are going to have a large number of side effects,
06:26
and all I'm doing is bundling them in one chroot, and this doesn't help me understand, right? It doesn't help me diff. There are a lot of limits there. Another form of path agnosticism that you might be thinking about is setting up some environment variables. Somebody's probably thinking,
06:42
LD_PRELOAD. That's a thing you can do, but I think it's very questionable whether we should do it, because this causes lots of wrapper scripts to show up. It also doesn't compose very well, because if you set LD_PRELOAD or any environment variable like that to try to make things path agnostic, all the child processes inherit it too,
07:01
and that's probably not what you meant. It just doesn't compose very well, and it has side effects you didn't expect. So the kind of path agnosticism I think we should chase is this: whatever is in your binary, in your file system, needs to explain itself. It needs to be context-free, without any other environment.
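The inheritance problem described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable name DEMO_LIBRARY_HINT is hypothetical and stands in for LD_PRELOAD or any similar variable, so that running the demo cannot disturb real library loading.

```python
import os
import subprocess
import sys

# Hypothetical variable name used only for this demo; loader-related
# variables are inherited by child processes through the same mechanism.
DEMO_VAR = "DEMO_LIBRARY_HINT"

# Program run by the grandchild: print what it sees in its environment.
GRANDCHILD = f"import os; print(os.environ.get({DEMO_VAR!r}, '<unset>'))"

# Program run by the child: it never mentions the variable, it just spawns
# another process, the way a wrapped tool might shell out to a helper.
CHILD = (
    "import subprocess, sys; "
    f"subprocess.run([sys.executable, '-c', {GRANDCHILD!r}], check=True)"
)


def value_seen_by_grandchild(value):
    """Set DEMO_VAR for one child process and report what its own child sees."""
    env = dict(os.environ)
    env[DEMO_VAR] = value
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(value_seen_by_grandchild("/opt/wrapped/lib"))
```

The wrapper only meant to configure one program, yet every process further down the tree sees the same setting.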
07:23
And this is kind of the harder one. So for some systems, this is easy. If you're statically linking a binary, you've only got one file, and making a single file thing path agnostic is pretty trivial. It's not looking for anything outside of itself, so you're done.
07:41
But let's say for some reason or another, we are convinced that we cannot statically link the entire world. So we're going to do some dynamic linking instead. Now if I have more files, things are getting a little more interesting, because if I have like one main binary in a package, let's say, and I have some other files around it, I need them all to be referred to relative to that main binary.
08:04
if I'm going to keep the property of being able to mv the entire directory around. That's easy, right? No, not really. So let's talk about this a bit more for a second. What happens when you try to do this in practice with dynamic linking in the world as we know it?
08:23
If I look at how Bash is linked on my system right now, this is the readout that I get. A lot of people might be familiar with ldd, but if you're not, it's a tool that shows which dynamic libraries get loaded when you execute a program. So on my system, this is what Bash does.
08:42
These are absolute paths. So right out of the box, we can very quickly see, because there's a slash here, this is not path agnostic. If I move Bash or if I move any of these libraries, it's not going to work correctly. So where does this come from? This is kind of a quick primer on how the dynamic loader works
09:04
for anyone who's not familiar with it already. These absolute paths don't come from anything in the binary itself. readelf is a tool that will read the executable headers out of the binary and tell you what it finds. So here it's showing me the same library names,
09:21
but they're not absolute paths yet. The absolute paths came from somewhere further along. For me, they come from this lovely place called /etc/ld.so.conf. And this is, of course, another absolute path. And so now we finally hit rock bottom. These are all of the further absolute paths
09:42
that the linker is going to look at when I run Bash. So this is how this all came to be. So if we wanted something to be path agnostic, we would want our linker to be able to load these object files from somewhere else, somewhere relative to the binary.
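For readers who want to see the directories that file contributes on their own machine, here is a minimal sketch of a reader for the ld.so.conf format. It is a simplification, not the loader's actual logic: it assumes only comments, plain directory lines, and include directives, and it assumes relative include patterns are resolved against the including file's directory. In practice the loader consults a cache that ldconfig builds from this file rather than parsing it on every run.

```python
import glob
import os


def read_ld_so_conf(path):
    """Return the library directories listed in an ld.so.conf-style file.

    Simplified: handles comments, blank lines and `include` directives only.
    """
    dirs = []
    with open(path) as handle:
        for raw in handle:
            line = raw.split("#", 1)[0].strip()
            if not line:
                continue
            words = line.split()
            if words[0] == "include":
                for pattern in words[1:]:
                    # Assumption: relative patterns are relative to the
                    # directory of the file that contains the directive.
                    if not os.path.isabs(pattern):
                        pattern = os.path.join(os.path.dirname(path), pattern)
                    for included in sorted(glob.glob(pattern)):
                        dirs.extend(read_ld_so_conf(included))
            else:
                dirs.append(line)
    return dirs


if __name__ == "__main__":
    conf = "/etc/ld.so.conf"
    if os.path.exists(conf):
        for directory in read_ld_so_conf(conf):
            print(directory)
```

Every entry it prints is an absolute path, which is the point being made: none of them move when the binary moves.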
10:00
Can we do that? Yes, it's just a little arcane. You might want to take a screenshot of this because where do you find those docs? I don't know, they're somewhere. But take my word for it, that's a thing you can do. And this would give you a binary in which, now if you read the headers, you'll see the same requirement for shared libraries.
10:23
And then this new flag appears, and readelf is telling us it's going to look for this library runpath, the rpath, that is relative to the path of the binary. And if I ask ldd what it actually resolves, it will resolve something relative.
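The slide itself is not reproduced in this transcript, so the exact flags shown are unknown. One commonly used spelling is to pass an rpath containing the literal token $ORIGIN through the GCC driver to the linker. The sketch below builds that argument and mimics, very roughly, how the loader expands $ORIGIN; the helper names and example paths are hypothetical.

```python
import os


def origin_rpath_flag(relative_lib_dir="lib"):
    """Linker argument asking for libraries relative to the binary's directory."""
    # Passed through the gcc driver to the linker; "$ORIGIN" must stay literal,
    # so quote it in a shell to keep it from being expanded as a shell variable.
    return "-Wl,-rpath,$ORIGIN/" + relative_lib_dir


def expand_rpath(binary_path, rpath):
    """Expand $ORIGIN in a colon-separated rpath, roughly as the loader would."""
    origin = os.path.dirname(os.path.abspath(binary_path))
    entries = []
    for entry in rpath.split(":"):
        if not entry:
            continue
        entry = entry.replace("${ORIGIN}", origin).replace("$ORIGIN", origin)
        entries.append(os.path.normpath(entry))
    return entries


if __name__ == "__main__":
    print(origin_rpath_flag())
    print(expand_rpath("/home/me/app/bin/tool", "$ORIGIN/../lib"))
    print(expand_rpath("/srv/moved/app/bin/tool", "$ORIGIN/../lib"))
```

The same rpath string yields a different search directory for each location of the binary, which is what lets the whole directory be moved without relinking.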
10:43
So we can have path-agnostic dynamic linking. It's not commonly done, but this works. This is a feature that's been in ld.so, the thing that interprets your binary's dynamic links, for years, for ages, in every form of ld.so ever.
11:03
As far as I know, there are no Linux distros which use this commonly, but it's absolutely out there. Go run ldd on an Electron binary if you've got one or three or more on your computer. It does this. So let's consider that whole problem solved.
11:21
What I haven't talked about yet is how we should actually organize sharing of objects again. So we can have path agnosticism. If we have path agnosticism, we can trivially have co-installability. And now let's talk about raising the bar even further.
11:41
We want path agnosticism and co-installability, and to be able to share things. So this requires us to do a little more organization. And there's more than one way to go about this, so I'm going to introduce more terminology. The word I'd like to use here is splay.
12:02
This is a word for, like, if you're selling something in a store, you're going to spread out things for display. So here I want to use the word splay to describe the way we spread out any shared objects or dynamic libraries in a bunch of directories in some organized way that we can reference.
12:24
There are, of course, more ways to do this than I can possibly count, but they can be grouped into some distinct categories. So these are the three major different ways I can imagine you would ever splay out libraries. The first one is what I'm going to call a precise splay.
12:43
And this is simply when I have some library and I want to know what path I'm going to put it in, and I'm going to hash all of the contents of the library and put it in a folder with the name of the hash. I'll probably use a cryptographic hash for this because why wouldn't I? This is probably sounding pretty familiar. We also call this content addressable.
13:03
This is a nice way of organizing information because it's completely automatic. It's basically immune to conflict. And so this, going back to the reason we're talking about any of this, a precise splay, something content addressable, trivially satisfies co-installability.
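A precise splay can be sketched in a few lines. This is an illustration only: the store layout (one directory per SHA-256 digest, holding the file under its original name) is an assumption made for the example, not a layout the talk prescribes.

```python
import hashlib
import os
import shutil


def splay_precisely(store_dir, library_path):
    """Copy a file into store_dir/<sha256 of its contents>/<original name>."""
    digest = hashlib.sha256()
    with open(library_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    target_dir = os.path.join(store_dir, digest.hexdigest())
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, os.path.basename(library_path))
    if not os.path.exists(target):
        shutil.copyfile(library_path, target)
    return target
```

Two files with the same name but different contents land in different directories, and adding the same contents twice is a no-op, so no human ever has to pick a name or resolve a clash.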
13:21
If I have more than one version of a library and I add however many more versions of that library, I will never conflict. So this means I can automate everything with this organization. You have one of these on your computer. It's called Git. We tend to like this for all the same reasons. Because you can insert an unbounded amount of stuff
13:41
and it never generates a name conflict in itself, this is also automatically decentralized. Since you're using cryptographic hashes, you also get integrity checking for free. This is just a really good place to be. But it's not the only way you could imagine splaying libraries.
14:00
So another way that you could go is, of course, go full manual. Assign names to every file that you need more than one version of. You can do this. But another way of saying manual organization is basically you're always doing conflict resolution.
14:22
And so I think it's very difficult to call this co-installable. And this is, of course, kind of the norm. If you're thinking that this looks and sounds like the libraries on your system, yeah, it probably does. If you do an ls in /usr/lib on your computer,
14:42
you're going to get tons and tons of symlinks like this on most distros. If you're a Nix or a Guix person, of course, you have a very different life. But on most distros, you're going to get this. You're going to get this very manual organization. And so if I was going to install a new version
15:01
of a library in here, I could give it a separate name using my human brain. There's nothing automatic here. But remember, our definition of co-installable explicitly said not just have the files on my computer, but be able to use them. And these symlinks will at this point betray us
15:22
if we have a link named something like <library name>.so.4 and it points to a more precise version. This is no longer co-installable then. If I want to install a different version, I can give it a different name. I can have the file here. But can I use it as easily?
15:41
No, not without performing active conflict resolution. So this is not co-installable. The other most interesting category of things you can do is what I'm going to call a property-based splay. This is when you calculate some property of the libraries you're going to share and then use that as the index.
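To make the symlink conflict of the manual splay concrete, here is a small sketch. The library name libdemo and its version numbers are hypothetical; the point is only that one shared link name can point at one file at a time.

```python
import os
import tempfile


def install(lib_dir, soname, real_name, contents):
    """Install a library file and point the shared soname symlink at it."""
    with open(os.path.join(lib_dir, real_name), "wb") as handle:
        handle.write(contents)
    link = os.path.join(lib_dir, soname)
    if os.path.lexists(link):
        os.remove(link)  # the "conflict resolution": the newest install wins
    os.symlink(real_name, link)


def demo():
    """Install two versions behind one soname and report what is left."""
    with tempfile.TemporaryDirectory() as lib_dir:
        install(lib_dir, "libdemo.so.4", "libdemo.so.4.1", b"version 4.1")
        install(lib_dir, "libdemo.so.4", "libdemo.so.4.2", b"version 4.2")
        files = sorted(os.listdir(lib_dir))
        resolved = os.readlink(os.path.join(lib_dir, "libdemo.so.4"))
        return files, resolved
```

Both files are on disk afterwards, but anything that asks for libdemo.so.4 only ever reaches the second one. Having the first version is not the same as being able to use it.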
16:04
Whether or not this is co-installable can be an interesting question. So we're going to go over a couple of examples of these in order to try to figure that out. One common form of property-based splay you might have seen is this: anybody
16:21
who's doing things with Docker images, it's very common to have a shell script which when you're publishing an image, tags it with the source code hash. And this is something people do because the hash is already there because of Git, thank you, Git. And so it's very easy to do. But this doesn't capture a lot of things, right?
16:41
If I do my build again with a different compiler, that of course is not represented in my Git source hash. So that's not covered in my splay. So this, I would say, is again not co-installable. If I use a different compiler and I want to install that build on the same computer as the other build made with a different compiler,
17:01
I have conflict resolution to do. I'm going to skip this slide because I'm running out of time. So what if I got better at this and I came up with a description of a property where I have not just the source code hash but I have all of the other executables on my path, all of my compilers
17:21
as part of my property description as well. And this is what the Nix and Guix distros do. This is really cool. So this big hash in here includes not just the source code but all of the other toolchains that were used in building it. This is still distinct from a precise splay, however, because that hash is not of the content.
17:41
This can still get in conflict. This would be equivalent to a precise splay if we could assume that all compilers are pure functions and all compilers are deterministic. This is unfortunately just not true. There are some people in the room laughing. Yes, that's a whole nother talk.
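The difference between hashing the build inputs and hashing the build output can be sketched as follows. This is a toy model under stated assumptions: property_key is not how Nix or Guix actually compute store paths, and fake_build is a hypothetical stand-in for a compiler that embeds a timestamp in its output.

```python
import hashlib


def property_key(source_hash, toolchain):
    """Key derived from build inputs: source hash plus tool identifiers."""
    digest = hashlib.sha256()
    digest.update(source_hash.encode())
    for name in sorted(toolchain):
        digest.update(f"\0{name}={toolchain[name]}".encode())
    return digest.hexdigest()


def content_key(output_bytes):
    """Key derived from the build output itself (a precise splay)."""
    return hashlib.sha256(output_bytes).hexdigest()


def fake_build(source_hash, toolchain, embedded_timestamp):
    """Stand-in for a nondeterministic compiler: the output embeds a timestamp."""
    description = f"{source_hash}|{sorted(toolchain.items())}|built@{embedded_timestamp}"
    return description.encode()
```

Two runs of fake_build with identical inputs but different timestamps share one property_key while producing different content_key values, so two distinct outputs compete for the same slot. Changing the compiler version does change the property key, which is the improvement over tagging with the source hash alone.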
18:01
There's a reproducible builds project and a reproducible build community out there who is working on this problem. And believe me, it's a problem. So if we want to share libraries, we can choose any of these categories of techniques. But if you ask me, please choose precise. It's by far the most correct.
18:20
So now I want to get all of these properties back together. I want to have path-agnostic and I want to have co-installable and I want to have shared objects. If we could have some binaries that are path-agnostic and we could have a splay of all of their dependencies that is path-agnostic, then we could move both of these things around together and they would still be path-agnostic
18:41
and we would still have shared objects and everything would be awesome. But how? So since we just talked about Nix briefly, I want to use Nix as a further example because they do some interesting things. They use rpath, much like the thing I mentioned earlier where we can use relative linking,
19:01
but they don't quite do relative linking. This is what you'll get when you read the ELF headers on a Nix system. There are actually several library paths in here, as you can see, and they're joined by colons. This is cool because it's close to co-installable, if you ignore the whole determinism-of-compilers part,
19:23
but it's also not path-agnostic. This still starts with a slash. And any time there's a slash in a path, we've kind of lost. Some people say that Nix, in fact,
19:41
can be installed in any path, and that's sort of true. Can-versus-should questions come up here a lot. These paths, as we saw, are literally embedded in the binaries. So if you're going to install one of these binaries from Nix under a different prefix path, if you're going to try to make it path-agnostic, you basically have to rewrite this header,
20:03
either by recompiling the whole thing or by using some tool that patches headers. So this is not path-agnostic. And going all the way back to the concept of auto-balkanization, this is a fascinating example because the Nix and Guix distros are almost the same,
20:23
except this path on a Guix system is different. It doesn't have /nix at the front of it. So despite almost everything about these systems being identical, that slash really gets in the way. So in the last 60 seconds,
20:40
because I'm not talking nearly fast enough, I have a new proposal. What if we compiled binaries with this rpath of $ORIGIN and put all of our libraries relative to the binaries, but allowed them to be symlinks? They can be the full content or they can be symlinks. To the binary, this is the exact same thing.
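A rough sketch of that arrangement, assuming a bundle whose lib directory sits next to the binary and a separate shared store of libraries. The function names and layout are hypothetical. The consumer always reads lib/<name>; whether that entry is a symlink into the store or a full copy is a detail that can be switched without touching the binary.

```python
import os
import shutil


def link_library(bundle_lib_dir, name, store_path):
    """Provide a library to the bundle as a symlink into a shared store."""
    os.makedirs(bundle_lib_dir, exist_ok=True)
    link = os.path.join(bundle_lib_dir, name)
    if os.path.lexists(link):
        os.remove(link)
    os.symlink(os.path.abspath(store_path), link)


def materialize(bundle_lib_dir):
    """Replace every symlink in the bundle's lib directory with a real copy."""
    for name in sorted(os.listdir(bundle_lib_dir)):
        path = os.path.join(bundle_lib_dir, name)
        if os.path.islink(path):
            source = os.path.realpath(path)
            os.remove(path)
            shutil.copyfile(source, path)


def read_library(bundle_lib_dir, name):
    """What a consumer sees at lib/<name>, whichever form it takes."""
    with open(os.path.join(bundle_lib_dir, name), "rb") as handle:
        return handle.read()
```

Calling materialize before archiving gives a self-contained directory, while link_library lets a system share one copy of a library among many bundles or swap in a patched one.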
21:01
You can switch back and forth between these versions of linking and never need to recompile this binary. So this is path-agnostic. And if you bundle the full contents of all the libraries and make a tarball of this, it's also path-agnostic. If you need to patch things on a system that is arranged like this,
21:21
and you want to replace a library separately from your package manager, go ahead. It's just symlinks, this is easy. We should try this. We might be able to make tarballs of software which we can distribute and run without needing a distro. If we want to share libraries, we can have a distro. We can have a form of organization which makes it easier to do that,
21:41
but we wouldn't need it. One of the reasons I think this would be cool is if we wanted to dump a bunch of binaries into a content-addressable system for permanent storage and sharing, we could do that. If it's path-agnostic, you can mount it anywhere, it'll run. If I wanted to say add a bunch of things to IPFS,
22:01
I work with IPFS a lot, I could do this. This would work, but only if the compiled output is path-agnostic. This talk was a lot about C-style linking, and I'd like to apologize to anyone who doesn't do C things. Imagine this with PYTHONPATH, imagine this with anything else. All of the same principles apply.
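As one way of imagining the Python version, here is a sketch of a bundle whose entry script finds its modules relative to its own location, so that it keeps working after the directory is moved and without PYTHONPATH being set. The file names and module contents are hypothetical, and this is only one possible arrangement.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# Entry script of a hypothetical bundle: it finds its libraries relative to
# its own location instead of relying on PYTHONPATH or an absolute path.
MAIN = """\
import os, sys
here = os.path.dirname(os.path.realpath(__file__))
sys.path.insert(0, os.path.join(here, "lib"))
import greeting
print(greeting.MESSAGE)
"""


def build_bundle(root):
    """Create root/main.py and root/lib/greeting.py."""
    os.makedirs(os.path.join(root, "lib"))
    with open(os.path.join(root, "main.py"), "w") as handle:
        handle.write(MAIN)
    with open(os.path.join(root, "lib", "greeting.py"), "w") as handle:
        handle.write('MESSAGE = "hello from a relocatable bundle"\n')


def run_bundle(root):
    """Run the bundle with PYTHONPATH removed and return its output."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    result = subprocess.run(
        [sys.executable, os.path.join(root, "main.py")],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def demo():
    """Build a bundle, run it, move it, and run it again."""
    with tempfile.TemporaryDirectory() as scratch:
        first = os.path.join(scratch, "here")
        second = os.path.join(scratch, "moved", "elsewhere")
        build_bundle(first)
        before = run_bundle(first)
        os.makedirs(os.path.dirname(second))
        shutil.move(first, second)  # the equivalent of `mv`
        after = run_bundle(second)
        return before, after
```

The script's own location plays the role that $ORIGIN plays for an ELF binary: everything it needs is named relative to where it currently lives.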
22:25
This thing I offered at the end is just one possible way of arranging some symlinks and stuff. You don't have to love that solution. But I'd like to talk about these terms more and I hope that these are useful concepts for exploring how we compose software.
22:44
Thank you. Actually, no time at all for questions. I am so sorry. But one more quick mention.
23:01
There will be a hackfest later in the week if anybody wants to talk about documenting what these terms mean, trying to make more concrete manifestos, maybe tools. Let's talk later. Thank you.