Rethinking basic primitives for store based systems
Formal metadata
Number of parts | 28
License | CC Attribution 3.0 Unported: You may use, modify, and, in unchanged or modified form, copy, distribute and make publicly available the work or its content for any legal purpose, provided you credit the author/rights holder in the manner they specify.
Identifiers | 10.5446/61035 (DOI)
NixCon 2022 (4 / 28)
Transcript: English (auto-generated)
00:10
Hi everyone, thank you for attending my talk. Sorry I couldn't be there with you in Paris, maybe next year.
00:21
I'm very fortunate that you let me give this talk remotely, and I should be online to answer any Q&A at the end of the session, live. So the topic of my talk is Rethinking Basic Primitives for Store-Based Systems. This is work put together by myself, Tom Scogland and Carlos Maltzahn.
00:43
And we're out of the University of California, Santa Cruz and Lawrence Livermore National Laboratory. So this talk's kind of a story, or maybe even a call to arms. By the end of this talk, I hope to have sold you on a few things.
01:01
And most importantly, that perhaps we as a community have not taken full advantage of the Pandora's box opened by Eelco. Actually that analogy isn't that great, since Pandora's box, I believe, led to despair and pain in the world. Well, maybe that's a little bit what it's like to learn Nix,
01:20
but all joking aside, what I meant to say is that Nix and similar store-based systems have ushered in this new paradigm, and they've unleashed a lot more opportunities than we've actually taken advantage of. I've included some fun DALL-E art here
01:42
along with the prompt I gave. So yeah, I thought I'd start off with a quick depiction: where were we prior to the invention of Nix?
02:00
You know, maybe to many of you this won't be new, since this is obviously a Nix convention, but I wanted to set the stage for those who are new to Nix and might watch this online. Well, up until Nix, the world had largely adopted shared libraries to solve a variety of problems. Shared libraries are great in that there is a single library on the system
02:21
that all others link against. That makes it incredibly simple and easy to upgrade libraries in the face of security vulnerabilities. In a world where bandwidth and storage are at a premium, coalescing onto a single shared library file at a particular version was efficient.
02:45
Of course, and perhaps this will ring true to many of you and explain why Nix itself is so effective, sharing these libraries was incredibly risk-prone and a compatibility nightmare. You know, Nix, as I describe it to many of my colleagues and friends, fundamentally solves the "it works on my computer"
03:03
or "it doesn't work on my computer, but it works on yours" problem. This problem has only really gotten worse with time, as the software we've built has become increasingly complex and dependent on other software. So, shared libraries. They're the cause of and solution to all of life's,
03:24
well, our life's problems. This garbled mess here is the build and runtime dependency graph for Ruby, the CRuby interpreter. If you're really struggling to make sense of the graph,
03:40
great, that's my intent and by design. Really, the only purpose of the slide is to communicate that our software has gotten increasingly complex, and the use of shared libraries and dependent code in general has exploded. So the problems in the previous slide have only gotten worse and are exacerbated
04:01
by the widespread use of shared libraries. Of course, the rest of the world has dealt with this explosion through established conventions, specifically the file system hierarchy standard. These are the directories you've either come to love or hate, maybe you have them tattooed on your arm.
04:22
Libraries are found in /lib or /lib64, executables in /bin and configuration files in /etc. This convention underpins most Linux distributions and is maybe at the core of what Nix is attempting to resolve.
04:40
If you read the Nix PhD thesis, which I recommend you do, the main contribution to me is that we should treat our file system more like well-structured objects and less like a global mutable file system. But there is something to be said,
05:00
this convention has gotten us pretty far. It's well-established, well-understood and simple. So hopefully most of us are familiar with this slide, as we're at NixCon, but basically it's an extremely short summary
05:22
of what Nix is: an alternative to the world of that convention I spoke about previously. Rather than everything being placed within the system implicitly, everything must be placed within a well-known root, the Nix store, depicted in the graphic on the left.
05:43
And dependencies can only point to other entries in the store, in a very explicit manner, depicted by the arrows. In the case of shared objects, Nix utilizes a feature of the Executable and Linkable Format, aka ELF,
06:00
to specify a configuration parameter known as runpath, which dictates where the dynamic linker is allowed to search for dependencies. Runpath, like most things in Linux, is a single list of directories to search in order, very similar to PATH or LD_LIBRARY_PATH.
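To make the search concrete, here is a minimal sketch in Python of how a colon-delimited runpath is walked. It is a simplified model, not the real dynamic linker: the function name `resolve`, the store paths, and the in-memory set standing in for the file system are all assumptions made for illustration.

```python
from typing import Optional, Set


def resolve(soname: str, runpath: str, existing_files: Set[str]) -> Optional[str]:
    """Probe each runpath directory in order and return the first match."""
    for directory in runpath.split(":"):
        candidate = f"{directory}/{soname}"
        if candidate in existing_files:  # stands in for an open()/stat() probe
            return candidate
    return None


# Hypothetical store entries, not real hashes.
files = {
    "/nix/store/aaa-openssl/lib/libssl.so.3",
    "/nix/store/bbb-glibc/lib/libc.so.6",
}
runpath = "/nix/store/aaa-openssl/lib:/nix/store/bbb-glibc/lib"

print(resolve("libc.so.6", runpath, files))
print(resolve("libz.so.1", runpath, files))
```

The first directory that contains a file with the requested name wins, which is why the order of entries matters.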
06:22
I've included an example on the right, of Ruby. This continues the Ruby example from the previous slide, and I've printed out its runpath. You could imagine, in the graphic on the left as well, that Subversion here uses OpenSSL
06:43
and glibc. Those two directory entries would be on the runpath, which is colon-delimited. Okay, so problem solved? Perhaps at this point we're all thinking: great.
07:02
Although Nix solves a lot of the initial challenges faced by models such as the file system hierarchy standard, it effectively still relies on many of the fundamental concepts introduced by that world, and at times appears only to band-aid over them.
07:22
Lots of aspects of that are great. It's pragmatic. They're honestly what helped make NixOS and Nix into a usable system right away, by using primitives that already existed. But as our ecosystem begins to mature, we should be asking ourselves more fundamental questions
07:40
about these underlying systems we leverage, and whether there are opportunities to reshape them in the new store model that Nix has ushered in. I'm gonna spend the next couple of slides walking through some issues that are present in the underpinnings of what Nix builds upon, and some ideas of how they could be repaired.
08:12
Issue number one. Shared objects are only loaded into memory a single time during traversal, usually based on their soname. If a shared object has already been visited
08:22
and is needed by another dependency, it'll be provided without a lookup. Great, that's an optimization; however, it will not raise a warning or error if that library would not actually have been found from that dependency. We found several cases through discovery and anecdotal data
08:40
through our own work and experience where this happens in practice. The image I've put above here is one such example, of a tool called dbwrap. You can see that libsamba-debug-samba4, which the arrow is pointing to near the bottom, is a shared object that was loaded earlier,
09:01
and the loader goes in breadth-first order, or level order, so it loads that first. However, when it came to load it again, which is the entry highlighted in blue, it actually should have complained that it could not be found, because its parent was missing the runpath entry to locate it.
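The behaviour described here can be sketched with a toy loader in Python. This is a simplified model under stated assumptions, not glibc's implementation: the object names, paths, and the `OBJECTS` table are hypothetical, and each object's runpath is only used for its own direct dependencies. `load` deduplicates by soname in breadth-first order, while `audit` checks every edge on its own.

```python
from collections import deque

# Hypothetical objects: path -> (needed sonames, runpath directories).
OBJECTS = {
    "/store/app/bin/tool": (
        ["libdebug.so", "libutil.so"],
        ["/store/debug/lib", "/store/util/lib"],
    ),
    "/store/debug/lib/libdebug.so": ([], []),
    # libutil needs libdebug but has no runpath entry that could locate it.
    "/store/util/lib/libutil.so": (["libdebug.so"], []),
}


def lookup(soname, runpath):
    """Return the first runpath directory containing soname, or None."""
    for directory in runpath:
        path = f"{directory}/{soname}"
        if path in OBJECTS:
            return path
    return None


def load(executable):
    """Breadth-first load that skips the lookup for already-loaded sonames."""
    loaded, unresolved = {}, []
    queue = deque([executable])
    while queue:
        obj = queue.popleft()
        needed, runpath = OBJECTS[obj]
        for soname in needed:
            if soname in loaded:
                continue  # reused without any lookup, so no error is raised
            path = lookup(soname, runpath)
            if path is None:
                unresolved.append((obj, soname))  # a real loader would abort here
                continue
            loaded[soname] = path
            queue.append(path)
    return loaded, unresolved


def audit(executable):
    """Check each edge independently of what happens to be loaded already."""
    broken, seen, queue = [], {executable}, deque([executable])
    while queue:
        obj = queue.popleft()
        needed, runpath = OBJECTS[obj]
        for soname in needed:
            path = lookup(soname, runpath)
            if path is None:
                broken.append((obj, soname))
            elif path not in seen:
                seen.add(path)
                queue.append(path)
    return broken


loaded, unresolved = load("/store/app/bin/tool")
print(unresolved)                     # the toy loader reports nothing missing
print(audit("/store/app/bin/tool"))   # the per-edge audit finds the missing edge
```

In this model the program starts without complaint, yet the audit shows that one dependency edge could never have been resolved on its own.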
09:25
The tool, however, runs fine because of that optimization of skipping the reload of shared libraries we've already loaded. This issue exposes the idea that although Nix aims to be incredibly reproducible
09:40
and document all edges of our graph, it's possible to actually get to running software while still missing these edges. Here's somewhat of a paradoxical example of an application that may want to link against two libraries,
10:02
library A and library B, or libA and libB, found in directory A and directory B. I've highlighted in green in this graphic the desired libraries to load. Unfortunately, given the coarse nature of runpath, and this problem applies
10:23
to LD_LIBRARY_PATH as well, there's no way to do this in the current model. The only two ordering options that exist for these directories I've put on the slide. It's either directory A colon directory B, or directory B colon directory A.
10:40
And neither of these orderings will result in the desired outcome. If directory A is at the front, then library A and library B will both be loaded from directory A, and vice versa if directory B is at the front. You know, at best, a new directory structure must be created that symlinks to the correct libraries,
11:01
but that adds unwanted complexity to our software setup. In this issue, Nix is still relying on primitives that predate it to control the system, and those primitives do not offer enough granularity and control for what Nix, or really any highly fine-grained system, requires.
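The paradox can be checked exhaustively with a short Python sketch. It assumes, as the example implies, that both directories contain a copy of both libraries; the directory and file names are hypothetical.

```python
from itertools import permutations

# Assumption: each directory holds its own copy of both libraries.
FILES = {
    "/dirA/libA.so", "/dirA/libB.so",
    "/dirB/libA.so", "/dirB/libB.so",
}

# What the application actually wants: libA from dirA, libB from dirB.
DESIRED = {"libA.so": "/dirA/libA.so", "libB.so": "/dirB/libB.so"}


def resolve_all(sonames, runpath):
    """Resolve every soname against the same ordered directory list."""
    result = {}
    for soname in sonames:
        result[soname] = next(
            (f"{d}/{soname}" for d in runpath if f"{d}/{soname}" in FILES),
            None,
        )
    return result


# Try every possible ordering of the two directories.
satisfying = [
    order
    for order in permutations(["/dirA", "/dirB"])
    if resolve_all(DESIRED, order) == DESIRED
]
print(satisfying)
```

Because one ordered list is applied to every lookup, whichever directory comes first wins for both libraries, so no ordering produces the desired mix.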
11:26
Another issue, which has also been written about by the developers of Guix, is needy executables. Binaries have gotten a lot more complex, as I've discussed earlier, and the number of dependencies they rely on
11:40
has gotten increasingly large. Here's a notable example which many of us are familiar with. This editor, or maybe you refer to it as an operating system, Emacs, has 100 dependencies found in the 36 directories that comprise its runpath. This provides a worst-case scenario which, if all dependencies are found in the last directory,
12:01
causes a lot of unnecessary file system calls, you know, hypothetically 3,600, as I've written on the slide. You know, in practice, it's actually a bit less, however still significantly high, 1,800 file system calls, which can be pretty disastrous if these paths are in a network file system, for instance,
12:21
and they have negative caching disabled. You know, this is overall not a new problem, and traditional Unix systems employ something known as the ld.so cache, which maps a library to its path on disk. Again, this was however designed as a system-wide cache,
12:41
and breaks down in a system such as Nix, where multiple variations of a library can exist. So that's about the issues present. Now I'm gonna propose some solutions: some which we've implemented already, and some more as a call to arms, or ideas I'm exploring.
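The hypothetical figure quoted for Emacs is plain multiplication, sketched below in Python. The function names are made up for illustration. The second function assumes each library is equally likely to sit in any of the directories; that uniform model is an assumption of this sketch, and the roughly 1,800 calls mentioned in the talk is an observed number, not something derived from it.

```python
def worst_case_probes(num_deps: int, num_dirs: int) -> int:
    """Every dependency is found only in the last runpath directory."""
    return num_deps * num_dirs


def expected_probes(num_deps: int, num_dirs: int) -> float:
    """Average probes if a library is equally likely to be in any directory."""
    return num_deps * (num_dirs + 1) / 2


print(worst_case_probes(100, 36))
print(expected_probes(100, 36))
```

Under the uniform assumption the average lands at about half the worst case, which is the same order of magnitude as the figure observed in practice.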
13:06
So the first idea is what we've called shrinkwrap, and it's the idea of flattening the graph and pointing all libraries to their absolute path. The tool hoists all needed entries
13:21
of the final binary to the top level. So I've shown a graphic here of the Ruby application, and it has a typical directed acyclic graph. And after the transformation, all the dependencies are duplicated at that first level, and are now pointed to by their absolute path.
13:43
Because of the caching and deduplication in glibc, it means that all libraries are effectively loaded from deterministic locations, and in a deterministic order, at startup. We no longer have to rely on runpath or rpath for loading sub-libraries, except for dlopen.
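The flattening step can be sketched in Python as a breadth-first walk that records each resolved dependency once, by absolute path. This is only an illustration of the idea under stated assumptions, not the actual tool's implementation: the `GRAPH` table, the store paths, and the function names are hypothetical.

```python
from collections import deque

# Hypothetical dependency graph: path -> (needed sonames, runpath directories).
GRAPH = {
    "/store/ruby/bin/ruby": (
        ["libruby.so", "libc.so"],
        ["/store/ruby/lib", "/store/glibc/lib"],
    ),
    "/store/ruby/lib/libruby.so": (
        ["libz.so", "libc.so"],
        ["/store/zlib/lib", "/store/glibc/lib"],
    ),
    "/store/zlib/lib/libz.so": (["libc.so"], ["/store/glibc/lib"]),
    "/store/glibc/lib/libc.so": ([], []),
}


def lookup(soname, runpath):
    """Return the first runpath directory containing soname, or None."""
    for directory in runpath:
        path = f"{directory}/{soname}"
        if path in GRAPH:
            return path
    return None


def shrinkwrap(executable):
    """Return every transitive dependency as an absolute path, in BFS order."""
    flat, seen = [], set()
    queue = deque([executable])
    while queue:
        obj = queue.popleft()
        needed, runpath = GRAPH[obj]
        for soname in needed:
            path = lookup(soname, runpath)
            if path is None:
                raise FileNotFoundError(f"{obj} needs {soname}")
            if path not in seen:
                seen.add(path)
                flat.append(path)
                queue.append(path)
    return flat


# These absolute paths would become the top-level needed entries.
for path in shrinkwrap("/store/ruby/bin/ruby"):
    print(path)
```

Once the top-level needed list holds absolute paths, each library is opened directly, so the number of probes at startup is on the order of the number of libraries rather than libraries times directories.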
14:04
This is great: loading of needed entries is now deterministic and well-known, and we can avoid that costly lookup that I spoke about earlier. I'll actually be presenting a paper on this at this year's Supercomputing in Dallas, Texas. If you're interested in the paper, please reach out,
14:21
and I can give you an advance copy. Interestingly, Guix chose a completely different approach to achieve something comparable: they patched the dynamic loader to make that ld.so cache per process, rather than system-wide. I think there's a lot of simplicity here, however, in seeing all your dependencies up at the first level,
14:46
and avoiding any need of a cache. Here are some results of the experiment. I chose Emacs because the Guix developers published a blog post and used that as their experiment binary as well.
15:01
The number of library dependencies needed by a particular binary, and the number of entries in the runpath, can vary greatly. However, in the case of Emacs,
15:21
we see the ideal drop after shrinkwrap has been applied, from about 1,800 file system calls to open and stat down to 104. As for the time reduction, this was just done on my laptop, so it's in microseconds. However, we see a 36x improvement.
15:43
Again, even though that was in microseconds, these times can be inflated on an NFS machine with a cold cache, and this is time spent on every process startup, on every machine that's running Emacs in the Nix model. The bottom graph is of the Pynamic benchmark.
16:03
It's a dynamic loading benchmark built basically exactly to measure this startup cost of loading many shared libraries. The experiment was performed on a machine where the shared libraries were in fact on an NFS server with a cold cache and negative caching disabled. At the smallest size of 512 processes on four nodes,
16:24
the normal executable took about 169 seconds to launch, while the wrapped executable, the one with shrinkwrap applied, took 30 seconds, and that's a speedup of about 5.5x. At 2,048 processes, the gap widens to an improvement of 7.2x
16:43
for a total time to launch of 344 seconds for the normal executable. So these are real savings and performance improvements. The next idea, which originally came from Harmen Stoppels,
17:00
is that you set the soname of a shared library to its absolute path. I've called this idea nix-harden-needed, and it's basically bubbling up the world explicitly. Again, you set the soname of the library
17:22
to its explicit path in the Nix store. Anytime you link to this library, the needed entry becomes the soname, so you're automatically propagating the absolute path, and we've just bubbled up the complete removal of any need for runpath.
17:41
Since in Nix it's always safe to link against the full Nix store path, this removes, again, the runpath or rpath entries completely, automatically. Things just build correctly. It's incredibly simple, pure, and elegant.
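A toy Python model of this propagation is sketched below. It relies on two facts about ELF linking: the linker copies a library's soname into the needed entry of whatever links against it, and a needed entry containing a slash is treated by the loader as a path rather than searched for. Everything else here (the `SharedObject` class, the `harden` and `link` helpers, and the placeholder store paths) is hypothetical and only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SharedObject:
    path: str
    soname: str
    needed: List[str] = field(default_factory=list)


def harden(lib: SharedObject) -> SharedObject:
    """Return a copy whose soname is its own absolute path."""
    return SharedObject(lib.path, lib.path, list(lib.needed))


def link(output_path: str, soname: str, libraries: List[SharedObject]) -> SharedObject:
    """Model the link step: each library's soname becomes a needed entry."""
    return SharedObject(output_path, soname, [lib.soname for lib in libraries])


def needs_search(needed_entry: str) -> bool:
    """An entry with a slash is used as a path; otherwise a search is required."""
    return "/" not in needed_entry


# Placeholder store paths, not real hashes.
plain = SharedObject("/nix/store/hash-zlib/lib/libz.so.1", "libz.so.1")
hardened = harden(plain)

app_plain = link("/nix/store/hash-app/bin/app", "app", [plain])
app_hardened = link("/nix/store/hash-app/bin/app", "app", [hardened])

print(app_plain.needed, [needs_search(n) for n in app_plain.needed])
print(app_hardened.needed, [needs_search(n) for n in app_hardened.needed])
```

In the hardened case the absolute path arrives in the needed list as a side effect of ordinary linking, so nothing downstream has to set a runpath.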
18:02
Now we're getting into the ideas I'm still thinking about, more of a call to arms, and I'd love to collaborate. ELF has basically dominated as the Linux executable format, especially since the deprecation of a.out, the assembler output format.
18:22
ELF itself has many of its concepts underpinned by those same conventions that dominated the FHS, or file system hierarchy standard. For instance, it encodes all the relevant information within a single file. Of course, this is desirable when our goal previously was to update only a single shared library file,
18:40
and we wanted to guarantee some levels of atomicity. However, since Nix solves the multi-file atomic update problem, why should we accept that? Why should we accept that all data needs to be encoded in a single file? Understanding and working through this data is incredibly challenging.
19:01
One only has to look at patchelf, which was written by Nix for Nix, and which we rely on heavily, and the bugs against it, to see the problem. I don't really know what another layout could be. I give an example here where I basically just explode ELF into individual files,
19:21
but maybe files isn't even a good abstraction. What happens if some of this information existed in a database, a SQLite database? Again, the fundamental idea here is Nix allows us to use a different loader, and it allows us to think much more deeply
19:42
about these fundamental primitives, and to rethink them. Finally, Nix needs its own dynamic linker. If we were to avoid using ELF,
20:01
we'd basically need our own dynamic linker, and maybe even our own executable format. Doing so opens a wide range of opportunities, areas of research, and optimizations, and we won't be so constrained by the conventions
20:21
that were placed before Nix was created. Writing a dynamic linker is no small feat, but Nix is primed to do this. Not only does it have an operating system distribution so we can make deeper changes if we need to,
20:40
but there are opportunities here for us to also write one for the typical Linux distribution as well. I'm kind of ending my talk here with another DALL-E image, and this was an oil painting of a dynamic linker for Nix. Clearly it went very abstract, yeah.
21:02
Thank you, and that's my talk, and hopefully I see you live for some Q&A. Thanks.