Nix in a scientific environment: bringing together Nix's reproducibility with computational chemistry
Formal Metadata

Title: Nix in a scientific environment: bringing together Nix's reproducibility with computational chemistry
Title of Series: NixCon 2022
Number of Parts: 28
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/61022 (DOI)
Language: English
NixCon 2022, 16 / 28
Transcript: English (auto-generated)
00:00
Thank you very much. It's a pleasure to be here at my first NixCon. Thanks to the organizing committee for giving me this opportunity to present our work. Phillip didn't make it to the conference, but I think he might actually be online. So this is maybe not
00:22
your average Nix talk title: it's Nix in a scientific environment, bringing Nix together with computational chemistry. Many people are afraid of chemistry, but fear not, I don't actually have too many chemistry slides. So let's start with the acknowledgements.
00:44
I'm basically a professor at Stockholm University, leading a research group in chemical physics. So IT is not really my main job, and Nix is more or less a very useful hobby that makes our work possible. And what you see here is my research group in
01:02
Stockholm. They are not really active Nixers themselves, but they all benefit from a very well set up environment that just runs everything smoothly on Nix.
01:21
Then of course, special thanks also goes to Phillip Seeber, who contributed a lot over the last two years to the effort of making computational chemistry more palatable in Nix. And also special thanks to Christian Kugler, who's in the audience, who actually got me into Nix. I think without him, I would maybe be sitting here trying to
01:45
package apps with containers. So I got saved. I'll give you a little bit of background, just to give you an idea of which side I'm coming from. Then I'll talk a little bit about the scientific software infrastructure that is already present
02:04
in Nix packages and the things that we add and modify with our overlay. And then I'll also show you some anecdotes of things that we have come across over the years: lessons learned from packaging things and all the quirks that can appear. I hope there's
02:23
something useful in there for some of you. So what do we do in everyday life? We are chemical physicists, theoretical chemists. So we mostly do theory and simulation. No experiments, but we basically keep computers busy all day long at a large scale.
02:43
And what I'm particularly doing is electronic structure theory, photochemical processes. For example, what happens when you get a sunburn? The chemical processes behind those kinds of things. We look at manipulation of chemical properties with light fields and so on, all those kinds of fancy quantum things.
03:03
We do ultrafast spectroscopy and related physics. This might not mean much to you if you're not a physicist, but what does it actually come down to when you run your calculations? What's the thing you do in practice that keeps the computer busy 90 or 99% of the time? In practice, that means you solve some kind
03:25
of differential equation in quantum mechanics. And in practice, I would say that means two things: either you stuff things into matrices and do matrix operations, linear algebra operations, or you do some kind of Fourier transforms. That's where the main computation
03:44
time in our programs goes, I would say. This relies a lot on parallel computing, of course. So you need to take parallelization into account, large-scale parallelization. And to give you a hint, I would say in my personal work,
04:00
we have jobs that are on the order of maybe 50,000 CPU hours, where a single job could run that long, for example. Some of my colleagues might laugh at that and say, okay, this is small, but you already need a fair bit of electricity to get that done.
04:23
So from that background, what are the problems we're actually facing? When I started my position four years ago, I said, okay, I have to run my own system, and I have to set something up that's viable and easily maintainable. So what are you actually up against when you look at supercomputing centers, or at the landscape in general?
04:45
The first thing is that a lot of scientific software packages are very special. They're not necessarily packaged, so you have to do the job yourself if you really want to use them. And those things can be quite quirky; I have a few examples of that later.
05:00
Then, of course, whenever you talk about high-performance computing, that means you have to optimize your code for the machine that you're running on, right? You run on a modern CPU that has a lot of features that need to be taken into account. Otherwise, this is basically like running a Formula One car in first gear.
05:00
Then, very common in typical high-performance compute centers at universities are still the so-called environment modules. If you have never heard of them, consider yourself lucky. It's basically a very fragile way of generating custom-compiled software environments.
05:48
And shockingly, this is still very often done manually. I mean, the HPC community is moving forward a little bit, it's getting better, but a lot of things are still actually done manually. So every time you update, most of your applications probably break.
06:05
And yeah, how do you run your jobs? You usually have a workload manager installed on your cluster, like Slurm or PBS, some kind of batch queuing system: you send a simple batch script to the workload manager, and the workload manager takes care
06:20
that it actually gets executed somewhere on the cluster. So, obvious solution to everyone here in the room, right? Let's use Nix to do all of that. Out of courtesy, I should be fair and say that also with Guix there's a large effort going on, with people coming into scientific computing. So you need to be able to reproduce things
06:52
without a state, right? If you're an experimentalist, you do an experiment, and you have to describe it well enough that someone else can repeat it. We are theorists, so we mostly run simulations. So it's basically
07:03
sitting in front of a computer all day long. And reproducibility means: I stick some input in, could be text files or binary data, I send it through some kind of program or script or whatever, and out comes the result. And that result should be the same every time you do it. And someone else should also be able to
07:24
reproduce your stuff. For the input, I guess you're responsible yourself. But when it comes to the programs, you need to rebuild your programs in a reproducible manner, so that you have the same dependency chain with exactly the same versions
07:42
every time. Standard problem, I would say. So, exactly the same. And you can even ask yourself whether you need the same hardware to be 100% reproducible in the end.
08:05
The obvious answer here is that Nix, of course, is perfect to solve the software part of this equation very easily. I would consider this almost standard now. So let's have a little look at what's actually already available in Nix packages, just going with find
08:23
over the Nix packages source tree. Surprisingly, biology is very active here in packaging stuff for its domain. Chemistry a little bit less, but we're working on that. Math, 170 packages; physics, astronomy, you see some activity here as well.
08:42
And of course, what I didn't count is machine learning and data science; those packages are spread over many more categories in Nix packages. So if we talk about what kind of basic infrastructure in terms of packages Nix packages provides us: from my perspective,
09:00
something that you always need is some kind of MPI library. MPI stands for Message Passing Interface, if you've never heard of it: the parallelization standard, which is very common in the scientific community. BLAS and LAPACK for linear algebra routines; that's pretty much entangled everywhere. And if you look in the Nix store, you probably have a BLAS implementation even on a
09:25
desktop, almost. Same for fast Fourier transforms, the FFTW library. If you start overriding that, you cannot even reinstall Firefox without recompiling it. And of course, we have the more modern stuff like GPU computing in terms of CUDA and so on, which is already
09:42
nicely there. So what we created a few years ago is the so-called NixOS-QChem overlay, to bring in more quantum chemistry packages. A very specialized domain, of course. The name might be a bit misleading; of course it's not NixOS specific, but
10:00
I started developing it solely on NixOS and the name kind of stuck. So we use this as an incubator for new packages. And well, some packages will maybe have a forever home there and never get upstreamed to Nix packages, since they are so quirky that I'm actually embarrassed to open a PR for them. What we also built in is
10:24
optimization for modern CPUs, so that we actually get the best acceleration we can potentially get out of a modern system, and we provide basically everything that a computational chemist actually needs to work productively.
10:42
This is very closely coupled to Nix packages, of course. It's not like we only take stuff from Nix packages; a lot of stuff flows back to Nix packages once those things have actually been worked out. And this approach with the overlay, we have even published it in a scientific journal, which is very nice, because now we
11:03
can actually go ahead and say, when we did a calculation, we used this and this and this program, and basically NixOS-QChem at that hash or version; that's how we did it. And then everyone else can basically just take it and reproduce it.
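In practice, "NixOS-QChem at that hash" amounts to pinning both Nixpkgs and the overlay to exact commits. A minimal sketch, with placeholder revisions and hashes (`<rev>`, `<sha256>` are stand-ins, and the GitHub URL for the overlay is an assumption):

```nix
# Sketch of a fully pinned, citable environment.
# <rev> and <sha256> are placeholders for the actual commits you used.
let
  nixpkgs = fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/<rev>.tar.gz";
    sha256 = "<sha256>";
  };
  qchemOverlay = fetchTarball {
    url = "https://github.com/markuskowa/NixOS-QChem/archive/<rev>.tar.gz";
    sha256 = "<sha256>";
  };
in
import nixpkgs {
  # assumes the overlay repo's default.nix evaluates to an overlay function
  overlays = [ (import qchemOverlay) ];
}
```

Anyone given these two commit hashes can re-evaluate the exact same package set, which is what makes the "we used this and this program" statement in a paper reproducible.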
11:21
Yeah, so a little bit about the structure of the overlay. I think there might be some tips and tricks in there which are interesting in a more general sense. What we did (we didn't do that initially, and it caused a lot of problems): all the packages that get introduced by the overlay do not just go into the default
11:40
attribute set, but into a subset, so that you have a nice separation and avoid name collisions. Just imagine you have a package called Mesa; what could possibly go wrong in terms of name collisions? Nothing works anymore in your system. Then we have introduced some customization so that
12:01
you can easily turn on those kinds of CPU-related optimizations and so on, to make it a bit more palatable. And what we also do is take packages from upstream Nix packages, project them into the subset, and then apply our optimizations to them.
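The namespacing idea can be sketched as an overlay that keeps everything under one attribute set. The package names below are purely illustrative, not the overlay's actual contents:

```nix
# Sketch: keep overlay packages in a `qchem` subset instead of the top level,
# so a domain package called e.g. `mesa` cannot shadow the graphics library
# of the same name in Nixpkgs.
final: prev: {
  qchem = (prev.qchem or { }) // {
    # a hypothetical new domain package
    mypackage = final.callPackage ./mypackage.nix { };
    # an upstream Nixpkgs package projected into the subset,
    # where CPU optimizations can later be re-applied
    openmm = prev.openmm;
  };
}
```

Consumers then refer to `pkgs.qchem.mypackage`, and the top-level attribute set stays untouched.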
12:22
It also includes tests; tests are actually super valuable, as you all know, and even here they help a lot to detect problems early when you update stuff. So how does that look? Nix packages has this config parameter, and we basically just inject our config here, under qchem-config, as another subset.
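Injecting such a config could look roughly like this; note the exact option names under `qchem-config` are recalled from the talk and may differ from the overlay's current interface:

```nix
# Sketch: pass overlay options through the ordinary Nixpkgs config mechanism.
import <nixpkgs> {
  overlays = [ (import ./nixos-qchem) ];  # path to an overlay checkout
  config.qchem-config = {
    optArch = "zen2";  # build for AMD Zen 2 CPUs (option name assumed)
    useCuda = true;    # enable CUDA in packages that support it (assumed)
  };
}
```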
12:45
You can then, for example, turn on simple toggles like useCuda, which turns on CUDA in all the packages that support it. Or we have a neat trick where you can supply a specific internal source URL, which then overrides your requireFile
13:00
so that you don't have to load all the proprietary packages manually into your Nix store. And then, for example, there's the optArch option, where you can pick something that is already defined in the systems architectures; for example, for AMD CPUs you say zen1, zen2 or something like that to get optimized builds. Plus some other convenience things that
13:27
make things sometimes easier to handle, like injecting license files for proprietary software. So when we talk about customization, let's actually walk back a step and see what's already in
13:41
Nix packages. So what we have in Nix packages is MPI, and MPI comes in three different flavors. There's OpenMPI, which is the default and most commonly used these days. But you also have something like MPICH and MVAPICH that you could in principle use as an implementation.
14:01
Those MPI APIs are actually very well standardized, so they should be easily replaceable. So what we did is: the default, OpenMPI, maps to the mpi attribute, which means that everyone who writes an expression that uses MPI just uses the mpi
14:25
attribute to consume it. And then you can simply write an overlay and replace it, for example, with another implementation. And that's also sometimes useful because the different implementations have different optimizations, maybe for a particular setup or something like that.
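Because every expression consumes MPI through the generic `mpi` attribute, swapping the implementation is a one-line overlay:

```nix
# Replace the default OpenMPI with MPICH for everything
# that depends on the generic `mpi` attribute.
final: prev: {
  mpi = prev.mpich;
}
```

Every package built against `mpi` in that package set then links against MPICH instead, without touching any individual expression.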
14:42
It's also interesting because with a setup like that you can very easily, for example, run a Hydra, which we do, and build all the different variants and see how well they actually build. Spoiler alert: not everything builds with everything in the end. That's the short message here, I think. Then we can do the same for BLAS and LAPACK.
15:03
That's also an existing feature of Nix packages. And I think credit here goes to Matthew Bauer, who put that system in. It's actually quite sophisticated how those get replaced. Those are your linear algebra routines, and in most packages they're really
15:20
absolutely crucial to get good performance. So again, you have different implementations, like OpenBLAS, the open source default; you have Intel's MKL, BLIS and FLAME from AMD, and so on. And you can, in principle, replace them. Again, this works super easily with an overlay where you just, for example,
15:42
replace the provider with MKL. We also built the different variants on our internal Hydra, and it turns out that this is actually a much more complicated problem; a lot of things fail when you start replacing it. So it's an interesting case study, I would say.
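The BLAS/LAPACK switch goes through the generic wrapper packages, which take a provider argument. Replacing the default OpenBLAS with Intel's MKL looks like this:

```nix
# Overlay that re-routes the generic blas/lapack wrappers to MKL.
# Note: MKL is unfree, so the Nixpkgs config needs allowUnfree = true.
final: prev: {
  blas = prev.blas.override { blasProvider = final.mkl; };
  lapack = prev.lapack.override { lapackProvider = final.mkl; };
}
```

Everything that consumes the generic `blas`/`lapack` attributes is then rebuilt against MKL, which is exactly what makes it easy to build all variants on a Hydra and compare.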
16:00
Then the CPU optimizations. This is something we have played around with in the overlay, trying different ways of how you can potentially do that. Of course, the easiest way is to basically start replacing compiler flags in your stdenv to get optimizations. However, if you replace your stdenv,
16:24
you get an absolutely massive rebuild. And if you just want to do that for your scientific software, that might actually be a bit too expensive. So we opted for the intermediate version, where we only rebuild the packages in the overlay with an overridden stdenv. And then we can, for example, change the host
16:44
platform and add flags like AVX and so on to the stdenv, and also use some GCC-related tuning options to speed things up a little. And this is actually a nice feature, again already nicely laid out in Nix packages itself: you have the host platform
17:06
and all the flags that are potentially supported. For example, if you have a package that has a configure flag for, let's say, AVX in this case, then we can simply add an optional
17:20
and choose the default nicely from the stdenv host platform, and you pick the flags that you want. The nice advantage is that if you write it that way, and you override your stdenv with the respective flags to support your CPU architecture, your things get automatically built in an optimized way.
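In a package expression this pattern looks roughly like the following; `avxSupport` is one of the CPU feature predicates Nixpkgs derives from the platform description (the package itself and its `--enable-avx` flag are hypothetical):

```nix
{ lib, stdenv, fetchurl }:

stdenv.mkDerivation {
  pname = "example-solver";  # hypothetical package
  version = "1.0";
  src = fetchurl {
    url = "https://example.org/solver-1.0.tar.gz";
    sha256 = lib.fakeSha256;  # placeholder
  };

  # Enable AVX only when the (possibly overridden) host platform declares
  # support for it. With an optimized stdenv that sets the CPU architecture,
  # this flag turns on automatically; with the stock stdenv it stays off.
  configureFlags = lib.optional stdenv.hostPlatform.avxSupport "--enable-avx";
}
```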
17:43
And in practice it has turned out that this actually works quite well, has caused few problems so far, and makes things really much faster. So for the last part of this section of the talk, just a little example of how we use that in practice
18:03
here. With the workload manager, again, you just send away a shell script and the workload manager takes care of it. So what we use heavily here is the shebang feature of nix-shell. That's actually also on the wishlist for the new command line interface, that we get a replacement for that feature. And then you can either use the
18:25
sloppy version and just say, okay, you want a particular program here, or instead of using -p you could also point to a particular nix-shell file with pinned dependencies, if you want to run it absolutely clean. So: four years of experience and lessons learned
18:43
from packaging stuff. Reproducible tests and results: a lot of numerical packages come with some form of test suite. So you would say, okay, if I give a specific expression with pinned dependencies to someone else, it should run exactly the same way, right?
19:03
Yeah, no. We had problems where that was actually not the case. There are certain impurities; for example, the OpenBLAS and MKL libraries in particular have a feature called dynamic CPU detection. They detect at runtime what platform they're running on
19:20
and what optimizations they want to turn on. A nice feature, makes it super fast, very easy. But sometimes a different kind of optimization gets used, and when there's a bug, you run it on a different machine and the test fails on one machine but passes on the other. An absolute nightmare. So if you really want to get rid of that, you basically have
19:43
to turn off that default and compile OpenBLAS, for example, for a specific CPU architecture. And a hint, for everyone who didn't get the message: never ever use GCC's fast-math optimization. That can actually lead to you not even getting reproducible results on
20:02
the same machine on the same day. Only use that if you really don't care about precision or anything like that. So, test suites and resource usage. A lot of programs are compiled with OpenMP, which gives you nice, easy thread parallelization.
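In a package expression, taming such OpenMP-parallelized test suites usually comes down to capping the thread count during the check phases instead of letting it default to every core of the build machine. A hypothetical sketch (phase names are standard Nixpkgs hooks, the rest is illustrative):

```nix
# Sketch: cap parallelism while running a package's tests.
stdenv.mkDerivation {
  # ... pname, version, src, etc. ...

  doInstallCheck = true;
  preInstallCheck = ''
    # Small test cases are often faster on 2 threads than on all 64
    # cores of a build machine.
    export OMP_NUM_THREADS=2
  '';

  # For MPI-based tests there is no auto-detection at all, so invoke
  # them with a fixed process count, e.g.:
  #   mpirun -np 2 ./run-tests
}
```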
20:24
But what does OpenMP do when you don't tell it how many threads to use? Well, it just uses all of your cores, right? So now you have a build machine with 64 cores, and they're all maxed out. The disadvantage is that the test cases are usually very small,
20:41
and they are sometimes actually much faster if you just run them on two threads than on 64. So this can cause massive slowdowns, and I think I have actually seen that in the wild, for example, with a package like NumPy. To get rid of that, simply set the OMP_NUM_THREADS variable to a small number in your installCheckPhase, or wherever it applies, to get around
21:04
that problem. The same goes a little bit for MPI; that's even a bit more difficult, since there's no real auto-detection here. Whenever you have something that actually runs, run it with a fixed number of CPUs; that makes the tests build faster and, in general,
21:21
more reliable. So another interesting thing that we observed concerns Fortran-specific features. Not everyone might be familiar with this quirk of Fortran, but Fortran has a default integer size, which is not clearly defined
21:41
on a 64-bit platform. It's simply defined by a compiler switch: you tell it, I want my default integer size to be 4 bytes or 8 bytes. The problem is that your compiler has no means of detecting what a dependency was compiled with, which leads to the fact that you can have a mismatch between your app
22:04
and your library. It compiles just fine and then simply crashes at runtime. So, also a very important message, especially in those cases with BLAS and LAPACK: test things, run tests, see if your binary actually runs and halfway does what it's supposed to do.
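For the BLAS/LAPACK integer-size mismatch specifically, the generic wrapper packages expose an `isILP64` flag that an expression can assert on, catching the problem at evaluation time instead of at runtime. A sketch (the package itself is hypothetical):

```nix
{ stdenv, blas, lapack }:

# Fail at evaluation time if the BLAS/LAPACK wrappers were not built with
# the 32-bit default-integer interface this (hypothetical) Fortran code
# expects, and if the two wrappers disagree with each other.
assert !blas.isILP64;
assert blas.isILP64 == lapack.isILP64;

stdenv.mkDerivation {
  pname = "fortran-app";
  version = "0.1";
  # ... src, buildInputs = [ blas lapack ], etc. ...
}
```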
22:22
With this kind of nice setup in Nix packages for BLAS and LAPACK, we now basically have this ILP64 flag in the wrappers, which is a super nice feature, because you can simply add some asserts in your expressions and make sure it's actually
22:42
set to what you want it to be, and that the dependencies then actually have the right integer size. The fact is that, you know, some scientific software packages,
23:01
they maybe have their roots in the 80s. Scientists are usually not computer scientists, but physicists or chemists. So things can end up a little bit quirky, and not always built in the most standard way. A lot of problems come from broken CMake, CMake not used in the way it was supposed to be used. Interactive installers are a total nightmare; they
23:25
basically ask you in which path your library is, which compiler flags you need to get the library built, and so on and so forth. The next thing, probably very well intended by the developers: you have a program that comes with a shell script, and it builds and
23:46
installs itself automatically into a home directory. Or of course, pinning dependencies the non-Nix way: you start downloading things or checking out
24:05
git repos during build time. That doesn't play well with the Nix sandbox, of course. It's the Wild West of build systems, and the only advice here is: be creative in patching that kind of stuff out of your build systems.
24:22
There's probably no other way if you actually need those quirky packages to work. Then, Nix packages pinning. Of course, we support flakes. But we went a little bit further and always pin the overlay to a certain Nix packages version,
24:41
because otherwise things break way too easily. And we have a little bit of a hack here where we actually make use of flake.lock even if you're not using flakes. It might be a dirty hack, but it works quite well in practice to keep things stable, at least as long as the format of flake.lock doesn't change. Okay, that's already the end here. To summarize,
25:07
I have to say the last four years have served me well. We use it in production in Stockholm, and in Jena we have our own small cluster running with it. Productive use every day.
25:20
Updates are actually pretty much always easy, you can keep old versions, things are super smooth. Little breakage, little headache. And now, from a scientific perspective, we can really share certain environments very easily, by simply pinning to some git hashes and passing them to someone else, which also makes the work citable,
25:45
which is important for us. Some things may be difficult to package, but not impossible. I think there are only a few things that we actually gave up on. And I think for our community, this is almost a complete toolkit by now. And of course, contributions, even to the
26:04
overlay, are always welcome. So, outlook: what's important in the future, what we didn't address so far? Benchmarking. How long did a test run on Hydra?
26:22
It's much more complex than that, so a lot needs to be done there. Then, what's not integrated whatsoever are the vendor compilers, like the Intel compiler or the AMD compiler; a lot of people depend on those. And of course, the outreach part again: if you want to use it at supercomputing centers where you don't have root access,
26:41
you have to kind of convince the administrators there to install Nix for you. Yeah. And with that, I'm at the end. Thank you for your attention.
27:01
So we have time for two questions, and I already see hands. Oh, wow. Yeah. I'm going to go to the one I saw first. Hi. So, are you able to build your packages with any compilers other than GCC? Yeah, in principle it should be easy to override your compiler, for example with
27:24
LLVM or something like that. Right. But I haven't tried that. So we have experimented with different versions of the libraries, but not with different compilers.
27:50
I really appreciate you bringing up this topic, and I can barely contain my excitement. So the short version of my question is: we should talk afterwards. But for the benefit of the audience,
28:00
first, a comment: if you have user namespaces enabled, and shared storage or something, then you don't need root access to have Nix, because you can just use it like that. I don't know if that works for you. That is true. I haven't tried that, since when you have everything running on Nix, you're so comfortable that you don't even want to go anywhere else anymore. But it depends, I guess, on what the setup on the computing system
28:25
is, because some setups are actually very conservative, with very old versions of things. So maybe that's worth looking into. And my actual question is: we have a medium-sized legacy
28:43
university cluster, and the team members don't know Nix. I know Nix, and we aren't looking to use Nix on the system, because of course the users need familiarity and the administrators need familiarity. But I'm constantly keeping it in mind, to see if I can push it into the system at some point, maybe in parallel with the usual tooling.
29:03
And the question is: are you familiar with, have you looked into, existing tooling for HPC systems, like Singularity or Spack? The American HPC clusters seem to have a lot of infrastructure that looks pretty good for packaging things, but that of course isn't Nix.
29:21
But the important part here is that you really need user buy-in. We're going to be doing this mainly for medical and genetics applications. Do you have any insight on this? Well, I haven't used Spack personally myself, so I have no experience with that. Singularity containers that span several nodes or something like that
29:47
actually need a lot of hacking to get working at all. So Singularity, as hyped as it sometimes is, I wouldn't recommend it on a larger scale. Okay. Thank you.