HPC Container Conformance
Formal Metadata
Title |
| |
Subtitle |
| |
Series title | ||
Number of parts | 542 | |
Author | ||
License | CC Attribution 2.0 Belgium: You may use and modify the work or its content for any legal purpose and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify. | |
Identifiers | 10.5446/62019 (DOI) | |
Publisher | ||
Year of publication | ||
Language |
Content Metadata
Subject area | ||
Genre | ||
Abstract |
|
Transcript: English (automatically generated)
00:05
Cool. First lightning talk is Christian. Yeah. Thanks, Dennis. Connor said that he has a relaxed talk. I don't. I have only 10 minutes, so I need to speed up. What I would like to talk about today is HPC Container Conformance, a project that came out of the HPC Container Advisory Council, which meets every first Thursday.
00:24
And we try to provide guidance on how to build and annotate HPC containers. So, conformance: what, you might ask, are we trying to achieve? We focus on two applications, maybe a third, but mainly GROMACS and PyTorch. And we want to go through an exercise of providing best practices on how to build or shape the
00:46
container and also how to annotate the container. And I think the most important part is the annotation part, by the way. Anyhow, what we don't want to do is boil the ocean by making everything work everywhere. So that's why we focus on these two applications. And we also want to allow for generic and also highly optimized images and, with
01:05
annotations, make sure that people can actually discover those and also provide some expectation management for those. We're going to focus on OCI images and most likely on Dockerfiles. I mean, if people throw a lot of Singularity build recipes at me, then maybe I will
01:23
change my mind. But for starters, we're going with Dockerfiles and OCI images. And if we have a Dockerfile that is derived from other artifacts, like a Spack YAML file or an EasyBuild recipe or an HPCCM recipe, then, of course, we also want to include those to make it easy for people to reproduce and tweak the actual container.
01:44
While working on this project, I have been in touch with the BioContainers community, and they published a paper in 2019, which is pretty interesting, where they provide some recommendations on how to package and containerize bioinformatics software. Of course, they don't compile for different targets and they don't use MPI a lot.
02:05
So it's just a baseline, I think, for our work in HPC, but it's a good baseline. And I highly recommend that people read this paper. So the first thing in the HPC Container Conformance project is the expected image behavior.
02:20
So I think we have all been there: we have different images, we want to swap one out, and then we realize, oh, the entry point is different, or the container does not use an entry point but the application name. And so we want to make sure that at the end of the day, all the containers that we produce in the HPC world are built so that they behave the same way and you can just swap out the container.
02:41
You want to run GROMACS, you try out multiple different containers, and you don't need to change your submit script, only the image name. So at the end of the day, the container should drop you into a shell, as if you were logging into a node over SSH, and it should ideally have a very small or even no entry point so that it's easy to debug as well.
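The expected image behavior described here could be checked automatically. The following is a minimal sketch, not part of the project: `Entrypoint` and `Cmd` are the field names of an OCI image config, while the function name, the shell list, and the rules themselves are assumptions made for illustration.

```python
# Hypothetical check of the "expected image behavior" idea.
# "Entrypoint" and "Cmd" are OCI image config fields; the rules are assumptions.
SHELLS = {"sh", "bash", "/bin/sh", "/bin/bash"}


def check_image_behavior(config):
    """Return a list of findings for one image config (empty list = looks fine)."""
    findings = []
    entrypoint = config.get("Entrypoint") or []
    cmd = config.get("Cmd") or []
    if entrypoint:
        # An entry point is allowed, but it should stay small and easy to debug.
        findings.append("entry point set: " + " ".join(entrypoint))
    if not cmd or cmd[0] not in SHELLS:
        findings.append("default command does not drop into a shell")
    return findings


# An image that just drops into a shell produces no findings.
shell_image = {"Entrypoint": None, "Cmd": ["/bin/bash"]}
# An image that uses the application name as entry point is flagged.
app_image = {"Entrypoint": ["gmx"], "Cmd": ["mdrun"]}

for image in (shell_image, app_image):
    print(check_image_behavior(image))
```

A check like this only looks at metadata; whether an entry point script is actually small would still need a human look.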
03:03
So if the entry point takes forever or makes a lot of changes, then it's hard to debug the container. So the container should have a very small or even no entry point, and maybe it changes some environment variables to pick up the application that is installed maybe by
03:21
EasyBuild or Spack, but it should be very small. The main part of this project is annotations. And why annotations? So the basic idea is, and we have all been there, everyone who has done HPC containers, that we encode the information about the specific implementation of the
03:40
image in the tag or in the name. And we don't want to do this anymore, right? So we want the information to be annotated to the image and not part of the name because the name might change. So what do we want to do with these annotations? We want two things. First, kind of describe the image, the content of the image and how the image is
04:00
expected to be used so that sysadmins and end users know what to expect. So what userland is provided by the image? What tools are installed in the image? How is the main application compiled? For what target? For what microarchitecture of the CPU? For what GPU? Which MPI is used, and so on?
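To make the questions above concrete, here is a small sketch of what such a set of annotations might look like. OCI annotations are a map from string keys to string values; everything else here is an assumption: the `org.example.hpc.` prefix, the key names, and the values are hypothetical and are not the keys proposed by the project.

```python
import json

# Hypothetical namespace; the real project may choose different keys.
PREFIX = "org.example.hpc."


def make_annotations(**fields):
    """Build an OCI-style annotation map (string keys, string values)."""
    return {PREFIX + key.replace("_", "."): str(value) for key, value in fields.items()}


annotations = make_annotations(
    userland="ubuntu-22.04",          # what userland the image provides
    tools="jq,wget",                  # tools an end user can expect
    cpu_arch="x86_64",                # generic architecture
    cpu_microarch="skylake_avx512",   # microarchitecture the app is compiled for
    gpu_cuda_version="11.8",          # only meaningful if CUDA is installed
    mpi_implementation="openmpi",     # which MPI is used
)

print(json.dumps(annotations, indent=2, sort_keys=True))
```

Keeping these facts in annotations rather than in the tag means the image can be renamed or retagged without losing them.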
04:21
So that we can take this information and maybe provide configuration examples for different container runtimes, so that hooks can react to those annotations. Podman and Sarus, for instance, can already react to annotations. So depending on what the image provides as information, the runtime can adapt
04:41
and say, OK, I have an Open MPI container, I use this hook. I have an MPICH-based container, I take this hook. So I think that would be great, if we can agree on certain annotations. And agreeing on certain annotations, I think, is a huge task, but I'm hopeful that we can achieve this and then make sure that the configuration is done
05:01
so that the application is tweaked the right way. And another piece that we can achieve here is that we create maybe a smoke test that looks at the host that it is running on, looks at the annotations of the container that you want to run, and just tells you, OK, this thing will segfault anyway. You are on Zen 2 and you have an application that's compiled for Skylake.
05:23
It won't work. That way you don't download 30 gigabytes of image layers just to realize that your image won't work. So I think that's also a very important part, that we can do this. Another part is to not just describe the image, but to make it easy for end users to discover what images are around.
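The smoke test idea can be sketched in a few lines: compare what the image says it was compiled for with what the host can execute, before pulling any layers. This is a simplified illustration. The annotation key is hypothetical, and the feature table is a hand-written, heavily reduced assumption rather than a complete description of these CPUs.

```python
# Hypothetical annotation key for the compile target.
TARGET_KEY = "org.example.hpc.cpu.microarch"

# Reduced, hand-written feature sets (assumption, for illustration only).
FEATURES = {
    "x86_64": {"sse2"},
    "haswell": {"sse2", "avx", "avx2"},
    "zen2": {"sse2", "avx", "avx2"},
    "skylake_avx512": {"sse2", "avx", "avx2", "avx512f"},
}


def smoke_test(annotations, host_microarch):
    """Return (ok, message) without downloading the image."""
    target = annotations.get(TARGET_KEY)
    if target is None:
        return False, "image does not declare a CPU target"
    if target not in FEATURES or host_microarch not in FEATURES:
        return False, "unknown microarchitecture, cannot decide"
    missing = FEATURES[target] - FEATURES[host_microarch]
    if missing:
        return False, "host lacks: " + ", ".join(sorted(missing))
    return True, "image target is compatible with this host"


# The case from the talk: a Skylake build on a Zen 2 host.
print(smoke_test({TARGET_KEY: "skylake_avx512"}, "zen2"))
```

In practice the feature data would come from a maintained source rather than a table typed in by hand.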
05:42
So you want to run GROMACS and you know or don't know the system you're on. So maybe you can just run a tool or have a website that tells you: you want to run GROMACS, I have looked through all the annotations, I know a little bit about your system, here we go, this is the image that you want to use. So for discovery as well, I think that's important.
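Discovery along these lines could be sketched as a lookup over a catalog of annotated images. The catalog, image names, and annotation keys below are all hypothetical, and the ranking rule (prefer the most specific target the host accepts) is an assumption about what "the image that you want to use" means.

```python
# Hypothetical annotation keys and catalog entries.
APP_KEY = "org.example.hpc.app"
TARGET_KEY = "org.example.hpc.cpu.microarch"

CATALOG = [
    {"name": "registry.example/gromacs:build1",
     "annotations": {APP_KEY: "gromacs", TARGET_KEY: "x86_64"}},
    {"name": "registry.example/gromacs:build2",
     "annotations": {APP_KEY: "gromacs", TARGET_KEY: "zen2"}},
    {"name": "registry.example/gromacs:build3",
     "annotations": {APP_KEY: "gromacs", TARGET_KEY: "skylake_avx512"}},
    {"name": "registry.example/pytorch:build1",
     "annotations": {APP_KEY: "pytorch", TARGET_KEY: "x86_64"}},
]


def discover(catalog, app, accepted_targets):
    """Pick the image for `app` whose target ranks first in `accepted_targets`.

    `accepted_targets` lists what the host can run, most specific first.
    """
    best = None
    for entry in catalog:
        notes = entry["annotations"]
        target = notes.get(TARGET_KEY)
        if notes.get(APP_KEY) != app or target not in accepted_targets:
            continue
        rank = accepted_targets.index(target)
        if best is None or rank < best[0]:
            best = (rank, entry["name"])
    return best[1] if best else None


# A Zen 2 host accepts Zen 2 builds first and generic x86_64 builds second.
print(discover(CATALOG, "gromacs", ["zen2", "x86_64"]))
```

Note that the choice is driven entirely by annotations; the tags in the image names carry no meaning here.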
06:01
Of course, we will have mandatory and optional annotations. So a mandatory one might be: what CPU architecture is it compiled for? I think that's the obvious one. And optional ones, of course: if you want to add a CUDA version because your image has CUDA installed, then that's an optional one. Or you want to annotate the whole software bill of materials.
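The split into mandatory and optional annotations lends itself to a simple validation step. This is a sketch under assumptions: the key names and the choice of which keys are mandatory are hypothetical, picked to mirror the examples in the talk.

```python
# Hypothetical key sets; the project has not fixed these names here.
PREFIX = "org.example.hpc."
MANDATORY = {PREFIX + "cpu.arch"}
OPTIONAL = {PREFIX + "gpu.cuda.version", PREFIX + "sbom"}


def validate(annotations):
    """Report missing mandatory keys and unrecognized keys in our namespace."""
    ours = {key for key in annotations if key.startswith(PREFIX)}
    return {
        "missing": sorted(MANDATORY - ours),
        "unknown": sorted(ours - MANDATORY - OPTIONAL),
    }


good = {PREFIX + "cpu.arch": "x86_64", PREFIX + "gpu.cuda.version": "11.8"}
bad = {PREFIX + "gpu.cuda.version": "11.8", PREFIX + "colour": "blue"}

print(validate(good))
print(validate(bad))
```

Keys outside the hypothetical namespace are ignored, so annotations set by other tools do not trigger warnings.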
06:24
Maybe it's too much information, but maybe not. So there are optional and mandatory annotations. I think that's pretty clear. OK, and I created a couple of groups like annotation groups that I think we should think about. I won't go through every single line item here
06:41
because I only have 10 minutes and three minutes left. So just maybe grab the slides afterwards and then go through it. And it's not written in stone. It's just a proposal. So yeah, happy to have feedback on this as well. So the first big one, and I talked about it already, is of course hardware annotations. So what is the target optimized for?
07:02
The architecture, the generic architecture or the real microarchitecture, and then the key version value for this. As I said, CUDA versions, driver versions, and so on. I think it's obvious that we need to annotate the container so that it defines what the actual execution environment should look like.
07:23
Also obvious HPC things like the MPI and interconnect annotations, so that you define what the implementation in the container is. Is it Open MPI? Is it MPICH-based? Is it even thread-MPI because you only want to run single node? What framework is used? libfabric, UCX, what have you.
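How a runtime might react to MPI and interconnect annotations can be sketched as a lookup from annotation values to hooks. This does not reproduce the configuration format of any real runtime: the annotation keys, the accepted values, and the hook names are all hypothetical.

```python
# Hypothetical annotation keys and hook names.
MPI_KEY = "org.example.hpc.mpi.implementation"
FABRIC_KEY = "org.example.hpc.mpi.framework"

MPI_HOOKS = {"openmpi": "openmpi-hook", "mpich": "mpich-hook"}
FABRIC_HOOKS = {"libfabric": "libfabric-hook", "ucx": "ucx-hook"}


def select_hooks(annotations):
    """Return the hook names a runtime would enable for this image."""
    hooks = []
    mpi = annotations.get(MPI_KEY)
    if mpi == "thread-mpi":
        # Single-node only: nothing from the host needs to be injected.
        return hooks
    if mpi in MPI_HOOKS:
        hooks.append(MPI_HOOKS[mpi])
    fabric = annotations.get(FABRIC_KEY)
    if fabric in FABRIC_HOOKS:
        hooks.append(FABRIC_HOOKS[fabric])
    return hooks


print(select_hooks({MPI_KEY: "openmpi", FABRIC_KEY: "ucx"}))
print(select_hooks({MPI_KEY: "thread-mpi"}))
```

The point of the sketch is the direction of control: the image declares what it contains, and the site decides what to do with it.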
07:40
And now I'm going through all the line items, so maybe I should stop. But at the end, I think the last line is also important. What is the container? Two minutes left, even. What is the container? Actually, how does it expect to be tweaked? So is the MPI being replaced, libfabric injected, and so on? That's also, I think, important so that the sysadmin or the runtime knows what to do
08:02
with the container to make it work at line speed. System annotations, I think, are also important so that we know what the container expects from the kernel, what modules are introduced, and so on. And also what the end user can expect, what tools are installed. Is jq installed? Is wget installed, and so on?
08:22
Another annotation, of course: documentation would be nice as well. Base64-encoded Markdown would be great so that you can render how-tos and build tweaks and so on directly. OK, one minute. How to annotate? I think that's obvious as well. That's a layered approach. Of course, the base image should have annotations
08:40
that we can carry over. And if you build subsequent images, add the annotations that are important. And after the image is already built, you can use things like crane or Buildah or Podman, I think, to annotate images at the end without even rebuilding them, just repurposing them. Or we could also collect annotations offline
09:03
in another format and then annotate it. OK, ideally, and that's for Kenneth and, of course, Todd as well, EasyBuild and Spack should annotate it correctly so that we don't need to teach everyone to annotate, but the tools just annotate the image for us.
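The layered approach and the Base64-encoded Markdown documentation can be illustrated together. This sketch only models the data: it does not call crane, Buildah, or Podman, and the annotation keys and values are hypothetical. Later layers override earlier ones, which is an assumption about how carried-over annotations would be resolved.

```python
import base64

# Hypothetical annotation key for embedded documentation.
DOCS_KEY = "org.example.hpc.docs"


def encode_docs(markdown):
    """Encode Markdown text so it fits in a single-line annotation value."""
    return base64.b64encode(markdown.encode("utf-8")).decode("ascii")


def decode_docs(value):
    """Recover the Markdown text from an annotation value."""
    return base64.b64decode(value).decode("utf-8")


def merge_layers(*layers):
    """Merge annotation maps in order; later layers win on conflicts."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


base_image = {"org.example.hpc.userland": "ubuntu-22.04"}
app_image = {"org.example.hpc.cpu.microarch": "zen2"}
# Added after the build, without rebuilding the image.
post_build = {DOCS_KEY: encode_docs("# How to run\n\nStart `gmx` from the shell.\n")}

final = merge_layers(base_image, app_image, post_build)
print(sorted(final))
print(decode_docs(final[DOCS_KEY]))
```

Base64 is used only because annotation values are plain strings; the decoded text is ordinary Markdown again.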
09:20
And that's the external piece. So I created a tool, MetaHub, where we define images for different use cases and we can also annotate those images without actually changing the image, but just with this. So, OK, 10 seconds. Last one. We need, of course, a fingerprint of the system to match the annotations against the host itself.
09:45
So there needs to be a tool. Time is up. And, yeah, so we need to discover the right image, need to have a smoke test and help tweak the container. That's like the last bits I think.
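A system fingerprint of the kind mentioned here could start from what the Python standard library already reports. This is a minimal sketch: `platform.machine()`, `platform.system()`, and `platform.release()` are standard library calls, while the annotation key, the alias table, and the matching rule are assumptions. A real tool would also need the microarchitecture, GPUs, drivers, and interconnect.

```python
import platform

# Hypothetical annotation key for the generic CPU architecture.
ARCH_KEY = "org.example.hpc.cpu.arch"

# Different operating systems report the same architecture under different names.
ALIASES = {"amd64": "x86_64", "arm64": "aarch64"}


def normalize_arch(machine):
    """Map platform-specific architecture names to one spelling."""
    name = machine.lower()
    return ALIASES.get(name, name)


def fingerprint():
    """Collect a very small description of the current host."""
    return {
        "arch": normalize_arch(platform.machine()),
        "os": platform.system().lower(),
        "kernel": platform.release(),
    }


def matches(host, annotations):
    """True if the image's declared architecture equals the host's."""
    return annotations.get(ARCH_KEY) == host["arch"]


host = fingerprint()
print(host)
print(matches(host, {ARCH_KEY: "x86_64"}))
```

Matching on the generic architecture alone is the coarsest possible check; it is the starting point for discovery, the smoke test, and container tweaking, not a replacement for them.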
10:06
Thank you for the excellent example on how to do a lightning talk on time. We'll take one question. Any questions for Christian? Do you need the clicker?
10:22
Thank you for your presentation. I would like to ask, how does this relate to existing software supply chain metadata databases like Grafeas? Does this complement their functionality? Or is this something completely different? I mean, we are good in HPC at building our own thing and then just saying that everyone should adopt it.
10:41
I think we want to complement it, right? We want to use these two applications and go through the exercise and then maybe learn from what we did with this project and try to push these ideas into other things as well. But I think the AI/ML folks, they maybe didn't realize that they won't have this problem, so we also try to not only think about HPC here
11:02
but also think about other communities as well. So I'm open to everyone and the project is as well. Thank you very much, Christian. If you want to chat with Christian, he'll be around, probably outside the door for the rest of the day or in the room. And we'll switch it over to the next.