Deploying Containerized Applications on Secure Large Scale HPC Production Systems.
Formal Metadata
Title: Deploying Containerized Applications on Secure Large Scale HPC Production Systems
Number of Parts: 637
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/53662 (DOI)
FOSDEM 2021, 589 / 637
Transcript: English (auto-generated)
00:05
My name is David Brayford and I'm a senior scientist at the Leibniz Supercomputing Centre in Munich. I'm also a member of the technical steering committee for the OpenHPC project. And today I'm talking about how to deploy
00:20
containerized applications and workflows on large scale and secure HPC systems. So basically what we're looking to do is to transition your workflows and your applications from your development environments, which for lots of applications will be laptops and desktops,
00:40
to a supercomputer with as little effort as possible. And it just needs to work. Obviously this is going to become more and more important as we look at things like AI and other non-traditional HPC applications. So the ability to make your workflows portable inside a container
01:01
is really, really important. It's not just building and compiling your application, but your entire workflow, because generally that's what matters to you. Not just building your application and running it, because your workflows are getting significantly more complicated. So the ability to transition workflows from your development environment to supercomputers,
01:23
because obviously a lot of scientists don't just want to run on a single supercomputer, they want to run at different centers. And they also want to be able to do this with as little effort as possible and to make it as portable as possible. Unfortunately, containers themselves are not portable,
01:42
and we'll explain this in the next section. So some of the key challenges you're going to see when running on HPC systems: you can have different instruction set architectures, also known as ISAs. For the CPU you have the x86 family, the ARM family, and the POWER family.
02:01
Then you have various GPUs from NVIDIA and AMD, and also Intel in the future. And you can have accelerators on more novel systems. You can also have different memory, interconnect, and node configurations. Those are also going to be important to you, because if you change them,
02:20
you might have to change your workflows and what's installed in terms of drivers. Also, one of the big constraints we have at LRZ is that the HPC system has no direct connection to the internet. So building anything on our HPC system that requires connecting to the internet is not possible. So if you're using things like Python or Julia, or you're pulling down applications
02:42
or software from the internet, that's not possible. And finally, things like system software, file systems, hardware drivers for GPUs, and distributed processing software such as MPI also need to be specific to your particular HPC system.
03:01
So your container will have to be modified for that one system. And also I/O patterns, because the parallel file systems have specific I/O patterns that they handle really well, and when you move away from them, you can cause significant problems on the system. So, key challenges for the actual user:
03:20
if you're looking at AI, ML, and DL workflows such as TensorFlow, you've got a rapid update cycle, so you're constantly rebuilding things; if you can do that in a container, it helps. Also dependencies: are you running Python 3 or Python 2, different versions of glibc,
03:41
different versions of compilers, different versions of libraries, et cetera. Reproducibility: this is what people talk about when they say containers are reproducible. Actually, they're not, because the underlying HPC system changes. Your host system changes. You might get a change to your parallel file system,
04:01
such as GPFS. You might get a change to the OS in terms of patches. So even though the binaries inside the container are the same, the binaries they interface with on the system are not, and that can mean your results are no longer reproducible.
04:21
Also, there's performance and stability. Basically, you need to use optimized libraries and software that have been set up for that particular HPC system, especially things like MPI, because if you don't, you can run on a small number of nodes,
04:41
but as soon as you scale to hundreds of nodes and thousands of MPI ranks, you're going to have a problem. And finally, something people don't realize: container size. If you're doing something like machine learning, you might have terabytes of data. Obviously, with a container holding multiple terabytes of data, you're not going to be able to create that on your laptop.
05:00
So you basically want to store your data somewhere other than in your container, because you need to move that container from your development environment to the HPC center. The next thing I want to talk about is OpenHPC. It's an open source HPC software stack,
05:20
and it's under the Linux Foundation; that's the mission statement. OpenHPC also supports container technologies such as Singularity and Charliecloud, which are used there. We actually gave a tutorial at SC20, and I'll provide links for that at the end of the presentation.
05:42
So this is the simple workflow we use to create containers at LRZ. Basically, we create a Docker image, we modify it to make it HPC specific, and we copy in the instruction and execution scripts.
06:00
The first point is that we try to do all of that inside a Dockerfile rather than building a container interactively. The reason we do that is that you've then got the recipe right there in front of you, so you don't have to go back and ask: what did I do six or twelve months ago to create that container? Then, after you've created your Docker image,
06:22
you test it to make sure it works, then convert your Docker image to the HPC container image of your choice. Again, test it to verify it works, then copy it to the HPC system. If the system has a module system with the container technology available in it,
06:41
you load the module and then you execute it via Slurm. And for the system admins on the HPC system, a lot of the time it will just look exactly like a traditional MPI job if you're running MPI software.
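As a rough end-to-end sketch of that flow, under stated assumptions: the image name, paths, host name, module name, and Slurm settings below are illustrative placeholders rather than LRZ's actual configuration, and the conversion step uses Charliecloud's ch-convert (older Charliecloud releases use ch-builder2tar and ch-tar2dir instead).

    # 1. On your local machine or development VM: build and test the Docker image.
    docker build -t myapp:latest .
    docker run --rm myapp:latest /app/run_tests.sh

    # 2. Convert the Docker image to the HPC container format of your choice
    #    (Charliecloud here) and copy it to the HPC system.
    ch-convert myapp:latest myapp.tar.gz
    scp myapp.tar.gz user@hpc.example.org:~/containers/

    # 3. On the HPC system, a minimal Slurm job script (myapp.sbatch):
    #!/bin/bash
    #SBATCH --nodes=4
    #SBATCH --ntasks-per-node=48
    module load charliecloud
    ch-convert ~/containers/myapp.tar.gz ./myapp   # unpack to an image directory
    srun ch-run ./myapp -- /app/myapp              # looks like a normal MPI job

    # 4. Submit it from the login node.
    sbatch myapp.sbatch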
07:01
So these are basically the takeaways we've found at LRZ, through a lot of pain, to work really well. Do all your conversions from a Dockerfile, because generally the HPC-specific containers are not portable. I mean, you cannot translate from Singularity to Charliecloud or vice versa,
07:20
or to Shifter or any other HPC container technology. But they all support a transition from Docker. So if you start off with Docker, you can move to the HPC container of your choice, or the container the HPC system allows, because at LRZ, obviously, we do not allow Singularity for policy reasons.
07:42
So when people come and say they have a Singularity container, we ask them: can you create a Charliecloud container? And it's a pain if they've created the Singularity container directly and not from Docker, because it's a lot more work for them to do it. Obviously you want to have everything
08:00
in the Dockerfile so you can change it for different architectures. Obviously, if you've built a container with an x86 binary, it's not going to run on an ARM system. But you can modify the Dockerfile and rebuild it with the specific ARM or NVIDIA libraries.
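As a hedged sketch of that kind of rebuild: it assumes the Dockerfile declares a build argument (here called BLAS_PKG) and installs it, and the package names and image tags are illustrative, not a specific LRZ recipe.

    # Rebuild the same recipe with an architecture-appropriate BLAS library,
    # assuming the Dockerfile declares "ARG BLAS_PKG" and installs ${BLAS_PKG}.
    docker build -t myapp:x86 --build-arg BLAS_PKG=libopenblas-dev .

    # On an Arm build host, rebuild with a different library package instead of
    # shipping the x86 binary into the container.
    docker build -t myapp:arm --build-arg BLAS_PKG=libblis-dev .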
08:23
Also, test the container workflows at each stage as you build your container: when you've built your Docker image, and then again your specific HPC container, because it saves you a lot of work compared with moving it to the HPC center and only finding problems there. Do as much work on your local system or development VM as possible, because you can be root on your own system
08:42
but you're not going to be on the HPC system. And it just makes life a lot easier for you. So these are examples of what we've run at LRZ. For a bit more background, the reason we started to use containers was to get AI, and specifically TensorFlow,
09:01
to run on the system. This was an application from CERN, the 3D GAN model for detecting high-energy particles from some of the colliders. And as you can see, if you look at the tables, we've scaled up to basically a single island on SuperMUC-NG,
09:21
which is 768 nodes, and each node contains 48 Xeon Skylake processors. On the next one, you can see the measured performance in petaflops on the small matrix multiplication algorithm. So that's not on the full TensorFlow run,
09:42
but on the matrix multiplication, which we'd expect to be good, and we are getting good results there. If you look at the execution line, it says ch-run -b, and this -b allows you to bind directories from the host system to an equivalent directory
10:01
inside the container. The reason we do this is to allow the container to access and use software from the module system, specifically MPI. And the only reason we got such good performance is that we're using the system MPI at runtime in the container
10:20
rather than the standard MPICH. With that, we were only able to scale up to, I think, 256 nodes, the performance was about four times slower, the scaling efficiency was bad, and it wasn't as stable; it was crashing a lot. But when we were using the system MPI, which has been tuned for the HPC system, it worked very, very well.
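A hedged sketch of that kind of invocation: ch-run's -b SRC:DST option binds a host directory onto a directory inside the image, but the module names, bind paths, image directory, and script name below are placeholders, not SuperMUC-NG's real setup.

    # Load the container tooling and the system's tuned MPI from the module tree
    # (module names are placeholders).
    module load charliecloud intel-mpi

    # Bind (-b SRC:DST) the host's MPI installation into the image so the
    # containerized TensorFlow picks it up at runtime instead of the MPI it
    # was built against.
    srun ch-run -b /opt/hpc/mpi:/opt/hpc/mpi \
         ./tf_image -- python3 /workspace/train_3dgan.py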
10:40
So this is another example. It was from an EU project, the PROCESS project, and they wanted to run some containerized applications. Initially they created a container and they had MPI problems, and the reason was that inside the container they hadn't set the libfabric parameters.
11:03
So they set those, and then it was working, but the performance was really poor, because they were actually running MPI over TCP. To resolve that, we asked them to install the Omni-Path software and then set it up to take advantage of the high-speed interconnect. Because obviously the HPC centers have spent a lot of money on interconnects,
11:21
and you want to use them as much as possible. The outcome was that the issues and instability were no longer observed, and they were able to run their application on hundreds of nodes on the Skylake system, with up to 8,000 MPI ranks.
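A hedged sketch of the kind of settings involved: FI_PROVIDER is the standard libfabric provider selector and psm2 is the Omni-Path provider, while I_MPI_FABRICS applies to Intel MPI specifically; the exact variables depend on the MPI library and fabric stack in use, and the paths and image name are placeholders.

    # Select the Omni-Path libfabric provider instead of falling back to TCP.
    export FI_PROVIDER=psm2
    # With Intel MPI: shared memory on-node, OFI (libfabric) between nodes.
    export I_MPI_FABRICS=shm:ofi

    # Bind the host's fabric libraries into the container (path illustrative)
    # and launch as usual.
    srun ch-run -b /usr/lib64/libfabric:/usr/lib64/libfabric \
         ./app_image -- ./mpi_application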
11:42
Another example we came across at LRZ was a set of researchers who wanted to do fuzz testing on the HPC system, and they caused all sorts of problems. Basically, when they tried to run it, they were bringing down the parallel file system, because they were creating up to 6 million
12:01
very, very small files in a very, very short amount of time, and it basically brought down the entire data center's infrastructure. So initially we said, okay, can we switch to a higher-performance directory inside the HPC center's file system
12:22
by using the mount command, so they could write to the work file system instead of running it in home? It still crashed. So we looked at it and asked: how big are the files you're creating? They said, oh, they're really, really small, and they were creating and destroying all of these files. So actually you can run those in a RAM disk. So we mounted a RAM disk inside the container
12:44
to store these temporary files, and the outcome was that the application no longer crashed the HPC center. So that was good for everybody. And that's the sort of thing you need to be careful of when you're running on these HPC systems.
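A hedged sketch of that fix: /dev/shm is the usual RAM-backed tmpfs on Linux nodes, so files written there never touch the parallel file system; the mount point, image name, and the fuzzer's --workdir flag are made up for illustration.

    # Create a per-user scratch area on the node's RAM-backed tmpfs.
    mkdir -p /dev/shm/$USER/fuzz_tmp

    # Bind it into the container and point the fuzzer's working directory at it,
    # so the millions of tiny temporary files stay in RAM.
    ch-run -b /dev/shm/$USER/fuzz_tmp:/tmp/fuzz \
           ./fuzzer_image -- ./run_fuzzer --workdir /tmp/fuzz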
13:02
Another example we've been working on is a development with ICHEC, the HPC center in Ireland, on a quantum computing software package called QuantEx. They are developing all their software in Julia. One of the initial things we found out
13:22
was that Julia installs its packages under .julia in the home directory. However, with the HPC container software we use, which is Charliecloud, the default is to map the user's home directory
13:40
onto the home directory in the container. So when you go to the home directory inside your container, you see your home directory on the host. That caused a huge amount of problems, because obviously the packages were no longer found. So we went in and changed the environment, and you can also set the environment when you start the Charliecloud container.
14:01
So alongside the ch-run -b example you saw, there's another parameter, --set-env, and you can point it at a file containing the environment settings that were created when you built the Docker image. The default behavior of Charliecloud is to take the host's environment, but we overcame that by using the --set-env option
14:26
and setting it to the file with the Docker environment. We also wanted to profile the application using LIKWID. And again, we used the mounting option, -b, to bind
14:40
the host system's directories to get the modules, and we were able to profile the Julia applications with LIKWID inside the container. So that's been really useful for us.
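A hedged sketch of those two options together: Charliecloud images converted from Docker typically carry the recorded build-time environment in a file such as ch/environment inside the image, but the paths, module tree location, and Julia entry point below are placeholders.

    # Use the container's own environment (recorded at Docker build time) rather
    # than inheriting the host's, so Julia looks for its packages under the
    # image's home, and bind the host's module tree so LIKWID is visible.
    ch-run --set-env=./quantex_image/ch/environment \
           -b /opt/modules:/opt/modules \
           ./quantex_image -- julia /workspace/run_quantex.jl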
15:00
It's also going to be useful if you want to run, for example, profiling software which is available on the HPC system and either has a license that the HPC center holds and you don't, or is software you simply don't want to install in your container. It allows you to create a smaller, more minimal container: you don't need to install software inside your container if it's available on the host system. That reduces the size
15:20
of your actual container. The next part is about how this has been used. At Supercomputing 2020, we gave a virtual tutorial using the AWS cloud infrastructure. There's a link to the OpenHPC GitHub,
15:41
and we have all the tutorials on there, including how to run containers. So if you want to run just the container section, or if you want to set up an open source HPC software stack in the cloud, you can go through the entire tutorial. It will take you about four hours.
16:00
The next link is to Charliecloud and how to use it. This is the software we use at LRZ, because that's the policy. We also know that other HPC centers use Singularity, which, I admit, is the most popular HPC container software. But if your HPC system doesn't allow you to use that,
16:22
you need to use a different one. So we try to keep these workflows from being tied to any particular HPC container. You want to be able to use different HPC containers, because you don't want to move to an HPC center and have to go through and change your entire workflow because you've written it for a Singularity container
16:43
or a Charliecloud container or a Shifter container, or some other set of workflows which requires a particular container. So we want to keep it as generic as possible. It's a little bit more work at the start, but
17:01
it saves you a lot more time when you have to run on different HPC systems. And again, we want to use Dockerfiles because that gives you the recipe. You don't have to remember what you did to install everything to get things running. You can look at the recipe in the Dockerfile and say, okay, I'm not using OpenBLAS,
17:23
I need to use cuBLAS, or I need to use the Arm version of these libraries. You can go in, change them, and rebuild it. And then obviously you need to test it before you start running on the production system, so hopefully the HPC centers have a test system or a VM with their architecture to test this on.
17:42
There's another example of running the OpenHPC software stack inside a container using Podman. Again, there are examples of this on the GitHub; it basically starts from a Docker image and runs that as well. Podman is also starting to gain traction with users.
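As a hedged sketch of what that looks like for a user: Podman consumes ordinary Docker/OCI images and can run rootless, which is part of its appeal for HPC centers; the image name below is a placeholder rather than the actual OpenHPC tutorial image.

    # Pull an ordinary Docker/OCI image and run it rootless with Podman.
    podman pull docker.io/example/ohpc-node:latest
    podman run --rm -it docker.io/example/ohpc-node:latest /bin/bash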
18:08
Some people say, okay, we'll install that. Obviously HPC centers are a lot more conservative about what they'll install, but maybe eventually they'll start using that technology; at the moment, you're probably not going to find an HPC center using it.
18:22
The OpenHPC community, which is an open source software community, is open to everyone to join and get involved in. If you're an organization in academia or a government lab, it's free to join. There is a cost associated if you're a company,
18:42
and the contact would be Neal Caidin, shown below. It's a good community, the mailing lists are actively used, and people are using this software on real HPC systems. The system at LRZ, the SuperMUC-NG system,
19:03
is based on OpenHPC, only to a small extent, but there are other systems in the top 50, and I think even in the top 20, which are entirely OpenHPC software systems. So you will be seeing OpenHPC and open source software being used
19:20
on real-world, large-scale HPC systems, which people can use. I guess now I'll just open it up to questions.