LUMI supercomputer for spatial data analysis, especially deep learning
Formal Metadata
Title: LUMI supercomputer for spatial data analysis, especially deep learning
Number of Parts: 156
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifier: 10.5446/68488 (DOI)
Transcript: English (auto-generated)
00:00
Okay, hello everyone, and welcome to our talk about the LUMI supercomputer. My name is Katri and I work as a manager at CSC, the Center for Scientific Computing in Finland. In my unit we also have a geoinformatics group, because we don't only provide infrastructure; we also like to enhance it with geoinformatics tools and data.
00:24
So that it's not just cold infra. A little bit about CSC: we are owned by the Finnish state and the higher education institutions. And like I said, we are not a research institute. We like to provide you with this infrastructure and the geoinformatics tools and data.
00:46
And then we just see what you researchers do. Here you can see LUMI as the white dot up there in the north, in Kajaani. The countries in dark color are the LUMI consortium countries.
01:05
And the lighter gray is the European Union, so it's a joint undertaking. Kajaani was selected because it's in the north, it's cold, and it's easy to cool the machine down.
01:23
This is how it looks from the outside. Maybe not so impressive, but maybe it's just nice to see how it is. And this is the actual LUMI. So it's a huge machine with rows and rows of it.
01:43
It's the fifth fastest supercomputer globally. It used to be the third fastest, but now I think the U.S. has gone further. And with this snowflake image I wanted to show snow, because LUMI actually means snow in Finnish, and I heard also in Estonian.
02:04
So if you start from 11 o'clock, you have this LUMI-C partition with the CPU cores. But actually LUMI is mostly a GPU machine, so if you work with machine learning, it's the machine for you. Then we have a partition for data analytics.
02:23
Then there is the flash-based storage layer, then the Lustre parallel file system, and at six o'clock the LUMI object storage, which is where you can store your data during the project lifetime. There is also a partition for QPUs, which I've heard are also used for research.
02:46
And the LUMI-K container cloud service. And now I will hand over to Kylli. Yes, hello, I'm Kylli. I'm also from CSC, but I actually graduated from this house.
03:01
I'm happy to be back. Maybe one small correction: the quantum part of LUMI will be in the Czech Republic and not in Kajaani. And it will be quantum, so those who know what quantum is know that it's quite different. So we don't talk about LUMI-Q today at all.
03:21
I think many of you have used some cloud computers, but this LUMI supercomputer is quite a different thing. As Katri showed in the pictures, it is one huge machine. And if with virtual machines you get the whole virtual machine for yourself, then with a supercomputer you usually ask for some tiny, tiny part of it.
03:43
And all the users are using the same machine, so there is just one of them. And, let's see this one. So, as Katri said, it's mostly a GPU machine with a lot of GPUs.
04:02
So, from the GIS side, I will come back to the software that you can use, but it's mostly for deep learning cases. But it also has CPUs, so you can run the more usual software there too. And one thing with supercomputers is that there are huge amounts of memory available.
04:23
So if you need something where you need one terabyte of memory, and it's not a mistake, then that is possible there. I don't think you can ever find a virtual machine where that would be possible. And there is also quite a lot of storage. As Katri already said, there are different options, but the local storage in LUMI is up to 500 terabytes that you can have there locally.
04:49
And when talking about use cases, because it is so much a GPU machine, deep learning projects come first. The bigger the model you are developing, the more you should think about LUMI.
05:01
And mostly, at the moment, it is big language models that run there. But there are already a few spatial data projects ongoing as well. The ones we know of are from Finland, but we don't see all the statistics, so there might be others also.
05:22
And also, basically any data analysis: if it takes days or weeks for you, then you could consider moving it to a supercomputer. It depends a lot on what kind it is, but in many cases it could be used. But it is not a web service.
05:41
You cannot put any web service up there. That is one thing that I think people in many projects would like: that there is some nice web front end and something is happening in the background. That is not possible with LUMI, at the moment at least. And also, because of the way it works, nothing time-critical can be done there, because you never know how long it will take before your job even starts.
06:06
So it might be days in the queue before anything starts. That is why such time-critical things don't work. I put here one example case from ICEYE, which has been using LUMI for a while.
06:23
I don't know if there are people who don't know it: ICEYE is a Finnish... it was a startup, but I think it has grown too big for that nowadays. But anyway, they have a lot of satellites collecting SAR data globally. And because the revisit time is very short, the applications are often about floods and wildfires
06:43
and things that happen quite fast. To detect these things from their images, they use a lot of deep learning models. They have really a lot of people and a lot of models. And they now do their model training in LUMI. The actual use of these models in their case happens elsewhere,
07:03
because model training is usually the part that takes a lot of computing power, and that they now do in LUMI. And they have been pretty happy there, because they said that earlier, when they were using a commercial cloud, it sometimes happened that the GPUs were not even available, because they wanted so many.
07:20
But because LUMI has so many GPUs, the availability has been good. And what they especially liked was that the data was close to the GPUs. I think in commercial clouds they were always starting up a new virtual machine, moving the data there, and then doing the analysis. So now they can keep the data there all the time
07:41
and then do the model training as needed. And it also comes back to the costs. In most cases it is actually free of charge, but in Finland, for companies, there is a cost. But anyway, they said it is much cheaper than the commercial options. And one thing that also sets us apart from the commercial clouds
08:03
is that LUMI has a special support team. Because it comes from 11 countries, there is one person from each of the countries, plus from CSC, I am in this backup support team. So there are maybe another 10 persons who help us with more specific questions.
08:23
Because in commercial clouds, I think, they never help you with how to actually run things and how to install; it's always your own worry. So here there are people to ask for help. But then, I think for many, the supercomputer might be a new thing.
08:41
So how it works is that basically you need to have scripts. If you're used to QGIS or ArcGIS, then that's difficult. But the scripts can be almost anything; the most common ones that we have seen are Python, and maybe some bash scripts. And we do have QGIS installed there,
09:03
but that is mainly because a deep learning project asked for it, to look at their results. So you shouldn't really use QGIS for the analysis itself, but technically it is possible to use PyQGIS for doing this kind of thing. But I don't think that many people use that.
09:20
And then, in supercomputers, everything works with batch jobs: you have the scripts that you run, and then a short extra script that defines how many computing resources you need, cores or GPUs and memory and so on. Then you put it in the queue, and that is the place where it might start immediately, but it might also take some time. And also, and this is often difficult,
09:44
you must estimate in advance how long it might take. But you can always add a little bit extra, so that is not so bad. And then one important thing, to actually get things done fast on a supercomputer, is that the code must run in parallel, in one way or another.
10:00
If you run a usual one-core Python or R script, it is as fast on a supercomputer as on a local laptop. There is no magic. People sometimes think that, oh, it's a supercomputer, it will be super fast. But just moving it there doesn't really make it super fast. And then, we have some tools already installed on LUMI.
10:24
For deep learning projects there are both PyTorch and Keras; I think these cover most of the cases. Of GIS tools there are a few installed, but the list could be a lot longer. We have kind of decided that we install when somebody asks.
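For illustration, pre-installed software on a supercomputer like LUMI is typically taken into use with environment modules; a minimal sketch might look like the lines below (the module name "pytorch" and the need for any extra setup are assumptions, so check the LUMI documentation for the exact names):

    module avail              # list software that is already installed as modules
    module load pytorch       # assumed module name for the pre-installed PyTorch stack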
10:44
So I put here also the list that we have in another national computer, to which only Finns have access. These tools we have installed on the national computer, and it should be possible to have them also in LUMI. Even SAGA, which was presented here in the previous session,
11:02
I think I don't have it on the list, but that is also possible. But basically most cases use Python and R. The other tools can also be used, sometimes via R, sometimes via Python, or from the command line. So basically, anything that can be installed on Linux works;
11:22
open source is very good for that. Licensing always causes trouble, especially on supercomputers, because it is not one computer, so you must have a floating license if you have a commercial tool. And if it's a GPU tool, then in LUMI's case it must support AMD GPUs, because that is what LUMI has.
11:40
What is not possible at all is anything Windows-only. ArcGIS, for example: we have the ArcGIS Python API on the national computer, but that's it. Even that is not possible on LUMI, because of the way the licensing works. And also anything that is a server:
12:04
APIs, databases, these are not possible. And in LUMI's case, many projects want to do their own installations, because they might need very specific versions and so on for what they want to do. So anything that is conda-based or pip-based,
12:22
or Docker-based, should be very easy; there is a special tool for that. Or, if there are people who have used supercomputers elsewhere, then containers, EasyBuild and Spack could be familiar. These are also supported. And there are also ready-made Python environments, so if you need just a few extra packages, it is also possible to just extend them.
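For illustration, a minimal sketch of a conda-based installation using the wrapper tool mentioned above; the module and command names (lumi-container-wrapper, conda-containerize) are assumptions based on CSC's container wrapper tooling, and env.yml is a hypothetical conda environment file, so check the LUMI documentation before relying on them:

    # wrap a conda environment defined in env.yml into a container-backed installation
    module load LUMI lumi-container-wrapper          # assumed module names, may differ
    conda-containerize new --prefix ./gis-env env.yml
    export PATH="$PWD/gis-env/bin:$PATH"             # then use python etc. from ./gis-env/bin as usual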
12:40
LUMI also has a web interface, but mostly working with LUMI means that you are in this Linux box, which my kids call the alien box. So it requires a little bit of Linux skills to use a supercomputer,
13:01
but this web interface makes some of the tasks easier. I would say mostly people use Jupyter in this web interface, or then, from it via the remote desktop, QGIS or GRASS or SAGA GIS.
13:23
Or, in the case of deep learning, TensorBoard can of course also be of interest. And then there is also Visual Studio Code for actually writing the code there.
13:44
And then, how to make the code parallel, because many of the GIS tools don't support it, and that's kind of the biggest problem. If you come from chemistry or molecular biology, there are a lot of tools that do it for you, and you don't really have to think about it.
14:02
But in the GIS case, mostly you have to think about it. I put here a few Python and R libraries that have at least something running in parallel. But even in the case of terra in R, for example, it's only a few functions that run in parallel.
14:21
Mostly it's not. And then the reality is that you have to do it yourself. There are again, both in Python and R, several libraries that can be used. In R I would recommend the future library. In Python there are several good ones; I think Dask is nowadays maybe the first one.
14:43
It's not the easiest, but it has a lot of options. But even if you don't want to parallelize your code itself, there are also extra tools. For example, if you want to process many files, you can use external parallelization.
15:01
So there is GNU Parallel. There are also special tools like Snakemake and Nextflow; I don't know if anybody has heard of them. But anyway, there are many options to make your code parallel, and it's not always even so complicated.
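As an illustration of this kind of external parallelization, a minimal GNU Parallel sketch might look like the line below; the processing script process_one.py and the data layout are hypothetical placeholders, and GNU Parallel may need to be loaded as a module on the cluster:

    # run the same single-core processing script on many input files, four at a time
    find data/ -name "*.tif" | parallel --jobs 4 python3 process_one.py {}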
15:20
So how does the usual use case go? The first thing is to get access and user accounts; Katri will come back to that. And then you must open this alien box and log in. And moving data: if you know how to move data to a Linux machine, it is exactly the same. It is kind of a Linux machine, just a huge one.
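For example, moving data with standard Linux tools might look like this; the username, project number and target path are placeholders, and the login address lumi.csc.fi is taken from memory, so check the LUMI documentation for the real values:

    # copy a local folder to the project's scratch space on LUMI
    rsync -av --progress ./my_data/ username@lumi.csc.fi:/scratch/project_XXXXXXXXX/my_data/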
15:44
So basically all this logging in and moving data should be familiar for Linux users. And then you come to the installation part. There is good documentation about it, and you can always ask for help. The installation is maybe the main thing where you should ask for help, because especially in LUMI
16:01
there are a few tricks and things that can be done, and the people who work with LUMI every day know them very well. And then the next thing is to write this batch job script, for sending your work to the system. But this batch job script is like 10 rows, so that shouldn't be too difficult; if you see a few examples, then you get it.
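For illustration, a minimal sketch of such a batch job script, assuming the Slurm scheduler that LUMI uses; the project number, partition name, time and memory values are placeholders, so check the LUMI documentation for the exact partition names:

    #!/bin/bash
    #SBATCH --account=project_XXXXXXXXX    # your LUMI project number (placeholder)
    #SBATCH --partition=small              # assumed CPU partition name, may differ
    #SBATCH --time=02:00:00                # estimated run time, with a little extra
    #SBATCH --cpus-per-task=4              # cores reserved for the job
    #SBATCH --mem=16G                      # memory reserved for the job

    # run the (hypothetical) analysis script with the reserved resources
    srun python3 my_analysis.py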
16:22
Katri will also come back to this, but there are courses about that too. And then basically you sit in the queue and you get some results. Very likely the first time you get some error, but just keep trying, and then you can see how it went. Usually you start with some test data
16:41
and then you run the actual data. And LUMI has a lot of documentation, so you can check things from there also. Okay. I think we are running out of time, so I can just say that we have tons of links here
17:01
on how to get started. You can apply for resources through EuroHPC JU or through your own country. And there is 20% reserved for companies. One fun story that I forgot to tell earlier is that LUMI's excess heat is used to heat up the city of Kajaani,
17:23
some of it. And then, when they built it, it was of course late, because all the big infra projects are always late. So they had this really hard deadline: if you don't deliver by this date, then the people in Kajaani will have cold showers. So then they, yeah, really.
17:40
But they made it in time. But yeah, we have this bunch of links, and you can contact us and get more information about how to get started, about training, and more about LUMI in general. So just contact us and we can send you this presentation.
18:04
And we have the conference page. Yes, so thank you. Thank you very much for the presentation. It's good to see Snakemake there.
18:21
We use it very often. Some questions from the room? Thank you very much. I was quite pleasantly surprised at how friendly LUMI is.
18:41
And I saw that multiple package managers are supported for installation. Did you consider supporting Nix as well, the Nix package manager? Well, if you don't know what I'm talking about, then you didn't.
19:02
Thank you. Let's say that it was not us who decided; it was the international team of people. Usually a supercomputer has either EasyBuild or Spack, but because they couldn't agree, they built both. But yeah, I think these are the four options that there are.
19:23
Building from source is the fifth, of course. You had the example of ICEYE, but how widely is LUMI used by the GIS industry in general?
19:41
Not that widely yet, but there is the opportunity. I think it's the only commercial use case; there are some academic projects. But it has also been the case historically in Finland that companies have not had access to supercomputers; with the national ones,
20:01
because we are this ministry-owned company, we cannot sell them. But LUMI is now the first time that companies even have the possibility. I don't know, but I think also in other countries it has been quite limited for companies to get access; now there really is the possibility for companies
20:21
and even for public administration. I think it's not written here, but for this EuroHPC JU part, companies, academia, and public administration can all apply. The EuroHPC JU side is always free of charge. The countries can have their own rules.
20:41
I know that Estonia, for example, asks for money from both academia and companies. Finland asks only from companies. We didn't go through how all 11 countries have set their rules. More questions?
21:03
Just to say thank you very much. Let me finish my sentence and then I'll give it to you. I have huge respect for the maintainers of supercomputers. There's this story of one university where a maintainer went away and her job had to be replaced by five others.
21:21
Luis. Let's see if I can synthesize this. We're talking here about distributed computing, which is very different from what most, I would say, researchers are expecting. I see that you have some training options,
21:41
but how is it going? Do you have people attending this training? Is it provided as a service? Because I think that's really the key to getting something like LUMI going in the long term. I had forgotten, but we will actually have a course in October.
22:00
There is an option to come to Espoo, or you can participate remotely, and it's free of charge for everybody. How long? Two days. So people are now allowed there. Take the opportunity.
22:21
So search for the CSC geocomputing course; you will find it with Google. I think none of these links actually take you there. Okay, just use Espoo for a coffee break. It's a really good opportunity: if you are stuck with your stuff, then just go there and ask. By the way, feel free to use the mailing list
22:43
of the OSGeo Foundation to announce this kind of training. I think that's fine. Thank you very much for attending. Thank you.