
Simplifying the creation of Slurm client environments


Formal Metadata

Title
Simplifying the creation of Slurm client environments
Subtitle
A Straw for your Slurm beverage
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Slurm is the most widely used batch scheduler for HPC systems. The Open Source Software community is very active in the development surrounding the Slurm ecosystem, contributing CLI tools for accounting, monitoring, and notebooks, among others. A lot of these client environments are nowadays created in containers, which have become a ubiquitous part of running applications. However, this way of working brings new challenges in HPC environments, especially when using Slurm. Slurm requires careful management of shared cluster secrets and cluster-wide configuration files that need to be in sync in order to work efficiently and securely. This talk proposes a novel and simple tool called straw, which allows the creation of secret-less and config-less Slurm client environments, thereby simplifying the creation of (containerised) environments by removing the burden of maintaining config files, sensitive munge secrets, and additional daemons. This talk will first provide an introduction to Slurm, followed by a description (mostly drawing from personal experience) of common patterns and pitfalls when creating containers that interact with Slurm clusters for different purposes (monitoring, notebooks, etc.). Next, I will introduce Straw, explaining why it was needed and why, despite its simplicity (it mostly just fetches a bunch of config files), it is able to perform a task that regular Slurm tools can't, therefore simplifying Slurm client environments. Finally, I will conclude by showing a simple example of how the tool can be used, and how it compares to the usual scenarios in which config files, extra daemons, and secrets need to be carefully managed. If time allows, I might detail some of the weaknesses of this approach: the fact that the Slurm protocol isn't really documented, and therefore this tool relies on "reverse-engineering" (as much as one can call it reverse engineering when no documentation exists but the code is available) to keep up with new Slurm releases.
Transcript: English (auto-generated)
OK, next talk is Pablo, who is going to explain to us how to set up Slurm client environments
more easily.
So just a brief introduction to Slurm in case anybody is not familiar with it.
Slurm is basically both a resource manager and a job scheduler. Meaning Slurm will manage your allocations: it will track which machines are in use, which jobs they belong to, and which users own which CPUs and which nodes, etc.
And it's also the job scheduler, meaning it will, when users submit jobs, you have your happy users over there, or hopefully happy users, and they want to run on your cluster, so they make a job submission, usually writing a script that launches some workloads.
And they will basically interact with Slurm, and Slurm will manage all these job submissions. You won't just have them one by one, you will have hundreds or even thousands of jobs that are scheduled to run on your infrastructure, and Slurm will manage the queues and the priorities and the accounting, etc. So basically it's a batch manager, but there's both the resource managing and the scheduling of the jobs.
Digging a bit deeper into how Slurm works, because this is relevant for this talk, there are basically two main components, two daemons that are the most relevant, and those are the controller, which is called the slurmctld, and then the daemons that run on the worker nodes at the bottom, which is the slurmd daemon. And then you have other daemons like the slurmdbd, slurmrestd and slurmscriptd.
Those are not relevant for this talk, I will mostly focus on the part on the left here. So users and client tools basically interact with the controller over the Slurm protocol. Nowadays there's also slurmrestd, so you can interact over REST with some scripts, but mostly all the userland tools, almost everything in the Slurm ecosystem, just talks to the slurmctld, and this controller holds the source of truth for Slurm, so it knows which resources are allocated where, it knows which jobs exist, it knows who the users are, etc.
The controller talks to the slurmd daemons on the nodes, and the slurmd daemons are in charge of launching the jobs, including the cleanups, setting up the cgroups for the jobs, whatever you have. Now, what's important here is to know that for all of this to work, you need at least two things. You need the Slurm config files, and they need to be in sync across the whole cluster,
so you may have some differences, but mostly it should be the same. There was no audio online? Okay. So, as I was saying, the slurmctld holds the source of truth.
The slurmd daemons are in charge of launching the jobs. And the two important things are that you need the Slurm configuration files. It's mostly the slurm.conf file, but there are other files as well. Those need to be in sync across the whole cluster, and they need to be basically the same.
They should have the same hash, ideally. And then you should also have a shared secret, so that a rogue client cannot just add a worker node to the cluster and start doing malicious things. Usually it's a munge secret, from the daemon called munge, and it's a secret shared across the whole cluster. This fact is very relevant for this talk.
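To make those two requirements a bit more concrete: slurm.conf is a plain key=value text file that both the daemons and the client tools read. A minimal sketch, with purely illustrative cluster, host and node names, might look like this:

    # /etc/slurm/slurm.conf: minimal illustrative sketch, not a complete config
    ClusterName=mycluster
    # where slurmctld runs
    SlurmctldHost=ctl01
    # authenticate via the shared munge secret
    AuthType=auth/munge
    # the compute nodes and a default partition
    NodeName=node[001-100] CPUs=64 RealMemory=256000
    PartitionName=batch Nodes=ALL Default=YES MaxTime=24:00:00 State=UP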
Now, on to containers. So containers are increasingly becoming a super popular tool to run infrastructure, for reproducibility, for automating deployments, and just in general,
they're becoming super ubiquitous in our industry, and I think for good reasons. And there are, I think, very good use cases for using containers with Slurm.
In this talk, I will focus on the use case where you use containers on the user and client side of things. So those tools that will talk to Slurm, to the controller mostly, to do things on the cluster. This could be some automation that you run to do whatever; for instance, you could use it for monitoring purposes,
you could write a tool that does health checks on the cluster, or for accounting, I've used it extensively for accounting as well, but also integration with other services, right? What if you want to connect a Jupyter notebook with Slurm? You will end up with some tools that talk to the controller.
Now, there are basically two scenarios in which you can use containers with Slurm. On the left, we have the local use case. That means, imagine you have a front-end node,
a machine that's configured where users SSH to, and from there they can run the Slurm commands to launch jobs, to track their job usage, etc. It's conventionally called the front-end node for the cluster. So if you just add the Slurm client container on that node, it's very simple, because, as I said,
you need a secret with munge, and you need the config files, and that scenario is very simple because you can just do bind mounts: you can access the munge socket to talk to Slurm, and you might bind mount the Slurm config directory, and you're done, basically. So that's sort of easy.
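As a rough illustration of that local scenario (the image name is hypothetical and the paths are just the typical defaults, so adjust them to your site), the container only needs the config directory and the munge socket from the host:

    # Illustrative sketch of the local scenario: bind-mount the host's Slurm config
    # directory and the munge socket so the containerised client tools can talk to
    # the cluster without running any extra daemons inside the container.
    docker run --rm -it \
      -v /etc/slurm:/etc/slurm:ro \
      -v /run/munge:/run/munge:ro \
      my-slurm-client-image squeue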
However, for the use case on the right, you have the distributed or remote use case, and in that case you may run your Slurm client container in a different service, in a different network, or you may run it on Kubernetes, or somewhere else.
In that case, you obviously can't just do the bind mounts, because you need to give it all those things. So you would have to give it all the Slurm config files, and somehow the munge shared key, so that your external service can talk to your cluster, specifically to the Slurm controller.
Now, this is an excerpt from a Dockerfile. This is the naive approach, this is how I started trying things. Easy, right? You just take the slurm.conf and copy it to the destination, and this will absolutely work. But I was not happy with this approach,
because then you end up managing two copies of your slurm.conf, and when you do configuration management and automation of your infrastructure, I really like having a single source of truth. And managing it this way with containers is very fiddly,
because it's very easy to forget to update it, or something will fail to update automatically. It's just not ideal. I didn't like this approach, but it will work, it will work.
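The slide showed something along these lines, a minimal sketch of the naive approach (base image and paths are illustrative, and the package installation is elided):

    # Sketch of the naive approach: bake a copy of the cluster's config into the image.
    FROM rockylinux:9
    # ... install the Slurm client tools here ...
    # A second copy of slurm.conf now lives in the image and must be kept in sync by hand.
    COPY slurm.conf /etc/slurm/slurm.conf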
And some of you who know Slurm may say: oh, but Pablo, why wouldn't you just use Slurm's configless feature? So, Slurm configless is a feature since Slurm 20 or so that basically allows a client to just pull the config files from Slurm. So the slurmd daemons that run on the worker nodes, when they start, will just grab the Slurm config files, so you can remove the need to even copy the Slurm config file, right?
Well, it's a trick question. Not necessarily, because then you need to run a slurmd daemon in your container, and you also need the munge daemon, and it sounds easy, but it's really not, trust me.
You will need to do a lot of hacks. This is an excerpt from a container image I was creating, and you run into lots of awful, awful things. Like, the slurmd daemon expects the release_agent file to exist in the cgroup, and containers just don't create it.
I tried it on Docker, I tried it on different Kubernetes versions, it just doesn't exist. I don't know why, I couldn't find out why, if anybody knows, please tell me. Googling around, I found it could have been related to some privilege escalation issues. However, if you just remount the cgroups, the file appears, so I'm not sure what's going on there.
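For what it's worth, that remount workaround boils down to something like the following (cgroup v1, purely illustrative; the exact controller and paths depend on the container setup):

    # Illustrative hack only: re-mount a cgroup v1 hierarchy inside the container,
    # after which the release_agent file that slurmd expects shows up.
    umount /sys/fs/cgroup/freezer 2>/dev/null || true
    mount -t cgroup -o freezer cgroup /sys/fs/cgroup/freezer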
Another fun story is that, for instance, if you're using Kubernetes, Kubernetes likes to give you a symlink to your secrets, and munge refuses to take the secret from a symlink for security reasons, which makes sense, so this will not work, so you'll need to put in hacks,
and it's hacks on top of hacks on top of hacks, just to run these two daemons. And yeah, I was not very happy with this approach either. So basically, when you arrive at this situation, you're faced with two options. Either you do the first naive approach,
where you just copy all the stuff into your Slurm container, and you manage a copy of your Slurm config files, but as I said, if you want a single source of truth, this might not be ideal. In that use case you also need munge, of course, and you need to supply the munge key.
Or you can try the configless approach, but then you need to add slurmd to your container, so it can pull your config files via configless, but then you also need munge anyway, and you need to add the munge key to your container somehow. And managing secrets, I mean, if you're running Kubernetes or some other container manager, it might not be a big issue, but you will still need to maintain all these extra daemons with nasty hacks,
and we don't always like having lots of hacks in our infrastructure. There's a third option, by the way, which is trying to go secretless, where you try to use JSON Web Tokens in combination with configless, but it gives lots of issues. It doesn't really work.
I tried it, so I didn't include it here. Just mentioning it in case somebody thought about it. So, Pablo, you talked about the bad and the ugly. What about the good? Is there any good part to this? I'm glad you asked. Yes. What if we had a single-shot CLI tool,
a very simple tool that was just able to authenticate to the controller, either using munge or JSON Web Tokens, which Slurm also supports, and just fetch the config files, and then it's done. That's all you really want to do, right? Because then your tools, the Slurm tools, can work,
because they have their Slurm config files, and just by having the JSON Web Token in your environment, you can talk to the Slurm controller. And, yeah, that's the tool that I wrote. It's a very simple tool. It just does exactly what I described there, and it's open source. You can find it on GitHub.
I uploaded it in the past month. Fun story about this. As I said, I had the idea for this when I was back at CERN. I worked on this a year ago already, but then I somehow lost the source. I don't know what happened. Just before I left CERN, the source was just lost.
I don't know why. I must have deleted it by accident. I don't know what happened. So after I left CERN, I kept in contact with my ex-colleagues, and they were telling me that they wanted to do this integration between the SWAN, which is the... Who here knows SWAN? Anybody? Okay, one, two, three. Yeah, so it's the Jupyter notebook service for CERN,
which also does analytics. And we wanted to connect it to Slurm, and we ran into all these issues, because this is a service that's exposed to the whole internet, so we didn't want to have the munge key for the Slurm cluster in the container, etc. Anyway, so then I left CERN, and then, yeah, my colleagues were telling me, oh, it would have been so useful to have this, and what a pity.
And then, a few months ago, I just didn't like the fact that I had lost the source and all those days I had spent reverse-engineering the Slurm protocol. I just didn't like losing it, so I rewrote it, more properly, in Python,
and just made it public. So if you're interested in making client containers like this, feel free to give it a try. It looks a bit like this. It's very simple. You can choose between munge or JWT (JSON Web Tokens) authentication. If you choose JWT, which is the simplest one, you just need an environment variable with a token, and you can tell it where you want to store the config files, and then you have verbosity as an option. So it's very simple. It has very few dependencies.
And the tool talks several Slurm protocol versions, because with every major release, Slurm changes the protocol version. So you can list them with -l,
and it will show you basically all the versions that it supports. So imagine you have a Slurm JSON Web Token in an environment variable. You can just tell it to do JSON Web Token authentication with the server. It supports multiple controllers, in case you have high availability set up in your Slurm cluster,
so you can specify a list of servers that it will retry until it succeeds. And then you tell it the protocol version of the slurmctld, because it needs to know which protocol it should talk. Protocol version negotiation, I think, doesn't exist in the Slurm protocol, so you have to tell it which version you want it to talk, and that's it.
And then it will just download the Slurm config files, and happy days for your containers.
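Putting that together, a client container could do something along these lines. The option names below are paraphrased from the talk rather than taken from the actual CLI, so treat them as purely illustrative and check the straw README on GitHub for the real flags:

    # Illustrative sketch only: option names are paraphrased from the talk and may
    # not match the real straw CLI.
    export SLURM_JWT="$(cat /secrets/slurm-token)"   # token minted on the cluster side
    straw -l                                         # list supported Slurm protocol versions
    straw --auth jwt \
          --server slurmctld1.example.org,slurmctld2.example.org \
          --protocol 22.05 \
          --dest /etc/slurm
    # After this, the regular client tools (squeue, sbatch, sinfo, PySlurm, ...) find
    # their config files and can authenticate to the controller with the token.
    squeue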
Conclusions. I think I'm ahead of time. So this tool, called Straw, can simplify the cost of creating and maintaining your Slurm client containers. It can also increase security, because you don't need to put the munge key everywhere you're running your client containers; JSON Web Tokens suffice. Caveats, caveats. I think this tool should not exist, because ideally this would be supported upstream.
So if anybody has any influence on SchedMD's Slurm development, yeah, I think it would be nice if we had this built into Slurm. And then the second caveat is that the JSON Web Token needs to be associated with a Slurm user, basically.
So ideally, you would be able to just generate a JSON Web Token for a user that's going to run on the Slurm cluster, and then if the secret for some reason is exposed, you've only exposed the JSON Web Token of a single user.
However, this is a limitation built into Slurm, basically. You cannot pull the Slurm config file over the protocol unless the token belongs to the Slurm user or to root. Still, I think it's an improvement over having your munge key available everywhere. And yeah, feel free to try it out.
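For reference, when JWT authentication is enabled on the cluster, such a token can be minted with scontrol; the username and lifespan below are just examples:

    # On a node of the cluster (with auth/jwt configured), mint a token for a user.
    # This prints a line of the form SLURM_JWT=eyJ..., which you then place in the
    # environment of the client container.
    scontrol token username=myuser lifespan=86400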
That was it. I'm happy to answer any questions you might have. Thank you very much, Pablo. Time for questions. So what kind of clients do need the config file?
Could you do everything over REST nowadays? Is it still necessary to use the config file? Yes, so anything that wants to run srun, sbatch, squeue, sinfo. For instance, if you have the Jupyter notebook plugins, they will just run those commands. Or if you want to run a client that uses PySlurm, for instance,
or any library really, anything that uses libslurm underneath, will automatically read the config files, right? So of course, you can write your own client, handwritten from scratch, that just interacts with the slurmrestd to do stuff.
Yes, but you cannot leverage all the existing user client tools. And the libslurm, PySlurm, et cetera. So if you want to create a Python tool, for instance, that leverages PySlurm, this would be, I think, a good solution.
I think Slurm does have a REST API, but it's considered very insecure. So even the documentation tells you, don't use this. I just don't understand, for a long time now, why everyone needs the config file, right? I mean, why does it need to be in sync? Couldn't they just exchange the information over the protocol now
and just say, this is your Slurm server? Yeah, that's the configless feature. That's the configless feature, essentially. Yeah, but the configless feature just downloads the config. Look, configless, OK. Download the config. I don't need the config beforehand. It's like serverless. There's always a server somewhere. Yeah, exactly.
So that's just how Slurm works. So I'm still a little confused about the Slurm client container. So the container is an application on the actual Slurm client? Because you have to document in the slurm.conf,
you have to sort of say what your clients are so that the scheduler can intelligently decide how to schedule jobs, right? I'm missing something. No, you don't really need to declare all the clients for Slurm. You just need to declare the worker nodes that are a part of it. But you can have any, I mean, it depends on how you've configured it.
You can limit it. You can limit in Slurm which clients are allowed to connect, but you don't have to. But even if you do, you will need this, because even if you authorize a hostname to connect as a client, it will need to have the munge key and the Slurm config files, et cetera.
Does this answer your question? Well, no. So in the slurm.conf, you sort of detail what your partitions are, and you have to kind of tell it what the capabilities are of your clients, of your Slurm clients, right, so that Slurm can decide how to schedule jobs. I'm missing something here. Well, I think you're thinking about the compute nodes. Yeah, I am. Yeah, the NodeName part of the slurm.conf.
Right, so the containers run on the compute nodes. No, the containers would be, let me go back to one of the slides. So you're thinking maybe about the compute nodes, each of which runs a slurmd daemon, and those you have to declare, yes. I think in 2023, by the way, you will be able to dynamically spawn compute nodes,
but that's the future. What I'm talking about is all the users and client tools that connect to the controller to run squeue, sinfo, like when you use Slurm. So if you had some tooling that you automated to gather metrics from Slurm, or,
yeah, a Jupyter notebook service, for instance, that connects to your cluster, that wants to launch jobs, that wants to run sbatch, squeue, whatever, that's in that domain. Yeah, I mean, the newest Warewulf, Warewulf runs containers. I mean, I think the newest version of Warewulf
is set up to run containers on the Slurm clients, right? It's sort of, you're actually launching containers as applications, so that was kind of, that's on the compute nodes. On the compute nodes, yeah. Yeah, yeah, that's the compute nodes. Thank you for your talk. So I have a question.
You are saying that you can pull the configuration with your tool, but there are many config files you can't pull with configless. For example, all the SPANK plugins, or, I think, topology you can pull, but, like I said, SPANK plugins and so on. So how do you manage these kinds of config files
that are not handled by default by Slurm? Right, that's correct. So when you use the configless feature, it will download the slurm.conf, the cgroup.conf, a lot of config files, but it will not download your plugins, your plugin files. But I think those are usually not needed if you're running a client, because those are usually just needed
for the slurmd daemons, right? Even for the worker nodes. Like the epilog, the prolog, you mean all of those plugin scripts, right? The authentication plugins. Those are usually needed by the slurmd daemon, but if you're just writing a client, say you're automating something with PySlurm to interact with it, you don't need those files.
And you can happily run sinfo, srun, sbatch, or squeue. You can happily run all of those commands without those files. Yeah, okay, so if I just summarize, the idea is just to create some front-end nodes, but not really worker nodes. Is that right?
So if you want to use configless to set up a front-end node, you might need those files from somewhere else. But if you're just creating a container to interact with Slurm and send Slurm commands, you don't need them, basically. Because the plugin files are usually the,
yeah, the epilog, the prolog for the slurmd or the slurmctld, and that's not what these Slurm client containers are about. So short answer, you usually don't need them.
Hello, thank you for the talk. I'm wondering, in huge institutions like CERN or EPFL, would you run your own forked or patched Slurm so you could fix maybe the authentication privileges,
or is it just not done because it's... I've never carried any Slurm patches, to be honest. Both at CERN and at EPFL, we just use Slurm out of the box. It works well enough for our use cases. It is true that you could, for instance, do a patch to enable finer granularity for the permissions.
For instance, you could enable any user to pull the config file. That would be a nice patch. We don't do it, yeah. Okay, thank you. We have time for one short question. Hi, thanks. We actually are very interested in this because we have a JupyterHub front end
that actually talks to a Slurm cluster through SSH, because we don't want to install all that stuff, like munge and the full Slurm deployment, into the JupyterHub host. And I'm wondering, how does it actually talk to a Slurm controller? So is the Slurm controller always listening to any host
that will talk to it, or are there any restrictions on who is connecting to the Slurm control daemon? So there's an AllocNodes setting in the slurm.conf, I believe, which will allow you to restrict from which nodes you can allocate resources.
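For reference, AllocNodes is a per-partition parameter in slurm.conf; a small sketch with made-up partition and host names:

    # Only allow allocations/submissions to this partition from the listed login hosts.
    PartitionName=batch Nodes=node[001-100] AllocNodes=login01,login02 Default=YES State=UP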
So you can limit it. However, if you don't have that, I think Slurm will happily accept anything because if you have the shared secret, it's considered good enough. Or a valid JSON web token. Yeah. Thank you. Thank you very much, Pablo.
Thanks.