
XALT: User Environment Tracking


Formal Metadata

Title
XALT: User Environment Tracking
Alternative Title
XALT: Tracking user jobs and environments on a Supercomputer
Series Title
Part
41
Number of Parts
110
Author
License
CC Attribution 2.0 Belgium:
You may use, change, and reproduce, distribute, and make the work or content publicly available in unchanged or changed form for any legal purpose, as long as you credit the author/rights holder in the manner specified by them.
Identifiers
Publisher
Publication Year
Language

Content Metadata

Subject Area
Genre
Transcript: English (automatically generated)
Let's get started. So the next talk is Robert McLay from TACC. He's going to talk about XALT, a tool to figure out what your users are doing on your system. OK. Thank you very much. I'm Robert McLay. I'm from the University of Texas.
I have colleagues, Mark Fahey, who was at NICS and is now at Argonne, and other colleagues, Ruben, who's from Argonne, and my student, Sandra Sweat, who is working on this project. What is XALT? XALT is a tool. It's a National Science Foundation funded project with PIs, me and Mark.
And the goal of XALT is to take a census. We know from accounting how much a user spends, but have no idea what they use. And we'd like to have a better idea of what's being used on the system. How many people in this room are system administrators? How many people in this room aren't system administrators?
What are you doing in this room? Anyway, so the problem as a system administrator is, especially in an academic kind of place like we are, we get freshly admitted graduate students who don't even know how to spell Linux,
and are asking where their folders are. And so our users range from complete newbies to experts who know more than I do. And the issue is that we need to help them. And so we want to know what they're doing. We want to get a better idea. We want to be able to track what they're using,
because we're building stuff. And we'd like to know whether we should continue to build it, because if nobody's using it, we don't need to build it anymore. Let's do something else. And XALT is currently running at TACC, the site I come from, at NICS, at the University of Florida, at KAUST, also at NCSA, and it can integrate with other tools
like TACC Stats, which I won't bore you with because I don't have time. So our goals are to be extremely lightweight. We want the users to not notice anything that's going on. We'd like to provide provenance data, i.e. I'd like to be able to tell a user, if they asked: six months ago you ran this program, with these environment variables,
and this was the command line, on this machine. We'd like to know what libraries people use. We want to collect data into a database for analysis, but you could also use tools like Splunk or Elasticsearch to do this analysis as well.
So we've got to get into your code somehow. What we do is we hijack the linker. That is, we have an ld shell script, which does what we need to do and then calls the real ld.
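The talk describes the wrapper as a shell script; as a hedged sketch, the same logic in Python might look like this (file names and fields are hypothetical, not XALT's actual format):

```python
import json
import os
import shutil
import sys
import time

def build_link_record(argv):
    """Collect the link line into a JSON-serializable record,
    the kind of data a linker wrapper like this would gather."""
    return {
        "link_program": "ld",
        "link_line": argv[1:],   # everything the build passed to the wrapper
        "cwd": os.getcwd(),
        "epoch": time.time(),
    }

def main():
    record = build_link_record(sys.argv)
    # A real wrapper would write to a site-configured spool directory.
    with open("link_record.json", "w") as fp:
        json.dump(record, fp)
    # Hand off to the real linker so the build proceeds unchanged;
    # a real wrapper would hard-code the real ld's absolute path.
    real_ld = shutil.which("ld")
    os.execv(real_ld, [real_ld] + sys.argv[1:])
```

The key design point is the hand-off: the build sees a normal, successful link, and the record is a side effect the user never notices.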
And what it does is it generates assembly code that essentially watermarks your code, and it also produces a trace map so we can collect how you linked your code, and we collect that in a JSON file. Then, what we used to do is also hijack the launcher, mpirun, ibrun, whatever your tool is,
with a shell script which would track the start and the end of your job, so we would know how long your job ran as well as what environment variables you had, and we'd collect that in a JSON file too. Well, fortunately, I don't have to do that anymore,
and I'll explain why later; it works, but it's probably bad. So what we do is we have at least three methods that we support to collect the data. We produce JSON information. We can write it to a file, which we collect nightly. We can send that information directly to syslog, where tools like Splunk or Elasticsearch
and Logstash can analyze it, or you can write straight to the database, and we have sites that use all three of these methods. So Carl, the previous speaker, was very nice to talk about Lmod. I happen to be the author of Lmod, and we do use the hierarchy.
Well, you don't have to, but we use the hierarchy, and one of the problems with the hierarchy is that module avail no longer tells you everything that you can load. So I have a command called module spider, which walks the tree, and it can figure out modules that you can't load
but are available by loading other things. You know, this version of PETSc is only available with this compiler and this MPI stack. Well, what's cool is that a module tells you paths. And if you've got all that information, you can build what I call a reverse map: go the reverse direction, turning directories in the path
into module files, so you can say, hey, these are the modules that people use, not just the paths. And is there anybody here on an HPC system that doesn't use modules? Oh, good, you're all set. Or at least not willing to admit it. So this is a very useful thing. So for example, I can map between this crazy path
right here and a particular module. The other thing is that when I was doing this, since I know the author of Lmod, I wanted a feature that in a module,
I could have this path, this XALT path, be in front of everything else. So I modified Lmod to support this thing called priority. So I have a priority path which says: every time I rebuild the path, I make sure this one's in the priority location, with a priority of at least 100. So you can have a bidding war,
but most of the time nobody else knows about it. But you and me, don't tell anybody: you can set priority paths, so when the module command reruns, it swaps that path all the way to the beginning. And just in case you ever need it, you can set a negative number to make sure something's all the way at the end. Okay, I should have started my talk off like this.
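The reverse map described a moment ago can be sketched roughly like this; all directory paths and module names below are made up for illustration:

```python
import os

# A sketch of the reverse map: directories that show up in a user's PATH
# or link line are mapped back to the module that provides them.
# All paths and module names here are hypothetical.
REVERSE_MAP = {
    "/opt/apps/intel18/impi18_0/petsc/3.10/lib": "petsc/3.10",
    "/opt/apps/intel18/mvapich2/2.3/bin": "mvapich2/2.3",
    "/opt/apps/gcc/7.3/bin": "gcc/7.3",
}

def path_to_module(path):
    """Resolve a directory, or a file inside one, to a module name."""
    d = path if path in REVERSE_MAP else os.path.dirname(path)
    return REVERSE_MAP.get(d)

# A "crazy path" from a link line resolves to a human-readable module:
print(path_to_module("/opt/apps/intel18/impi18_0/petsc/3.10/lib/libpetsc.so"))  # petsc/3.10
```

In Lmod the map is generated by walking the module tree, so reports can speak in terms of modules people loaded rather than raw install paths.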
Do we have a few big data people still left? Okay, good. Big data sucks. I mean, it's a huge pain in the ass. Well, I guess I should say, if it was easy, anybody could do it. And one of the things I've learned:
well, first of all, I started out knowing zero about databases, and now I know just enough to be completely dangerous. The things that I thought were easy turned out to be hard, and I thought 2.1 billion was a big number. It's not!
We produce about 8 million jobs a year on Stampede. It's a 6,400-node system with about half a million cores on it if you count the MIC cores. So we've got 8 million jobs.
We keep track of the environment variables, and we get about 250 environment variables per job. Multiply those two numbers together and all of a sudden you're at 2.1 billion. My database uses a signed integer as the index, which tops out at 2.1 billion.
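Rounding aside, the arithmetic is easy to sanity-check:

```python
rows_per_year = 8_000_000 * 250   # jobs/year times env vars/job
signed_32bit_max = 2**31 - 1      # 2,147,483,647: the "2.1 billion" cap

# One year of rows nearly fills a signed 32-bit index; at this rate
# it overflows in roughly thirteen months.
months_until_overflow = signed_32bit_max / rows_per_year * 12
print(rows_per_year, round(months_until_overflow, 1))  # 2000000000 12.9
```

So a one-environment-variable-per-row schema burns through a 32-bit auto-increment key in barely over a year of Stampede workload.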
So I can't store any more environment variables, and I'm redesigning the database. But keeping track of the environment variables is huge. So I'm going to work out a different way to do this. And I was surprised that that was the biggest table we had. So I'm changing how the database supports environment variables. I want to be able to provide provenance data.
I want to be able to tell you how your program was built. When you put in a ticket as a user saying this job is bad, I'd like to be able to give the people who are answering the ticket: here's the environment, here's the modules they had loaded, here's the things they used. But in terms of doing SQL analysis,
I don't need to record all those variables. So I'm going to change it. I'm going to store the environment as a blob, and I'm going to filter it to cut down the number from about 250 to around 20 or so. So I've been running XALT for about a year,
as you might have guessed, or a little over a year now, a year and a half. And one of the things that's certainly more common here, but that in the U.S. we don't see a lot of: people using UTF-8 characters, people putting tildes in their names. I mean, you get to America and those things in your name disappear. We don't know how to deal with them.
But UNIX supports UTF-8 directories, UTF-8 names. And the first time I discovered it, somebody had named a directory with a tilde in it, and it broke Lmod. I mean, it didn't break Lmod, it broke XALT. And so I had to support UTF-8 characters. That took a while to sort out. And I've changed to use the query-with-arguments,
cursor-execute style of putting in data, and this also prevents SQL injection, so I don't get in trouble if somebody names their program 'Johnny Drop Tables' with a couple of escape codes sort of in the right places. And it also supports UTF-8 characters.
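A hedged sketch of both ideas at once, using sqlite3 in place of whatever database a site actually runs: filter the environment down to a short allow-list, and insert with parameterized queries (the cursor-execute style), which stores UTF-8 intact and defuses SQL injection. The table layout and the allow-list here are made up:

```python
import sqlite3

# Hypothetical short allow-list: only the variables worth querying in SQL.
# The full ~250-variable environment would be stored separately as a blob.
KEEP = {"PATH", "LD_LIBRARY_PATH", "LOADEDMODULES", "OMP_NUM_THREADS"}

def filter_env(env):
    """Cut the environment down to the allow-listed variables."""
    return {k: v for k, v in env.items() if k in KEEP}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE env (job_id TEXT, name TEXT, value TEXT)")

env = {
    "PATH": "/usr/bin",
    "OMP_NUM_THREADS": "4",
    "EDITOR": "vim",                           # filtered out
    "LD_LIBRARY_PATH": "/home/ren\u00e9/lib",  # UTF-8 survives intact
}
for name, value in filter_env(env).items():
    # Placeholders, not string formatting: a program named in the
    # 'Johnny Drop Tables' style is stored verbatim, never executed.
    conn.execute("INSERT INTO env VALUES (?, ?, ?)", ("job42", name, value))

print(conn.execute("SELECT COUNT(*) FROM env").fetchone()[0])  # 3
```

The placeholder form is what the DB-API calls parameter substitution; the driver quotes values itself, which is exactly why hostile names and non-ASCII paths stop being a problem.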
So, and this is something I think is a mistake on my part, which I'm gonna try and resolve as time goes on, but I'm doing stuff in user space. I'm hijacking the linker, and I'm gonna do something there, and right now I'm calling Python. And that Python is running in your environment.
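Running a tool in the user's environment is risky, which is why the invocation gets locked down: pin the interpreter, scrub the environment, and pass -E so PYTHONPATH and PYTHONHOME are ignored. As a hedged sketch (the defaults and paths here are illustrative, not XALT's actual configure values):

```python
import subprocess
import sys

def run_trusted_script(script, configured_python=sys.executable,
                       configured_ld_library_path=""):
    """Run a helper script with a pinned interpreter and a scrubbed
    environment, instead of whatever the user's shell provides."""
    env = {
        # Only values baked in at configure time; nothing is inherited
        # from the user's session.
        "LD_LIBRARY_PATH": configured_ld_library_path,
        "PATH": "",
    }
    # -E tells Python to ignore PYTHONPATH, PYTHONHOME, and the other
    # PYTHON* environment variables entirely.
    return subprocess.run([configured_python, "-E", script],
                          env=env, capture_output=True, text=True)
```

Because nothing is inherited, every path has to be hard-coded at configure time, which is exactly the per-cluster-install cost described below.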
And for most users that's a perfectly fine thing to do, but I want to be 100% safe. And so I've gone to using... there are four ways you'd get in trouble with Python: LD_LIBRARY_PATH, PATH, PYTHONPATH, PYTHONHOME. I discovered there's yet another one that I didn't know about until I got here. But what I'm doing right now is I'm defining LD_LIBRARY_PATH
to be the one that you configured it with, passing no PATH in, using the absolute path to the Python you configured with, -E to say ignore those two variables, and then your Python script. And that means everything has to be hard-coded, and if you use a shared file system, this means that you have to have a unique install per cluster
even though you're on the same home file system. Things like basename move around: sometimes it's /bin/basename, sometimes it's /usr/bin/basename. Anyway, so that's what I'm doing to protect myself as well as I can right now, but I think I'm gonna have to move away from Python for the stuff that's in user space,
to something that's statically linked or isn't affected by pip install and the like. So if you're gonna collect this data, you'd better find something to do with it, or we're just filling up disks with useless data. So how can we use this data?
Well, one of the ways we've been able to use this data is to deal with users abusing certain queues. So we have a largemem queue with nodes that have one terabyte of memory each. And there's only 16 nodes in that thing, right?
Yes. Thank you. And it was getting abused. It was getting heavily used by a large chemistry code, and we were able to use XALT to find out what codes were being run, and we were able to use a tool we also have, TACC Stats,
to check memory usage, and showed that these people couldn't run the normal way, the way you do on normal nodes, namely, you know, 16 cores, each getting about two gigs apiece; they couldn't run their jobs that way. But if they ran at reduced wayness, i.e. used half the cores or a quarter of the cores
and got more memory per task, they could run in the normal queue. So we were able to move those users out of the largemem queue back to the normal queue, and the wait time went from four days to two. And the definition of a crisis, in my mind, at a center
is when the center director is looking at the queues on his phone while he's flying from place to place, checking to see whether we're doing well or not. So the center director's interest is a crisis. We've also, at our site at least,
when you create an account, you have to also say what your field of science is. And it's self-reported. We don't check to see if you say you're an engineer and you're really a chemist. We don't know that. But we can take that data, look at how you run on our system, and look at what community codes you use,
and we can provide anonymized data: take the user names and charge accounts, use a unique mapping, and add a field of science. So we've done some preliminary analysis, which tells you the things you already know, which is that, yes, chemists use VASP
and NAMD and all that stuff, and there's a lot of Python stuff going on. But it's nice to be able to prove that that stuff's there, and we'll be able to do more with it as time goes on. So, as I said, we used to only track MPI jobs. And for a site like TACC, that's pretty good. We get many of the jobs; the big SU jobs
take up most of the charge time, but not the number of runs, and not the number of things in terms of helping users. In fact, it tends to be that the smaller the job, the more help they need. So we want to be able to track non-MPI jobs.
And hijacking mpirun, or the equivalent at each site, was a maintenance nightmare. So we're now using an ELF binary trick to track jobs. It turns out that ELF has these magic buttons: it has an init array and a fini array (.init_array and .fini_array).
So you can register a function in those arrays, and one function will be called before main and the other will be called after main.
Now if you're going to do this, you're going to collect a lot of data. So depending on your site, you may or may not want to track cp or things like that, but I was talking to some colleagues at a bio site and they said their number one job was awk.
So there may be things you want to track that I didn't think of, but you can choose how you want to select the things you want to track or not. Okay, now we have to see whether the demo gods are going to be kind.
So I'm running a module system; don't you run a module system on your laptop? With XALT loaded, I can do type -a on ld, and it shows that I've got the one in XALT
rather than the one in the system. If I build something, it builds with that.
I'm writing stuff to this directory, and so it built; I'm testing this thing, and it just built a record right now. And so I'm going to move that and we'll do that again.
So there, I got it. That record was generated immediately, and it has information about the job: it has the link line and all the stuff like that.
And so I can run it. This is just a simple hello-world code; it just does that. And I've got run information,
so I didn't need to do anything; it magically appeared. And one of the things is that you're going to have to filter these things so you can track what's important. You can put a debug library in your LD_LIBRARY_PATH and you find all kinds of information, and it tells you,
for example, it can tell you that, yes, I'm tracking; it has the right state, it has the right rank, the host name was captured and the path was captured. So you can control what things you want to track and what things you don't want to track,
and drive it all to a database and analyze it. I'm done.
I would guess everybody would say this may be opt-in, but surely you should never record what somebody's typing, every keystroke, without asking, straight into a database?
Well, we're not. At our site, we're only tracking things that run on the compute nodes. So typically, you know, it's what the job runs. We're not tracking what you type, although we could; we're not doing that. And this is actually an interesting problem. We have this thing called HIPAA, which is medical data, and so we're going to have to figure out that we don't trace
if you're doing stuff under HIPAA. What about precompiled binaries? Can you trace anything there? Well, actually, I'm setting LD_PRELOAD, so I will track it.
LD_PRELOAD will hook into those functions, even though the binary is precompiled. So I can even track things like DDT or something, though the data becomes more partial; I'll know that it ran those functions. What does this capture? What do you mean?
Well, I think I can control what gets intercepted. It may be the same. I'm not really familiar with that corner of ld, but I'm also capturing the environment and taking that away. What do you mean?
Well, then I have to look at it.