Ganga: an interface to the LHC computing grid
Formal Metadata

Title: Ganga: an interface to the LHC computing grid
Title of Series: EuroPython 2014
Part Number: 71
Number of Parts: 119
Author: Matt Williams
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/20004 (DOI)
Production Place: Berlin
Transcript: English (auto-generated)
00:15
Our next speaker is Matt Williams. He has worked on the computing infrastructure of CERN and he's going to talk about how
00:27
the LHC's computing grid works. So please give a warm round of applause for Matt Williams. Thank you. So I'm Matt Williams.
00:40
I recently finished my PhD in particle physics. I was working on the LHCb experiment at the LHC for four years, and recently graduated. I'm now working at the University of Birmingham, on computing resources for the scientists there who are doing the analysis. And it's as part of that work that I'm helping to develop this tool, Ganga, which is
01:02
an interface used by scientists to access the huge amount of computing power and storage available to them as part of the LHC computing grid. So, a brief little bit of background in case anyone here doesn't know anything about CERN or the LHC. It's the world's largest particle physics experiment, or at least the world's largest man-made one.
01:21
It's arguably the world's largest man-made structure as well, being a 27-kilometer-long ring 100 meters underground in a tunnel dug specifically for the purpose. It's a proton collider, so it's accelerating protons to near the speed of light and colliding them together at four locations around the ring. And at each of those, there's a detector which studies the outputs of those collisions and
01:44
analyzes the data that's given to them. Given the huge amount of collisions that are happening every second, billions and billions are happening, it's outputting a huge amount of data. I mean, the amount of data that it is producing is way beyond what we would actually be able to collect. But the stuff we do collect to date equals something like 200 petabytes, though already
02:04
it's probably a bit higher than that, and it's only going to grow as the accelerator gets more and more powerful in the future. So, in order to be able to process that huge amount of data, alongside the design of the LHC was a corresponding project called the grid.
02:21
The idea of this was to produce a computing environment which would be able to handle the large amounts of data and processing power that would be required. It works on a tiered system. At CERN, there's a central hub, a Tier 0 grid site, which has a large amount of computing power. From there, it feeds down to a single site in each country that's involved in the LHC.
02:43
There's about 12 or 13 of those Tier 1 sites spread around the world, one in each country that's involved. And the level below that are the Tier 2 sites. There's around 160 of those. Each of those is generally something like a university or a research institute. There'll be a dozen or so in each country, for example. Some countries have more, some countries have less.
03:02
And it's at the Tier 2s and the Tier 1s where the largest amount of data processing is done. And the sort of data that we study at the LHC, in the sort of analyses that we do, really does lend itself to this sort of distributed nature. You tend to end up, if you're doing an analysis, with a list of collision events. Maybe you've got 10 million, 100 million events you want to look at.
03:21
You can very easily take a small chunk of those and process them independently of any other chunk of data. There's no real interaction between the events. So you can very easily chunk it up, send that out to wherever it needs to go, and then collate the results at the end. So, as I say, the project evolved alongside the LHC.
03:45
So even in the early days, well before the LHC actually started, people were looking into building these computing systems to provide the services to the scientists that need them. So in 2001, the LHCb project started work on Ganga. This was their in-house, experiment-specific interface to this grid infrastructure.
04:02
Each of the other experiments was also working on its own project to interface with the grid, since everyone was convinced that they had their own special problem that only they could solve in the way they needed it done. However, the LHCb project Ganga was designed as a Python system with the explicit goal of being pluggable, extensible and so on.
04:20
And so it was very easy in the intervening years to take the parts of it that were LHCb-specific and remove them, and allow other experiments at the LHC to plug in the small part of experiment-specific logic that's needed. So there's a number of scientists on the ATLAS experiment who are using Ganga for doing their data analysis. And in fact, outside of that whole ecosystem as well, there's the T2K experiment, which
04:44
is the neutrino experiment in Japan. Some of their scientists are using Ganga for interfacing with the grid resources which are provided to them as well. Of course, all the software that we create at CERN, or as far as I know, all the software is completely open source. Ganga itself is GPL and the vast majority of software that comes out of CERN is GPL or
05:03
other more liberal licenses. So how does it actually work? So if a scientist has a bit of code they want to run, they can use this tool Ganga to interface with the grid system. Or in fact, not just the grid system, they can interface with any other system that Ganga
05:21
has an interface to. So in this case here, you see on that second to last line, we're setting the back end to be equal to local. That's telling the Ganga system, don't run this on the grid, just run it here on my machine. That's something that's often done by scientists when you're testing a bit of code. If you've just written a new piece of analysis software, you don't want to immediately throw it up onto the grid infrastructure, run it 10,000 times and have it crash within
05:43
three seconds because of some bug you've put in. So it's a good idea to test it locally on a small set and then later on submit it up to the grid. It all centers around this job object at the top. You can set some parameters on it. Here we're setting the name parameter to give us a string which we can use for bookkeeping and keeping track of which jobs we used for what, since all the job information gets stored
06:01
into a persistent database where you can see it all later. The real workhorse of the job system behind the scenes is the application. The application is what is actually going to be run, wherever the job ends up running. In most cases, you just want to run an executable. It can be an executable binary or a Python script, or in this case, it's just
06:21
a small shell script. So you just say to Ganga, this is the thing I want to run, this is the actual code that's going to happen and this is where you can find it in this file here. In this case, this script is just going to create a file called out.txt. And so we're telling Ganga the output files from this job. These are the ones that are going to be made by it. These are the ones we want to make sure end up back where we are now.
06:41
We want to make sure we've got a copy of those in our local output directory, wherever the job was actually run and wherever that file was originally created. And so we specify that it's a local file. In output files, local file means copy it back to where I am locally. Once we've set up our job object, we just call submit, and at that point the Ganga subsystem comes into play. The monitoring loop comes in and starts submitting the job to the system.
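Put together, the job being described looks roughly like this when typed at the Ganga prompt, where Job, Executable, File, Local and LocalFile are already defined (the script name make_output.sh is a placeholder, and exact attribute spellings vary a little between Ganga versions):

    j = Job(name='my-first-job')                             # name is just a bookkeeping label
    j.application = Executable(exe=File('make_output.sh'))   # the small shell script that creates out.txt
    j.backend = Local()                                      # run on this machine, not on the grid
    j.outputfiles = [LocalFile('out.txt')]                   # copy out.txt back to the local output directory
    j.submit()                                               # hand the job over to Ganga's monitoring loop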
07:04
In this case, it's just going to start up a local shell instance somewhere else on your computer. But if you were accessing the grid, it would be uploading it to the grid somewhere. It would then keep track of its status and make sure it's downloaded any output files at the end of the job. So once it has finished, you can just access the output files directly inside the IPython
07:22
based Ganga user interface. So you can just call the peek method on the job you just ran, and it basically does an ls of the output directory. You see it's created a file for the standard out and the standard error and, most importantly, the out.txt we asked it to create. And if you want to peek further inside, you can pass the name of one of those files to peek and it will open up a pager directly inside IPython, and you can have a scan
07:44
through and look at the output files to make sure that everything worked the way you wanted it to.
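In code, peeking at a finished job is just this, run from the Ganga/IPython prompt:

    j.peek()            # roughly an 'ls' of the job's output directory
    j.peek('out.txt')   # open one of the output files in a pager inside IPython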
08:03
Obviously, that was just a toy example; there's nothing you can do there that you couldn't do by simply running a local script on your local computer and looking at the output file. So it would be good to be able to leverage the power of the grid. And it's as simple as changing the backend on that last line from Local to LCG, where LCG stands for the LHC Computing Grid, which is the acronym that we use for that. So with one small change from one line to the other, you could run exactly the same script, and that code would be uploaded to the grid system. The grid system would take over, distribute it, and run the code wherever it ends up running. You don't even have to worry where: it could be in China, it could be in America, it could be in Amsterdam,
08:22
it could be here in Berlin, it could be anywhere. And it's completely seamless to the user: at the end the data will be copied back and everything is the same, you don't even have to worry about it. But Ganga is more than just that, it's more than just running things locally and on the grid. It can interface with anything that you can access via an API, basically.
08:40
So there's a series of back ends for, as you see here, PBS, LSF, and SGE. Those are batch systems; often universities have got a local batch system or a batch farm of some kind which they use for running jobs that are somewhere between running on your local computer and uploading to the grid. And again, you could just change the backend to PBS and it would be submitted to your local farm, and you wouldn't even have to worry about any of the details.
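Switching where the job runs really is a one-line change; these are the backend classes named in the talk, though which plugins are available depends on your Ganga installation:

    j.backend = LCG()    # submit to the LHC Computing Grid
    j.backend = PBS()    # or to a local PBS batch farm; similarly LSF(), SGE(), Condor(), ...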
09:01
These last ones here are a set of experiment-specific back ends. Various experiments have got their own middleware interfaces sitting between Ganga and the grid, making an onion-layer type situation, to provide extra features that maybe that experiment particularly needs. But again, it's all a black box as far as users are concerned.
09:20
You don't have to worry about what's going on, it's just going to work. So now that we're using the grid, it would be good to really make use of the huge amounts of power it provides. So let's say, for example, you have sitting on your local hard disk a directory containing a whole load of files, maybe you've got 3,000 files or something in there. Each of them is some number of megabytes, so it's adding up to a gigabyte, let's say, of data or something like that.
09:43
So there's a lot of data you're going to want to analyze. You can tell Ganga that these are the input files you want to run your job over. From that point on, Ganga will keep track of those input files. It will make sure they get copied to wherever the job runs, whether that's locally, on your batch system, or, if it needs to be copied out to the grid, it will make sure those files end up where they need to be.
10:02
Of course, if we left it at that, it would be pretty useless because you'd be taking one huge chunk of files and copying them to one place on the grid, and they would just be run on one single compute node somewhere. It would be good to be able to distribute it around and make sure we're running things in parallel. And Ganga provides a tool for this called splitters. So again, you define on the job object a splitter parameter.
10:24
And in this case, we can use the SplitByFiles object. This is an object which knows how to split the files up into smaller sets of data. It simply takes one parameter, files per job, so it's going to take this list of however many files you have, chunk it up into chunks of 10, or maybe fewer files if there's not enough to fill a chunk,
10:44
and take each of those chunks, add in the analysis script that you want to run with it, submit it, and the grid will put it somewhere. It will take the next 10, that will go up, and that will be sent off somewhere. And it will keep doing that all the way through the list, and you'll end up with some number of hundred sub-jobs that Ganga will keep track of for you.
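A sketch of that kind of split job, where the analysis script and data path are placeholders and the SplitByFiles attribute name follows the talk (it may differ between Ganga versions and plugins):

    import glob

    j = Job(name='big-analysis')
    j.application = Executable(exe=File('analyse_data.py'))          # your analysis script
    j.backend = LCG()
    j.inputfiles = [LocalFile(f) for f in glob.glob('data/*.root')]  # the few thousand local input files
    j.splitter = SplitByFiles(filesPerJob=10)                        # one sub-job per chunk of ten files
    j.submit()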
11:02
So you won't have to worry about how many sub-jobs are made or doing it manually. It's completely automated. At the end, each of those sub-jobs is going to create a histogram.root. ROOT is a file format we use at CERN that's basically a table of data as far as this stuff is concerned. It can also contain histograms and so on, but basically it's a table of data.
11:20
By specifying LocalFile here, this isn't saying the file is going to be made locally; it's being made wherever the job is run, and you don't care where that is. But you're asking Ganga to copy it back to your local computer so you can have a look at it, open it up in your text editor or analysis software or whatever you're going to use to analyze the data. But that's not ideal, even then, because you're going to end up with
11:41
however many hundred copies or variants of this histogram.root. They'll get put in a sub-directory structure, but still they're going to be separate files that you're going to have to go through manually and look at. So, to solve that problem, Ganga provides something called a merger. It provides a whole suite of mergers, but the one in particular here is the RootMerger. This is a little bit of Python code
12:01
which understands how to concatenate ROOT files. It knows how to stick them together and combine them into one single file. And this, again, is completely automated. Once the job's been uploaded, split up and sent out all over the world, Ganga has downloaded all of the results from each of the single sub-jobs. Once they're all downloaded, Ganga will automatically kick in,
12:21
combine them together, and turn it into one single file which you can then look at. So from that point of view, you don't even have to worry about the fact that it was split. You started off with one single analysis script and one single set of data. Ganga split it and merged it. You've ended up with one result. You don't even have to worry about the fact that it was distributed around. It's completely seamless. There is much more than just the RootMerger.
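On the job above, asking for the merge is just a couple of extra lines. Older Ganga versions attach this through the job's merger attribute (as in the talk), newer ones through a post-processor list, so treat the exact attribute as version-dependent:

    j.outputfiles = [LocalFile('histogram.root')]     # each sub-job produces one of these
    j.merger = RootMerger(files=['histogram.root'])   # concatenate them all into a single file
    j.submit()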
12:41
You can write any sort of merger you might wish, anything which post-processes the data, basically. There's a class in Ganga which lets you pass in a simple function which simply takes the output file directory, and you just loop through it. There you could, for example, look at the log file for each output job, grep through it for a single string, and find the average of the numbers or something like that.
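As a rough illustration of that kind of hand-rolled post-processing (the log-line pattern is made up; each sub-job's downloaded output, including its stdout, ends up in its outputdir):

    import os

    def average_selected_events(job):
        """Grep each sub-job's stdout for a number and average it."""
        values = []
        for sj in job.subjobs:
            with open(os.path.join(sj.outputdir, 'stdout')) as log:
                for line in log:
                    if line.startswith('selected events:'):   # hypothetical log line
                        values.append(float(line.split(':')[1]))
        return sum(values) / len(values) if values else 0.0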
13:02
You can do anything you can think of to post-process your data. Once you've been working at CERN for some number of years, you're probably going to have submitted several thousand of these jobs over your lifetime. Many of which you're going to have deleted because maybe they broke, but many of which you're going to want to keep around for log files to check that stuff's working how it used to,
13:23
make sure your data is reproducible. So, Ganga provides a persistent database of all the jobs you've ever run through the system. You can see here the three jobs that we've submitted so far. The first one we just ran locally. That's all finished; it's showing up there as completed. The last two, because they were sent off to the grid, they've been distributed around and they're still running.
13:41
You see here, for each of them, Ganga has created 324 sub-jobs. That's how many it decided to split it into. You don't worry about the number too much. You just have to know that they're there. We don't have any more details about which of them are running, which of the sub-jobs are finished, or anything like that here. This is just a very high-level overview. But it's very possible to get that information.
14:01
Because Ganga provides full API access to everything inside it. So, inside the Python interface, you can access any of the information. You can access job information. You can resubmit things. You can do anything you want. So, at the most simple, we call that jobs object again, like we had in the last slide. We give it a parameter. We ask for job number two, which, as you see, is the bottom one here,
14:21
the merger job, which is, as far as we're concerned, overall, still running. We ask for its status, and again, it tells us it's running. It's the same information. We can delve in a little bit deeper, though. We can ask that job for a list of all its sub-jobs. So, we just give it the dot sub-jobs parameter. That's going to give us a list of jobs. We can loop through each of those sub-jobs and ask each of them what their status is.
14:42
We get a list of all the ones that are completed, we find the length of it, and we find that 24 of those 324 sub-jobs have finished so far. If we waited half an hour and ran it again, it would be a higher number, because Ganga is constantly keeping track of how many sub-jobs have finished.
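From the Ganga prompt, that whole check is a couple of lines:

    j = jobs(2)          # the merger job from the table above
    j.status             # 'running'
    completed = [sj for sj in j.subjobs if sj.status == 'completed']
    len(completed)       # e.g. 24 of the 324 sub-jobs finished so far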
15:01
But jobs won't always just be running or finished. Quite often, you'll get random failures. On the grid, your data will be sent to run at some particular site, and it could fail without any real reason; maybe there was an out-of-memory error at that particular location, or things like that. So, as long as some of your jobs have passed, there's a good chance that those that failed were simply transient failures. So, you can loop through all the sub-jobs once more, check if the status of that particular sub-job is failed,
15:22
and resubmit it, and it will go back into the monitoring loop and keep going around, and eventually it will be re-downloaded once it's finished. And this is the sort of thing you might want to do quite regularly. You might want to have a function defined which loops over a job object, checks all the sub-jobs, and resubmits the failed ones. So, you can take any bit of Ganga code, stick it inside a function inside a
15:41
.ganga.py file in your home directory, and all those functions will automatically be available inside the user interface, which is based on IPython. It's a slight fork of IPython to provide this sort of functionality.
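Such a helper might look like this, dropped into the .ganga.py file in your home directory as described, so that it is available at the Ganga prompt:

    def resubmit_failed(job):
        """Resubmit every failed sub-job of the given job."""
        for sj in job.subjobs:
            if sj.status == 'failed':
                sj.resubmit()

    # at the Ganga prompt:  resubmit_failed(jobs(2))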
16:01
So, the last thing I want to talk about is dealing with very, very large files. In the example I gave at the beginning, I was saying you might have a directory on your computer which has got something like a thousand files in it. Even if each of those is only some number of megabytes, it adds up to gigabytes of data. And in fact, quite often, when you're doing data analysis at the LHC, you're going to be dealing with at least gigabytes if not terabytes of data that you're going to be
16:20
wanting to run your analysis over. So, it's nice not to have to keep those files on your local computer and upload them every single time you want to run an analysis over them. And then at the end, if the output's big, you don't always want to have to download the output. Maybe you just want a summary file. Maybe you just want to find the number of events that pass some sort of criteria.
16:41
So, as well as being a distributed compute network, the grid is also a distributed file system, or at least it provides a number of distributed file systems. The one in particular here is using the DIRAC file system, which is, again, originally an LHCb-specific grid interface. But the important point here is that it deals with a remote distributed file system. You don't have to worry
17:01
about where these files are. They're out there in a way in the cloud. So, for the input files here, we tell Ganga that we want as our input file a file called input.root and we're saying Dirac knows where it is. So, I don't know the exact physical location, but the file catalog knows where to find it. For the output file, my
17:21
program is going to create a file called histogram.root. That's going to be made locally on the worker node wherever my job is run. And I don't want that copied back to my computer here. I want you to send it off to the remote storage. That will keep track of where it is. That will keep a record. I can access it later if I want to. But for now, I don't want to be dealing with all that network traffic coming up and down.
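In job terms, that input/output arrangement looks something like this (DiracFile and the Dirac backend come from Ganga's DIRAC plugin; the file names are the ones used on the slide, and analyse_data.py is a placeholder):

    j = Job(name='grid-io')
    j.application = Executable(exe=File('analyse_data.py'))
    j.backend = Dirac()
    j.inputfiles = [DiracFile('input.root')]        # located through the DIRAC file catalog
    j.outputfiles = [DiracFile('histogram.root')]   # left on remote grid storage, only a record kept
    j.submit()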
17:43
And in fact, it can even be a little bit cleverer than that. Using the Dirac backend, which is basically a layer on top of the LCG backend, there's a bit of extra logic in there to deal with this sort of file system access and so on. One of the clever things it can do is that,
18:01
with this exact script here, you upload that script and submit it, and DIRAC will automatically take the analysis program you want to run, look around, find the physical location where input.root is stored, and it will send the job to that site and run it there locally, rather than submitting the analysis script somewhere and copying
18:21
the files over. It will try to automatically reduce the amount of copying that's going on, in order to make things as efficient as possible and avoid clogging up network bandwidth. In the same way, the output is going to be put somewhere, and so you could then run a second job. You could chain together jobs. You can say: this is the output of job one; I want that same output to be the input
18:41
of job two. You just have to pass in input files equals DiracFile histogram.root. That job will be submitted to the grid. It will go up, it will look around, find out where histogram.root was saved to, and again it will be sent and run there. You never have to deal with those files on your local computer at all.
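The chained second step is just another job whose input is the previous job's remote output (summarise.py is a hypothetical follow-up script):

    j2 = Job(name='second-step')
    j2.application = Executable(exe=File('summarise.py'))
    j2.backend = Dirac()
    j2.inputfiles = [DiracFile('histogram.root')]   # DIRAC finds where the first job stored it
    j2.submit()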
19:01
You let the experiment, or CERN, deal with all that storage and file management. So yes, using the grid like this you can deal with hugely large files without ever having to handle them yourself. Of course, for each sub-job you'll get back a standard out file and a standard error file, so you can make
19:22
sure your jobs are running correctly. You can always have some files being sent off to Dirac, some downloaded locally, some sent off to some mass storage place. You can have as many input and output files as you want coming from whichever source you want as long as Ganga has an interface for it. And Ganga being extensible, you could very easily write a new plugin which dealt with any other file system type that you might
19:42
want to use. We do have a file system type which uploads things to Google Drive for example. Quite often people just want to be able to share files through Google Drive and so you can access, upload and download files from there. So you can write basically an interface to any infrastructure you might want to be using yourself. So you can find out more information at the website cern.ch
20:02
slash ganga (cern.ch/ganga). Like I said, all the code is completely open source, so you can go to the download link and have a poke around with the source code. The project was started in 2001, which for reference is about the time that Python 2.0 came out, so some of the code has been around quite a while, but on the whole it's quite readable and you can see what's
20:21
going on as far as the job flow goes. So take a look at that if you want to have a little poke around and thank you. Questions?
20:42
Thanks for the nice talk and the nice tool. I have two questions. The first question is: can you target some schedulers such as Slurm using the library? I don't know if there's a Slurm backend yet, but there are ones for Condor and Torque and so on, so there could easily be one for Slurm. I mean, it's a simple case of writing a bit of code
21:02
to call the right commands at the back end so Slurm could absolutely be interfaced with if necessary. Right and my second question was I forgot the second question. Okay I'll ask you in the back. And we got another question over here.
21:23
Thank you. It was on your merge slide, can you go back there? I don't understand the line with the j.inputfiles where there's actually a list comprehension
21:41
on... So in this case input.txt is being iterated over. Is open overloaded somehow, or do you get a file handle? In this case, input.txt contains a list of file names. So that's an index file containing a list of all the file names that you want to include as the input for your job.
22:01
So you're going to loop over each of the lines in that file, each of which is a string which is the name of a file. But that's not what open usually does, is it? You just get a file handle. If you call open and then readlines, you get the lines, of course. When it's looped over in this list comprehension, it does produce a list of the files.
22:22
It has a slash-n at the end of each one, but it does work. I did check this line.
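The line under discussion is essentially this pattern, where input.txt is an index file with one data-file name per line (the file class is whichever one the slide used; stripping the newline just makes it tidier):

    j.inputfiles = [LocalFile(line.strip()) for line in open('input.txt')]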
22:43
I wanted to know how you handle code that runs in parallel and needs to communicate with other processes or on different computers? So, inter-process communication between analysis jobs and things, or inter-network communication. On the whole, there's very little scope for communication. I mean, Ganga is blind to that: if you submit to a supercomputer which has
23:02
got some inter-process communication that you need to do or some sort of communication of any kind. It will handle that because Ganga doesn't care about it. Generally jobs on the grid you don't have any sort of communication between them. Each job is siloed very much so. So I suppose you don't submit jobs that need to be run across multiple
23:22
processes. Mostly not. Not in the sort of work that we do. How does Ganga find files? I mean, you can't just use a name, right? So each of LocalFile or DiracFile, for example,
23:42
has got a little bit of logic in there. So LocalFile by default will look in the working directory that the user is in. DiracFile, well, obviously you can't just say input.root and have it magically know where it is. By default it authenticates with the DIRAC system, so each person's got a local user area. So it will look in their local
24:01
user area for the file. And likewise it will be saved to their user area on the file system. Yes yes you can overwrite files from a previous job and things like that. You can give multiple output directories and stuff yes. Okay thank you again Matt.