
Ganga: an interface to the LHC computing grid



Formal Metadata

Title Ganga: an interface to the LHC computing grid
Title of Series EuroPython 2014
Part Number 71
Number of Parts 120
Author Williams, Matt
License CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI 10.5446/20004
Publisher EuroPython
Release Date 2014
Language English
Production Place Berlin

Content Metadata

Subject Area Computer Science
Abstract Matt Williams - Ganga: an interface to the LHC computing grid. Ganga is a tool, designed and used by the large particle physics experiments at CERN. Written in pure Python, it delivers a clean, usable interface to allow thousands of physicists to interact with the huge computing resources available to them. It provides a single platform with which data analysis tasks can be run on anything from a local machine to being distributed seamlessly to computing centres around the world. The talk will cover the problems faced by physicists when dealing with the computer infrastructure and how Ganga helps to solve this problem. It will focus on how Python has helped create such a tool through its advanced features such as metaclasses and integration into IPython.
Keywords EuroPython Conference
EP 2014
EuroPython 2014

[Session chair] Our next speaker is Matt Williams. He has worked on the computing infrastructure at CERN, and he's going to talk about how the LHC Computing Grid works, so please give him a warm round of applause.

[Matt Williams] Thank you. I recently finished my PhD in particle physics, working on one of the experiments at the LHC for four years, and having recently graduated I'm now working at the University of Birmingham on computing resources for scientists doing their own analyses. This talk is part of that work: I help develop a tool called Ganga, which is an interface used by scientists to get at the huge amount of computing power and storage available to them as part of the LHC Computing Grid.

First, the flow of data, in case anyone here doesn't know anything about the LHC. It's the world's largest particle accelerator, and arguably the world's largest man-made structure: a 27-kilometre ring running underground in a tunnel dug specifically for the purpose. It's a proton collider: it accelerates protons to close to the speed of light and smashes them together. At four locations around the ring sits a detector which studies the outputs of these collisions and analyses the data they produce. Given the huge number of collisions happening every second, billions upon billions, there's a huge amount of data coming out; in fact the amount we could produce is far beyond what we could actually collect. The total recorded so far is something like 200 petabytes, and it's only going to grow as the accelerator gets more powerful in the future. So we need to be able to process that huge amount of data.

Alongside the design of the LHC ran a corresponding project called the Grid. The idea was to produce a computing environment that could handle the large amounts of data and the processing power that would be required. It works as a tiered system. There's a central Tier-0 site at CERN, which has a large amount of computing power. That fans out to a single Tier-1 site in each country involved in the LHC; there are about a dozen of those spread around the world. The level below that, the Tier-2 sites, number about 160; each is typically a university or research institute, with a dozen or so per country, some countries more, some less, and the Tier-2s are where the largest amount of data processing is done. The sort of data we study at the LHC really lends itself to this distributed approach: an analysis tends to start from a list of collision events, maybe 10 million or 100 million of them, and you can very easily take a small chunk of those and process it independently of any other chunk, because there's no real interaction between events. So you can chop the work up, send it out wherever it needs to go, and collect the results at the end.
The Grid project evolved alongside the LHC, so even in the early days, well before data-taking started, people were looking into building these computing systems to provide the services the scientists would need. In 2001 the LHCb experiment started work on Ganga as their in-house, experiment-specific interface to the grid infrastructure. The other experiments were also working on their own personal projects: everyone was convinced they had a special problem that only they could solve, in exactly the way they needed. Ganga, however, was designed as a Python system with the explicit goal of being portable and extensible, so in the intervening years it was easy to take the LHCb-specific parts and move them out, letting other experiments on the LHC plug in whatever experiment-specific logic they needed. The ATLAS experiment has used it for data analysis for a number of years, and in fact Ganga has users outside the LHC too: there's the T2K experiment, a neutrino experiment in Japan, and some of its scientists are using Ganga as well, interfacing with the grid resources provided to them. And of course all the software we create is open source: Ganga itself is GPL, and the stack of software it sits on top of is GPL or other more liberal licences.

So how does it actually work? If a scientist has an analysis they want to run, they can use Ganga to
interface with the grid system, or in fact not just the grid: it can interface with any system it has a backend for. In the example here, on the second-to-last line, we're setting the backend equal to Local. That tells Ganga to run the job on the machine you're sitting at, which is something scientists often do when testing: if you've only just written your analysis software, you don't want to fling it up onto the grid infrastructure and run it ten thousand times only to have it crash after two seconds because of a typo. It's a good idea to test on a small set locally and submit to the grid later. So we create a Job object and set attributes on it. We set the name to a simple string, which is useful for keeping track of which job you used for what; all the job information is stored in a persistent database where you can look at it later. The core of the job is the application: the application is what is actually going to be run. In most cases you just want an executable, a compiled binary or a script; in this case it's just a small shell script. So you say: this is the thing I want to run, these are the arguments to pass, and this is where to find it. Then there are the output files. These are the files that will be made by the job, the ones we want to make sure end up back with us: we want a copy of them in a proper directory, wherever the job was actually run and the files originally created. We specify them in the job's output files, and LocalFile means "copy it back to the machine I'm on". Once we've set up our Job object we just call submit on it, and Ganga's subsystems come into play: the monitoring loop comes in and submits the job to the target system. In this case that just starts up a local process, but if you're using the grid backend it will be running at a grid site somewhere. Ganga keeps track of the status throughout, and retrieves the output files at the end of the job. Once it has finished,
you can access the output from inside the IPython-based user interface that Ganga provides. You can call the peek method on the job, which lists the contents of the output directory: the standard output, the standard error, and most importantly the output file we asked it to keep. If you want to look inside, you can pass peek a file name and it will page the file directly inside IPython, so you can scan through it and make sure everything worked the way you wanted. Obviously that was just a toy example; there's nothing more there than running a little script on your own computer. The point of Ganga is to be able to leverage the power of the grid, and that's as simple as changing the backend on that last line from Local to LCG. LCG stands for the LHC Computing Grid; it's the acronym we use for it. With that one-line change you can run the exact same script, and the job will be submitted to the grid system. The system takes over the distribution, and your job ends up wherever there is capacity: you don't even know where it will run. It could be in China, it could be in America, it could be anywhere, and as the user at the end of it all you just see the output copied back; everything else looks the same, and you don't have to worry about it. And it's more than just local running versus the grid: Ganga can interface with anything you can write a backend for. There's a set of backends for batch systems like PBS and LSF; lots of universities have a local batch system, so if your job sits somewhere between "run it on my own computer" and "put it on the grid", you just change the backend to PBS and it will be submitted to your local batch farm, without you needing to know the binding details. The last ones listed are the experiment-specific backends: various experiments have middleware interfaces sitting between Ganga and the grid to add the features they need. But again, it's all a black box as far as the user is concerned: you don't have to worry about what's going on underneath; it's just going to work.
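Pulled together, the session the talk walks through looks roughly like this. It's a sketch of an interactive Ganga (GPI) session rather than a standalone script: Job, Executable, File, LocalFile, Local, LCG and PBS are names Ganga injects into its IPython prompt, and the script name and arguments here are invented for illustration.

```python
# Inside the Ganga IPython prompt (names provided by Ganga, not imports):
j = Job()
j.name = 'my-first-test'                    # label kept in the job repository
j.application = Executable(exe=File('analyse.sh'),      # hypothetical script
                           args=['--events', '1000'])   # hypothetical args
j.outputfiles = [LocalFile('output.txt')]   # copy this file back to us
j.backend = Local()                         # run on this machine, for testing
j.submit()

# Scaling up is the one-line change described above:
j2 = j.copy()
j2.backend = LCG()     # or PBS() for a university batch farm
j2.submit()
```

Once a job finishes, j.peek() lists the output directory and j.peek('output.txt') pages the file inside IPython, as described above.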
Similarly, using the grid only really pays off if you make use of the huge amount of parallelism it provides. Let's say you have, sitting on your local hard disk, a directory containing a whole load of files: maybe 3,000 files, each some number of megabytes, adding up to gigabytes. That's a lot of data to analyse. You can tell Ganga that these are the input files you want, and from that point on it will keep track of those files and make sure they get copied to wherever the job runs, whether that's your local machine, your batch system, or out on the grid. On its own that would be pretty useless, though, because you'd be taking one institution's worth of files, copying them to one place on the grid, and having them processed on one single compute node somewhere. The point of the grid is to distribute the work and run things in parallel, and for that there is the splitter. It's another attribute defined on the Job object; in this case we use the "split by files" splitter, an object which knows how to break that list of files up into smaller sets. It takes your list of however many thousand files, chunks it into groups of ten (or fewer, if there aren't enough left to fill a chunk), attaches each chunk to a copy of the job description, and sends each one off to run somewhere. It works through the whole list like that, and you end up with some number of hundreds of subjobs. Again, Ganga keeps track of them for you, so you don't have to worry about how many submissions you would otherwise be doing manually; it's completely automated. At the end, each of those subjobs is going to create a histogram in a ROOT file (a file format we use at CERN; it's basically a table of data). By specifying that file name here, we're saying that a file with this name is going to be made by every subjob, and that we want it copied back to us at the end so we can look at it with whatever analysis software we're going to use. Even that isn't ideal, though, because you end up with however many hundred copies of this histogram, each put in its own subdirectory: still separate files that you'd have to merge manually. For that problem there's something else Ganga provides: the merger. There's a whole suite of mergers, but the one used here is the ROOT merger, a little bit of Python code which understands how to concatenate ROOT files, sticking them together into one single combined file. This again is completely automated: once the jobs have completed and the results from each of the subjobs have been downloaded, they are automatically combined into one single file which you can work with.
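The split-then-merge flow is easy to mock up outside Ganga. This is a plain-Python sketch of the mechanism, not Ganga's actual splitter or merger code: chunk a file list into groups of ten, then glue the per-chunk outputs back together the way a text-style merger would.

```python
def split_by_files(filenames, files_per_job=10):
    """Chunk a list of input files into per-subjob lists, as the
    splitter described in the talk does."""
    return [filenames[i:i + files_per_job]
            for i in range(0, len(filenames), files_per_job)]

def merge_outputs(per_job_outputs):
    """Concatenate each subjob's output into one result, the way a
    simple merger glues per-subjob files back together."""
    return ''.join(per_job_outputs)

inputs = [f'data_{n:03d}.root' for n in range(25)]   # invented file names
chunks = split_by_files(inputs)
print(len(chunks))       # → 3  (10 + 10 + 5 files)
print(len(chunks[-1]))   # → 5  (last chunk is not full)

# Pretend each subjob produced one line of output, then merge:
outputs = [f'processed {len(c)} files\n' for c in chunks]
print(merge_outputs(outputs))
```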
From that point on you don't have to worry about the two steps at all: we started off with one single analysis script and one single set of data, we ended with one single merged result, and at no point did we have to think about the fact that the work was distributed around the world's computing. And it's much more than just the ROOT merger: you can plug in any sort of merger, anything which post-processes the data. There's a class in Ganga which accepts simple functions that take each job's output directory and loop through the outputs; you could, for example, pick out every output file matching a certain pattern and compute the average of the numbers inside, or anything else you can think of, to post-process the data.
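A custom post-processing step of the sort just described might look like the following plain-Python sketch. The directory layout and file name are invented for illustration, and in a real session you would hang a function like this off the Ganga job rather than call it by hand.

```python
import os
import tempfile

def average_of_outputs(output_dirs, filename='count.txt'):
    """Read one number from each subjob's output directory and
    average them: a minimal stand-in for a custom post-processor."""
    values = []
    for d in output_dirs:
        with open(os.path.join(d, filename)) as f:
            values.append(float(f.read()))
    return sum(values) / len(values)

# Demo with fake per-subjob output directories:
root = tempfile.mkdtemp()
dirs = []
for i, value in enumerate(['10', '20', '30']):
    d = os.path.join(root, str(i))
    os.mkdir(d)
    with open(os.path.join(d, 'count.txt'), 'w') as f:
        f.write(value)
    dirs.append(d)

print(average_of_outputs(dirs))   # → 20.0
```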
Once you've been working at CERN for some number of years, you'll probably have submitted thousands of these jobs over your lifetime. Many of them you'll have deleted, because maybe they broke; many of them you'll want to keep around, including the log files, to check that things worked and to make sure your results are reproducible. Ganga provides a persistent database of every job you've ever submitted to the system, and you can see here the three jobs from these examples so far. The first one we defined has finished. The one submitted to the grid shows as failed, because there wasn't a grid available in this demo environment (hence the warning you can see). And the third is the split job with its subjobs: that count is how many the split produced. Don't worry too much about the details at this level; it doesn't show which of the subjobs finished or anything like that. It's just a very high-level view, but it's entirely possible to get at that information, because Ganga provides full API access to everything inside it.
Inside the IPython interface you can get at any of this: you can access job information, resubmit things, do anything you want. The simplest entry point is the jobs object, which gives the same listing we just saw. We can index it to pull out job number 2, the bottom one here, the split-and-merge job we've been looking at, and it gives us back the same information in greater depth. We can also ask that job for the list of all its subjobs: the subjobs attribute gives us a list of jobs which we can slice and iterate over like any other. Asking for their statuses, we find that 24 of the 324 subjobs have finished so far; if we waited half an hour and asked again the number would be higher, because Ganga is constantly keeping track of how many of the subjobs have finished. But jobs won't always just be running or finished. Quite often you get random failures on the grid: your job gets sent to run at some particular site and fails for no real reason; maybe there was an out-of-memory error on that particular machine, things like that. As long as some of the subjobs have passed, there's a good chance any failure was simply transient. So you can loop through all the subjobs, check each status, pick out the failed ones and resubmit them; they'll go back out, land somewhere else, and keep going, and eventually everything will be finished. This is something you might want to do quite regularly, so you might define a function which takes a job object, loops over its subjobs and resubmits the failed ones. You can take any Ganga code like that, put it in a function in the startup file in your home directory, and all those functions will automatically be available inside the user interface, which is based on IPython; that extensibility is simply provided by Python, and Ganga wires it into the environment.
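The "resubmit the failed subjobs" loop just described is short enough to sketch. In a real Ganga session, job.subjobs and subjob.resubmit() are provided by Ganga; here tiny stand-in objects model the same shape so the logic is runnable on its own.

```python
class FakeSubjob:
    """Stand-in for a Ganga subjob: just a status and a resubmit()."""
    def __init__(self, status):
        self.status = status

    def resubmit(self):
        # Ganga would send the subjob back out to the grid; here we
        # just flip it to 'submitted' so the loop's effect is visible.
        self.status = 'submitted'

def resubmit_failed(subjobs):
    """Resubmit every subjob whose status is 'failed'. Grid failures
    are often transient, so this is typically safe to repeat."""
    count = 0
    for sj in subjobs:
        if sj.status == 'failed':
            sj.resubmit()
            count += 1
    return count

subjobs = [FakeSubjob('completed'), FakeSubjob('failed'), FakeSubjob('running')]
print(resubmit_failed(subjobs))   # → 1
```

Dropping a function like this into the startup file in your home directory, as the talk describes, makes it available at the Ganga prompt.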
The last thing I want to talk about is dealing with very, very large files. In the example I gave at the beginning, you had a directory on your computer with something like a thousand files in it, but even that is only a modest amount of data. Quite often, doing analysis at the LHC, you'll be dealing with gigabytes or terabytes of data that you want to run your analysis over. You're not going to want to keep those files locally on your computer and upload them every single time you do an analysis; and at the end, the output may be huge as well, or maybe you only want some summary of it, like the number of events that pass some criteria. So, as well as being a distributed computing network, the grid is also a distributed file system; or rather, it provides a number of distributed file systems. The one used here is accessed through DiracFile, which is again an LHCb-specific interface, but the important point is that it deals with a remote, distributed file store: you don't have to worry about where the data physically lives; it's all "in the cloud". The files here, rather than being local files, are remote. For the input we say: I know there is a file with this name; I don't know its exact physical location, but the grid tools can find it. For the output file we say: my program is going to create a histogram file; it's going to be large; I don't want it copied back to me. Instead, upload it to the grid storage and keep track of where it is so I can access it later; I don't want to be dealing with all that network traffic up and down. In fact it can go a little further than that, using the Dirac backend, which is basically a layer on top of the grid backend with a bit of extra logic to deal with this sort of file system. One of the things it can
do for you is this: when you submit a job, it will automatically take your analysis program, go and find the physical location where the input data is stored, and submit the job to that site so the analysis runs there, local to the data. Rather than submitting the analysis script to an arbitrary site and copying the files across to it, it finds the data automatically, which reduces the amount of copying, makes things as efficient as possible, and avoids clogging the network. In the same way, the output gets stored somewhere on the grid, and you can then chain jobs together: you can say that the output of job 1 is now the input of job 2, passing the input files as these remote file objects. Job 2 will be submitted to the grid, it will go and find out where that file happened to be saved, and again it will run there. You never have to deal with the files on your own computer, and you never have to do the storage and file management that the experiments otherwise have to handle themselves. With a setup like this, you never touch the unusually large files at all; the grid handles them. Of course, each job still brings back a standard-output file and a standard-error file so you can check that your jobs ran correctly, and you can always have some files copied back directly to you where that makes sense. You can have as many input files as you want, coming from whatever source you want, as long as Ganga has an interface for it. And Ganga is designed to be extensible: you could very easily write a new plugin for any other file store, and the same plugin system is used for the file types themselves. For example, quite often people just want to share files on a service like Google Drive, and there's a file type for that, so you can pull input files from there too. You can write an interface to any infrastructure you might want to use yourself. You can find out more information on the website.
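Concretely, the remote-file pattern just described is a couple of attribute changes on the job. Again this is a hedged sketch of a Ganga session, not a runnable script: DiracFile and Dirac are the LHCb-specific types the talk mentions, and the file names here are invented.

```python
# Inside the Ganga prompt, with the LHCb/Dirac plugins loaded:
j = Job()

# Input: name a file known to the grid file catalogue by its logical
# file name; Ganga/Dirac will locate a physical replica for us.
j.inputfiles = [DiracFile(lfn='/some/dataset/events.root')]

# Output: upload the (large) result straight to grid storage and just
# record where it went, instead of copying it back to this machine.
j.outputfiles = [DiracFile('histogram.root')]

# The Dirac backend adds the data-aware scheduling: the job is sent
# to a site that already holds the input data.
j.backend = Dirac()
j.submit()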
The code is all available: there's a link to the source code, which you can go and download and have a dig through. As I said, the project started back in 2001, around the time Python 2.0 came out, so some of the code is quite old; it has been around for a while, but on the whole it's quite readable and you can see what's going on. So take a look, have a look around. Thank you. [Applause]
[Moderator] We have time for questions.

[Question] Can you plug in other job schedulers? We're using Slurm.

[Matt Williams] I don't know if there's a Slurm backend yet, but there are backends for other schedulers like Condor, so one could easily be written: it's a simple case of wrapping the command-line tools the scheduler provides. It could absolutely be interfaced if necessary.

[Moderator] OK, and there's a question over here.

[Question] Thank you.
I don't understand the line with the j in it, the one built from a list comprehension over a file.

[Matt Williams] In this case the thing being iterated is a file handle. The file it points at contains a list of file names: it's an index file listing the inputs you want to include in your job. So the comprehension iterates over each of the lines in that file, each of which is a string naming an input file. It's not the pattern you usually see; normally you'd open the file and read its lines explicitly, but iterating the open file handle directly yields it line by line, so the comprehension does produce the list of all the files. It's a slightly unusual idiom, but it works.
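The idiom the questioner asked about, a list comprehension directly over an open file handle, works because iterating a Python file object yields its lines. A minimal reproduction (the index-file name and contents are invented):

```python
import os
import tempfile

# Build a fake index file: one input-file name per line.
index_path = os.path.join(tempfile.mkdtemp(), 'inputs.txt')
with open(index_path, 'w') as f:
    f.write('data_001.root\ndata_002.root\ndata_003.root\n')

# The pattern from the slide: iterate the handle itself, one line at
# a time, stripping the trailing newline from each name.
with open(index_path) as f:
    filenames = [line.strip() for line in f]

print(filenames)   # → ['data_001.root', 'data_002.root', 'data_003.root']
```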
[Question] What about jobs that need to communicate with each other: inter-process communication between analysis jobs, network communication between them?

[Matt Williams] On the whole there is very little scope for communication in our analyses. Ganga itself is largely blind to it: if you submit to a supercomputer backend that supports whatever inter-process communication your job needs, the job can do that itself, but for ordinary jobs on the grid there is no communication between them; each job is very much an island. So if you have jobs that need to talk to each other across processes, Ganga mostly doesn't know about that. What it does know about is finding files and the like; for each of the output files, for example, unless you give an absolute path, by default it will look in the working directory the job ran in.
[Question] And is that per-user by default, as is often the case in the Dirac system, each person getting their own user area?

[Matt Williams] Yes: we read various files and likewise save to the user's area on the file system. And yes, you can share files between jobs and things like that, and you can give multiple directories and so on. OK? Thank you again.

