GPU Acceleration of a Global Atmospheric Model using Python based Multi-platform

Video in TIB AV-Portal: GPU Acceleration of a Global Atmospheric Model using Python based Multi-platform

Formal Metadata

GPU Acceleration of a Global Atmospheric Model using Python based Multi-platform
Title of Series
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date

Content Metadata

Subject Area
GPU Acceleration of a Global Atmospheric Model using Python based Multi-platform [EuroPython 2017 - Talk - 2017-07-10 - PyCharm Room] [Rimini, Italy] A global atmospheric model plays an important role in short-term weather forecasting and long-term climate prediction. The model requires enormous computing resources because all atmospheric states must be calculated at every time step (usually tens of seconds to several minutes). However, since most atmospheric models run only on CPU machines, they cannot use modern high-performance, low-power microprocessors such as NVIDIA GPUs and Intel MIC. Converting code from one machine to another is often costly. Although a model can be accelerated on GPU and MIC using OpenMP and OpenACC directives, it is not easy to achieve peak performance. I developed a new Python module named PyMIP (Python based Machine Independent Platform) to integrate C, Fortran, CUDA and OpenCL codes with a simple user interface. The main code, which includes configuration, flow control, I/O and MPI parallelism, is written in Python. Only the hotspots, which contain heavy number-crunching code, are written in compiled languages such as C, Fortran, CUDA and OpenCL. The hotspot codes are compiled and imported using PyMIP at runtime. PyMIP lets a user switch machines with a simple flag. I am developing a new global atmospheric model based on PyMIP to make it easy to utilize various modern microprocessors. In this presentation, I will introduce PyMIP and show the computational performance on NVIDIA GPUs of the model's dynamical core developed with PyMIP.
Hello everyone. [unclear] It is my honor to give this presentation. My name is [unclear], and I am from the Korea Institute of Atmospheric Prediction Systems. The title of this presentation is "GPU Acceleration of a Global Atmospheric Model using Python based Multi-platform". I am going to talk about how to take advantage of various modern processors for large-scale scientific computations. My presentation has four parts, as follows. First, I will introduce some modern processors, focusing on the GPU and the CPU.
Second, I will introduce a new small Python module named PyMIP, which I developed for multi-platform use. Third, I will present the results of applying PyMIP to a global atmospheric model. Finally, I will talk about some future plans. Let's start with the first part.
For those of you interested in computer hardware: there are various kinds of processors on the computer market today, in addition to traditional CPUs. The most representative of the modern processors is the graphics processing unit, named GPU, developed by NVIDIA and AMD. The GPU was originally developed for dedicated graphics processing, but now it is also used for general computation [unclear]. Another is the Intel Many Integrated Core, named MIC, developed by Intel, which was made by integrating about sixty CPU cores [unclear]. In recent years FPGAs are also [unclear] to improve performance by programming the logic gates directly [unclear] the numerical algorithms. In common, these modern processors offer high computational performance and high energy efficiency. They also necessarily require parallelized code.
This figure shows the trend of computing technology over the last 30 or so years. Conventionally, computing performance was proportional to the clock frequency of a CPU core, and the clock frequency had been increasing steadily, in line with Moore's law, for a long time. However, [unclear] the trend has changed: the clock frequency has hardly increased, because of power consumption limitations. Instead, as you can see in the [unclear] line, the number of cores is increasing rapidly. For instance, recently released GPUs have thousands of cores [unclear] with a relatively slow clock speed. This trend is expected to last for a long time.
Let me introduce the differences between GPUs, representing the modern processors, and traditional CPUs. The first is the proportion of transistors assigned to the arithmetic logic units, named ALUs. While a CPU allocates many transistors to cache and flow control, a GPU allocates most of its transistors to ALUs. Therefore, from the point of view of the FLOPS index, which indicates how many arithmetic operations can be performed per second, GPUs are much higher than CPUs. The second is the memory bandwidth: GPUs using GDDR5 or [unclear] memory have much higher bandwidth than CPUs using DDR3 or DDR4 memory. However, to benefit from these advantages, the GPU code must be highly parallelized [unclear]; otherwise the performance may be lower than on CPUs.
This table shows the specifications of some CPUs and GPUs sold on the market. [unclear] Take a look at one CPU and one GPU that came to market at comparable times [unclear]. The GPU's memory bandwidth is more than 4 times higher, and its FLOPS is more than 8 times higher in single precision and 5 times higher in double precision. I did not make a direct comparison in the table [unclear]. However, GPUs have their own advantages and disadvantages. The strengths of GPUs are massive data parallelism, high FLOPS and high energy efficiency. However, if the work is not a data-parallel computation, the GPU may be useless. Another issue is that GPUs are currently connected through the PCI Express bus on the motherboard, so the communication bottleneck might become more serious.
My important question about the modern processors is this: how can I utilize these various modern processors in the model being developed in my company? For Fortran code, directive-based methods such as OpenACC and OpenMP are considered the most natural. As in the example, you put the appropriate directive before a loop, and the compiler [unclear] for the target machine [unclear]. However, in my personal experience, directives are easy to start with, but it is difficult to achieve high performance with them.
On the other hand, I think that each processor has a native programming language which is designed to support it: for example, CUDA for NVIDIA GPUs, OpenCL for AMD GPUs and [unclear], and [unclear] for Intel MIC. These native languages should be more beneficial in maximizing the performance of each processor. So my question changed like this: is there a way to integrate code written in various native languages such as CUDA, OpenCL, Fortran and C?
So I made a new small Python module named PyMIP, which stands for Python based Machine Independent Platform. The goal of PyMIP is to switch the processor and the language easily. PyMIP relies on three components [unclear]; the ctypes module [unclear] is used to wrap Fortran and C libraries [unclear]. This figure shows the multilayer structure of PyMIP. The main code [unclear] is written in Python; it covers the parts that are not critical to computational performance, such as configuration, flow control, pre- and post-processing and so on. The number-crunching parts are written in lower-level languages according to the target processor, compiled, and imported as Python modules. I also made the array variables depend on the target processor, because GPUs and Intel MIC have dedicated memory spaces.
This is a very simple example of how to use PyMIP. It calculates the equation y equals a times x plus y, where x and y are vectors and a is a scalar. Below is a simple Python code using PyMIP [unclear]. Now let's compare this code with the kernel code written in Fortran, C, CUDA and OpenCL. First, this is the Fortran version,
and this is the C version,
and these are the CUDA and OpenCL versions [unclear].
Finally, this is the Python code that uses PyMIP to integrate the previous four kernel codes. The first part [unclear]. The second part initializes the data. The following section sets the target processor and the language: in this example, CPU and Fortran. The next part defines the device arrays, that is, arrays for the target processor. The next part compiles the kernel functions defined in the previous slides [unclear]. The next part calls the functions with the device arrays as arguments. At the end, the result is checked [unclear]. What I want to emphasize here is that if I change the processor to GPU and the language to CUDA, the code runs on the GPU without modifying any of the other parts. This is my goal: to use different hardware by simply modifying the options.
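The flag-based switching described in this part of the talk can be illustrated with a minimal sketch. This is not PyMIP's API, which the transcript does not show: the names `KERNELS`, `kernel`, `get_daxpy` and `run_daxpy` are hypothetical, and the two pure-Python kernels are stand-ins for the compiled Fortran, C, CUDA and OpenCL kernels that the talk says are built at runtime. The sketch only shows the idea that the driver code stays the same while a (processor, language) flag selects the kernel.

```python
from array import array

# Hypothetical registry: (processor, language) flag -> kernel function.
KERNELS = {}


def kernel(processor, language):
    """Register a kernel implementation under a (processor, language) flag."""
    def register(func):
        KERNELS[(processor, language)] = func
        return func
    return register


@kernel("cpu", "python-loop")
def daxpy_loop(a, x, y):
    # Stand-in for a compiled kernel: y = a * x + y, element by element.
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]


@kernel("cpu", "python-generator")
def daxpy_generator(a, x, y):
    # Second stand-in with the same interface but a different implementation.
    y[:] = array("d", (a * xi + yi for xi, yi in zip(x, y)))


def get_daxpy(processor, language):
    """Look up the kernel for the requested flag, or fail with a clear error."""
    try:
        return KERNELS[(processor, language)]
    except KeyError:
        raise ValueError(
            "no kernel registered for %s/%s" % (processor, language)
        ) from None


def run_daxpy(processor, language, n=8):
    """Driver code: identical for every backend, only the flag changes."""
    a = 2.0
    x = array("d", range(n))
    y = array("d", [1.0] * n)
    get_daxpy(processor, language)(a, x, y)
    return list(y)


if __name__ == "__main__":
    print(run_daxpy("cpu", "python-loop", n=4))
    print(run_daxpy("cpu", "python-generator", n=4))
```

In a real multi-platform setup the registry entries would wrap compiled code and device memory rather than Python loops; the dictionary lookup is only one simple way to express the switch.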
A simple performance test was performed using a two-dimensional wave equation [unclear], where the [unclear] is defined by the Laplacian operator. The equation is discretized using the central finite difference method [unclear].
[unclear] The bars show [unclear]. The PyMIP-based Fortran code is about 4 percent slower than the pure Fortran code, while PyMIP-based CUDA shows good performance on the NVIDIA GPU. The PyMIP-based OpenCL code runs not only on [unclear] but also on the Intel CPU and the NVIDIA GPU. [unclear] did not suffer a decrease in performance compared to pure Fortran [unclear].
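As a rough illustration of the kind of kernel such a test exercises, here is a pure-Python sketch of a two-dimensional wave equation advanced with second-order central differences in space and time. It is not the benchmark code from the talk: the grid size, the Gaussian initial pulse, the fixed zero boundaries, the Courant number of 0.5 and the function names `wave_step` and `run_wave` are all assumptions chosen for the example.

```python
import math


def wave_step(u_prev, u_curr, courant2):
    """Advance one time step with second-order central differences.

    u_prev, u_curr: square grids (lists of lists) at time steps n-1 and n.
    courant2: (c * dt / dx) ** 2. Boundary values are held at zero.
    """
    n = len(u_curr)
    u_next = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            # Five-point discrete Laplacian (the grid spacing is folded
            # into courant2).
            laplacian = (u_curr[i + 1][j] + u_curr[i - 1][j]
                         + u_curr[i][j + 1] + u_curr[i][j - 1]
                         - 4.0 * u_curr[i][j])
            u_next[i][j] = (2.0 * u_curr[i][j] - u_prev[i][j]
                            + courant2 * laplacian)
    return u_next


def run_wave(n=33, steps=20, courant=0.5):
    """Propagate a Gaussian pulse that starts at rest in the grid centre."""
    centre = (n - 1) / 2.0
    u_curr = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            r2 = (i - centre) ** 2 + (j - centre) ** 2
            u_curr[i][j] = math.exp(-r2 / 8.0)
    # Using the same field for step n-1 approximates zero initial velocity.
    u_prev = [row[:] for row in u_curr]
    for _ in range(steps):
        u_prev, u_curr = u_curr, wave_step(u_prev, u_curr, courant * courant)
    return u_curr


if __name__ == "__main__":
    field = run_wave()
    print(max(abs(v) for row in field for v in row))
```

The double loop in `wave_step` is the number-crunching hotspot; in the approach described in the talk, that loop is the part that would be written in a compiled language, while the setup and time loop stay in Python. The Courant number must stay below about 0.707 for this two-dimensional explicit scheme to remain stable.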
The third part is applying PyMIP to the global atmospheric model being developed in my company.
A global atmospheric model makes a grid horizontally and vertically on the Earth, as shown in the figure, and solves the governing equations at each grid point. An atmospheric model consists of three parts: the dynamical core, the physics processes and data assimilation. I applied PyMIP only to the dynamical core among these three parts.
It is a little complicated to explain [unclear] of a global atmospheric model; it is much more effective to show a video published by NASA in the United States. I will give a short [unclear].
[The speaker plays a video of a simulation from a global atmospheric model developed at NASA and comments on it; the narration during playback is unclear.]
The governing equations of the dynamical core consist of [unclear] equations [unclear] for wind, temperature, pressure [unclear] and water vapor. [unclear] The latitude-longitude grid [unclear] has been widely used as the grid on the Earth, as shown in the left figure. However, the higher the grid resolution, the [unclear], so we use the cubed-sphere grid, as shown in the right figure.
The cubed-sphere grid consists of quadrilateral elements and their internal Gauss quadrature points [unclear], as shown in the figure. We use the spectral element method for the spatial discretization and [unclear] for the time [unclear]. The spectral element method has excellent parallel scalability.
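For readers unfamiliar with the points inside each element, the sketch below computes Gauss-Lobatto-Legendre (GLL) nodes and weights on the interval [-1, 1], a common choice in spectral element methods. The transcript does not make clear which quadrature rule or how many points per element the speaker's model uses, so both GLL and the value n = 4 in the usage lines are assumptions; the function names are hypothetical.

```python
import math


def legendre_pair(m, x):
    """Return (P_m(x), P_{m-1}(x)) for m >= 1 via the three-term recurrence."""
    p_prev, p = 1.0, x  # P_0 and P_1
    for k in range(1, m):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p, p_prev


def gll_points_weights(n):
    """Gauss-Lobatto-Legendre nodes and weights for n >= 2 points on [-1, 1]."""
    if n < 2:
        raise ValueError("need at least two points")
    m = n - 1
    nodes = [-1.0]
    for i in range(1, m):
        # Chebyshev-Gauss-Lobatto point as the initial guess, refined by
        # Newton iteration on P_m'(x) = 0.
        x = -math.cos(math.pi * i / m)
        for _ in range(100):
            p, p_prev = legendre_pair(m, x)
            dp = m * (x * p - p_prev) / (x * x - 1.0)
            # Second derivative from the Legendre differential equation.
            d2p = (2.0 * x * dp - m * (m + 1) * p) / (1.0 - x * x)
            dx = dp / d2p
            x -= dx
            if abs(dx) < 1e-15:
                break
        nodes.append(x)
    nodes.append(1.0)
    weights = [2.0 / (m * (m + 1) * legendre_pair(m, x)[0] ** 2)
               for x in nodes]
    return nodes, weights


if __name__ == "__main__":
    nodes, weights = gll_points_weights(4)
    print(nodes)
    print(weights)
```

A two-dimensional element then uses the tensor product of these one-dimensional nodes. Because the end points of the interval are included, neighbouring elements share nodes on their edges, and only that edge data has to be exchanged between elements.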
I counted the code lines of the model being developed in my company. [unclear] The total number of lines is [unclear], not counting comments and blank lines. Note that the dynamical core accounts for 60 to 70 percent [unclear]. [unclear] about 600 lines, which occupy only about 4 percent of the total lines. I ported only this part into CUDA and OpenCL, and integrated the kernels using PyMIP. In many scientific computations, most of the computation time is spent in a small portion of the code [unclear]. That is why I think the method using PyMIP is useful for many scientific computations. [unclear]
This diagram shows the workflow of the dynamical core using PyMIP [unclear].
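The argument for porting only the hotspot can be made concrete with Amdahl's law, which the speaker does not name but which captures the same reasoning: if a fraction p of the run time is spent in the ported code and that code becomes s times faster, the whole program speeds up by 1 / ((1 - p) + p / s). The fractions and kernel speedups below are illustrative assumptions, not measurements from the talk.

```python
def amdahl_speedup(hotspot_fraction, hotspot_speedup):
    """Overall speedup when only the hotspot is accelerated (Amdahl's law)."""
    if not 0.0 <= hotspot_fraction <= 1.0:
        raise ValueError("hotspot_fraction must be between 0 and 1")
    if hotspot_speedup <= 0.0:
        raise ValueError("hotspot_speedup must be positive")
    remaining = 1.0 - hotspot_fraction
    return 1.0 / (remaining + hotspot_fraction / hotspot_speedup)


if __name__ == "__main__":
    # Hypothetical values: share of run time in the hotspot, kernel speedup.
    for fraction in (0.5, 0.9, 0.99):
        for kernel_speedup in (5.0, 10.0):
            total = amdahl_speedup(fraction, kernel_speedup)
            print(fraction, kernel_speedup, round(total, 2))
```

The calculation shows why it matters that a few hundred lines dominate the run time: the larger the share of time spent in the ported kernels, the closer the overall speedup gets to the kernel speedup, while the Python driver code around them limits the gain.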
This figure compares the wall-clock time of the dynamical core using PyMIP [unclear] on the Intel CPU, the NVIDIA GPU and the Intel MIC, respectively. The horizontal resolution of the model is about 100 kilometres, which is a very low resolution. The forecast time is 1 day. The wall-clock time of the Fortran code on the CPU [unclear] was [unclear]. Based on this time, the OpenCL code on the Intel MIC was about 2.29 times [number unclear] faster than the Fortran code on the CPU, and on the NVIDIA GPU [unclear] about 7.2 [number unclear] times. [unclear] The specifications of the processors used in this experiment are shown in the table below. I think that using OpenCL on the MIC is a bad idea; I plan to try [unclear] later. I do not want to say that the NVIDIA GPU is better than the Intel CPU [unclear]. I would like to show you that using PyMIP makes it easier to use various modern processors.
I will finish the presentation with the future plans. I recently found a new project named [unclear], which is similar to PyMIP. While in PyMIP the main program should be Python, in [unclear] the main code can also be Fortran, C [unclear], so it has more advantages than PyMIP [unclear]. I also found another project, named [unclear, possibly "Loopy"], which generates optimized target code for the modern processors from [unclear] loop structure and dependencies. The nice thing about it is that it can provide various optimizations such as tiling, loop unrolling [unclear]. If I can combine [unclear] as shown in the figure, it would be very great.
To summarize: I proposed a way to utilize the modern processors for large-scale scientific computations. I made a new Python module named PyMIP to integrate multiple kernel codes such as CUDA and OpenCL [unclear]. It was also applied to a practical problem, the global atmospheric model. I have two future plans. One is to apply OpenCL to [unclear] the model [unclear]. The other is to rewrite the model using a higher-level platform such as [unclear]. Thank you so much for your attention.
[Audience question] [unclear] The problem that I see when you have a higher-level language which is compiled to a lower-level language is the traceback: when there is an error, you have to debug the generated code [unclear].
[Speaker] I am sorry, I did not understand your question.
[Audience question] Let me rephrase. When there is a mistake or a problem and you need to debug the generated code, do we have an easy relationship between the generated code and the original code? [unclear]
[Speaker] In today's presentation I did not use any code generation [unclear]. I write the kernel code directly in Fortran, C, CUDA and OpenCL [unclear].
[Audience question] Thank you for the talk. Do you have any ideas on where we can use this technology outside of scientific domains, outside of [unclear] processing? Are there other industries where this might make sense?
[Speaker] [unclear] Many scientists only use Fortran, so I [unclear] Fortran [unclear].
[Audience comment] It is not a question, but maybe an answer to the question that you just got. I see use cases other than the scientific environment: any industry where you crunch a lot of data [unclear]. Recent libraries in the deep learning industry [unclear] are very demanding for this kind of usage, so maybe it is worth trying your library there, outside the scientific domain.
[Speaker] OK, thanks for your comment. [unclear]