HPC Node Performance and Power Simulation with Sniper

Video in TIB AV-Portal: HPC Node Performance and Power Simulation with Sniper

Formal Metadata

HPC Node Performance and Power Simulation with Sniper
Title of Series
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
Sniper is a performance modeling simulator. The goal of Sniper is to provide software developers with an easy way to analyze their applications. We provide both performance and energy/power analysis, as well as advanced visualization support. This talk will cover the basics of how to download Sniper and get started quickly, but more importantly show the benefits that simulating your application can provide. With per-function, detailed simulation analysis, CPI stacks over time and energy stacks, software developers that would like to optimize their applications can now do so quite easily and with more insight compared to using performance counter metrics typically available on machines today.

* Downloading Sniper
* Using Sniper
* Visualization and Power Overview

The intended audience is both HPC and scientific software developers, but is also applicable to software optimization in general.
Yes, so [unclear]. I'm a PhD student at the university, and this is research work. [unclear] One part of it is the visualization tool, which lets you dive into the performance of multicore systems. [unclear]
So what are simulators used for? Sniper is a tool that we developed at the university, and the main goal is to figure out what the performance of my next-generation system will be [unclear] and how my workload is going to run on it. [unclear] Another thing you can do is hardware/software co-design. This is something we're really lucky with [unclear] Intel. The idea there is that if we can change the software and change the hardware at the same time, we can do better than if you only change one without the other. [unclear] But one of the things that I think is important is the question of why my application performs the way it does. [unclear]
So, working with our research group [unclear] design of processors [unclear]. The thing I want to talk about is optimizing software using modeling and simulation. [unclear] Now we can do a detailed analysis of the application on our hardware, and we can see how it interacts with [unclear].
[unclear] Why not just use [unclear]? Well, it turns out that using these methods gives you really good results for optimization, but it is difficult to do hardware/software optimization when you are trying to understand performance. The problem is that not all cache misses are alike. This is basically computer architecture [unclear]: sometimes you have only loads, and sometimes the misses are not very important. Modern out-of-order processors overlap them, so you don't really know which ones matter and which ones do not for performance. So just because you have a cache simulator doesn't mean that you will understand how the performance will actually turn out. [unclear]
Node complexity is also [unclear], so we have large changes happening. [unclear] And now we want to optimize the efficiency of [unclear].
So what trend do we notice? We are seeing the number of cores per node increase. 2001 was the first multi-core [unclear], and the first x86 multi-core came in 2005. Then, I guess, in 2011 we had 10-core processors, and now we have 60+ with Knights Corner. Intel just recently announced the Knights Landing processors, and my guess would be that [unclear].
But we also have many different architecture options. This is a typical processor configuration: a multiprocessor node, a [unclear]-socket node, where each socket has four cores sharing [unclear]. This is a typical configuration. But we also see things like this, which are much different from the typical processor. [unclear]
Another thing that we see is that [unclear] systems are becoming diverse, so we have various processors and various architectures. [unclear] In this talk [unclear] something that simulators allow you to [unclear] better [unclear]. So I will talk about NUMA: basically, you have memory that isn't always the same distance away, so some memory accesses take much longer. So now we need [unclear] solutions. [unclear]
Our work originated from analytical models. What this means is that, by understanding your hardware, we have a formula that represents the performance. But the current state of the art in these models doesn't provide the level of detail that we need; the applications are too complex.
So we propose fast parallel simulation. [unclear] I mentioned [unclear] so far, but today I'll just focus on software optimization. Cycle-accurate simulation is too slow for exploring this. [unclear] There are two types [unclear] of simulation. Simulation means [unclear] pretending that we're on a different machine. [unclear] cycle-accurate, very precise. [unclear] raising the abstraction a little bit, you use some approximation to get a very [unclear] result. So what we did is raise the level of abstraction to give very similar results, very accurate results, but not in that way.
OK, so Sniper uses a hybrid simulation approach [unclear], so part of it is based on analytical models. [unclear] It is built on [unclear], an instrumentation tool; what it does is sort of wrap around your application. [unclear] We scale with the number of cores, and you can download it right now.
It has got lots of fun features and support [unclear] parallel [unclear], and we're going to have more. [unclear] the visualization [unclear].
OK, I have to say Sniper is perfect, but not for everybody. We're user-level, so this might not be the best match for workloads with significant OS involvement. That means databases [unclear]; it is not good if you're trying to simulate a web server. We use a higher level of abstraction, which means that if you want to understand the details of a processor, you probably don't want to use Sniper. But for most people here, who want to optimize applications, it will most likely work for you. We're x86 only. But it turns out that all of these limitations are OK for these [unclear] uses. [unclear]
The history of Sniper: we released the first version in 2011 and have made [unclear] releases since then, with a lot of features. [unclear] downloads from researchers [unclear], so that's good. [unclear]
OK, so that was [unclear]. The main feature I want to talk about is the visualization and the understanding that Sniper provides. We worked closely with a master's student [unclear].
So what we did: the very first type of visualization we introduced was CPI stacks. The problem is that modern out-of-order CPUs are very difficult to understand: why is it slow, and what is going on? So Sniper [unclear] lets you understand where all the cycles go. We have different components that represent different reasons why [unclear]. There is the base component, which really represents the best [unclear], and then we have the branch predictor, the instruction caches, the other caches, and [unclear]. So this is a really good start. This is a single [unclear] right here, and it shows the components. You can see there is quite a large component that is [unclear].
So the next step: why don't we look at all of the [unclear]? I have a lot of these here on the right, but basically we will focus on one component, which is this [unclear] bar. [unclear] a very small component [unclear] very important. [unclear] Over here the red component is also [unclear], and another [unclear] the same thing in synchronization.
It turns out that [unclear]. So this really tells you that [unclear] can access the data quite easily, but because of the effects of [unclear], it is unable to [unclear].
There are other ways of using [unclear]. For example, you can compare different [unclear] sets and their scaling [unclear], in order to see how much time you are spending in synchronization versus actually doing things.
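As a rough illustration of the CPI stack idea described in this part of the talk, the sketch below splits a thread's total cycles into named components and divides each by the instruction count, so that the components add up to the overall cycles per instruction (CPI). The component names and cycle counts are made-up illustrative values, not Sniper output, and `cpi_stack` is a hypothetical helper, not part of Sniper.

```python
def cpi_stack(cycles_by_component, instructions):
    """Return each component's contribution to cycles per instruction (CPI)."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return {name: cycles / instructions
            for name, cycles in cycles_by_component.items()}


# Hypothetical cycle counts for one thread (illustrative values only).
cycles = {
    "base": 400_000,     # cycles spent doing useful work
    "branch": 50_000,    # branch mispredictions
    "icache": 25_000,    # instruction cache misses
    "dcache": 125_000,   # on-chip data cache misses
    "dram": 300_000,     # off-chip memory accesses
    "sync": 100_000,     # waiting on other threads
}
instructions = 500_000

stack = cpi_stack(cycles, instructions)
total_cpi = sum(stack.values())
for name, cpi in stack.items():
    print(f"{name:8s} {cpi:5.2f} CPI  ({100 * cpi / total_cpi:4.1f}% of time)")
print(f"{'total':8s} {total_cpi:5.2f} CPI")
```

Stacking one such breakdown per thread side by side gives the kind of comparison the talk describes, for example spotting a thread whose synchronization component dominates.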
OK, so what I want to do is go to our current, most advanced visualization features. This is just a website that is automatically generated [unclear], and it contains a few different things. I was surprised at how difficult it was to get a simplified view. We started off with a lot of graphs and charts and things like that, but actually taking these away [unclear]. So we have different components here. This represents time on the X axis and a CPI stack [unclear]. On the left you have some options; on the right we have the different components. [unclear] the colors [unclear] red [unclear] yellow [unclear], and this is synchronization with other [unclear]. And then you also have the performance of the system in instructions per [unclear]; this is a metric that [unclear].
So we've got the performance; what about energy? [unclear] we integrate [unclear], a tool that allows [unclear], which is in the lower graph, so you can see where your power is going. So in this case we have a stack very similar to the ones before, but now we are looking at power and where the power is going. [unclear]
We also have another interesting feature [unclear]: a view of the performance of [unclear], so that you can see which [unclear] are doing better or worse [unclear] in this application. It's possible that, as in the example from before, you lose performance because of off-chip accesses. [unclear]
So first we have [unclear] of the application, but we also have a view of the system: what was the system doing? These are all microarchitectural structures [unclear] in detail. We have the L1 cache, the L2 cache, and the shared L3 here, and if you mouse over one of these components it will show you [unclear] of the activity for it, so you can look at the different components [unclear].
Then we have a little more experimental research work. The idea of this research is: how can we analyze the entire application in a more straightforward way, so that I understand what is going wrong? [unclear] The X axis is time and the Y axis is [unclear]. Normally you would expect something linear: if you execute more instructions, it just takes linearly more time. The first thing you see is that there are some outliers, and that means that you spend more time in these functions compared to [unclear]. [unclear]
I also want to touch on the roofline model, a very interesting model that was developed [unclear]. This is for high-performance computing. The roofline consists of two components. It is the maximum attainable performance that you can achieve on a node, and it is plotted as basically the intersection of two lines: the memory bandwidth, which is this line, and the peak floating-point performance of your [unclear], which is the second line here. Now you [unclear] your functions [unclear], and the closer they get to the top, to the [unclear] or bandwidth number, the closer you are to [unclear]. So now you have an understanding of how well you are doing and whether there is room to grow here. [unclear]
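The roofline bound mentioned here can be written as the minimum of two limits: the peak floating-point rate of the node, and the memory bandwidth multiplied by the arithmetic intensity of the code (floating-point operations per byte of memory traffic). The following is a minimal sketch under that standard formulation; the node parameters, function names, and measured values are hypothetical and are not taken from the talk.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, arithmetic_intensity):
    """Roofline bound in GFLOP/s: min(compute roof, memory roof).

    arithmetic_intensity is in floating-point operations per byte of
    memory traffic.
    """
    return min(peak_gflops, bandwidth_gb_s * arithmetic_intensity)


# Hypothetical node: 100 GFLOP/s peak, 25 GB/s memory bandwidth.
PEAK, BW = 100.0, 25.0

# Hypothetical functions: (name, flops per byte, measured GFLOP/s).
functions = [("stream_like", 0.5, 10.0), ("dense_kernel", 8.0, 60.0)]

for name, intensity, measured in functions:
    roof = attainable_gflops(PEAK, BW, intensity)
    limit = "memory" if BW * intensity < PEAK else "compute"
    print(f"{name}: roof {roof:.1f} GFLOP/s ({limit}-bound), "
          f"achieved {100 * measured / roof:.0f}% of the roof")
```

A function that sits well below its roof still has room to grow; one that is already close to the roof can only improve by raising its arithmetic intensity or moving to hardware with a higher roof.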
So I want to touch on some of the research that we are working on, on hardware/software co-optimization.
The main idea is: if you have [unclear], for example, is there a way to do [unclear] optimization so that I get better performance results [unclear]?
Say you've got lots of options: you've got small [unclear] that run slower, [unclear]. Basically, what I'm saying is that there is a large variety there, and it's getting even more complicated to understand. What we do is use Sniper to understand the different [unclear].
So for this problem, what we did was look at a stencil [unclear]. What this computation does is [unclear] heat transfer [unclear]. The idea is that you want to go ahead in time a few steps: you don't want to compute just one iteration, you want to [unclear]. What that means is that you need extra data around the block that you are computing, in order to move ahead in time without [unclear]. Normally, for every time step you have to communicate with your neighbors. Now we're saying: let's not communicate; instead we do extra computation. [unclear] So here we have [unclear], and what we see is that as we increase the overlap, which means we have to do more redundant computation, at some point [unclear] you lose performance. So basically [unclear] it doesn't make sense to go any further. [unclear]
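The trade-off described here, doing redundant computation on extra data around a block so that neighbors need to be contacted less often, can be sketched with a one-dimensional heat-diffusion stencil. This is an illustrative toy under stated assumptions, not the code used in the research: `step`, `advance_block`, the 0.25 diffusion coefficient, and the 16-cell rod are all hypothetical choices.

```python
def step(u):
    """One explicit heat-diffusion step on a 1D rod; end values stay fixed."""
    return [u[0]] + [
        u[i] + 0.25 * (u[i - 1] - 2 * u[i] + u[i + 1])
        for i in range(1, len(u) - 1)
    ] + [u[-1]]


def advance_block(u, lo, hi, k):
    """Advance cells lo..hi-1 by k steps using only a halo of width k.

    The halo is copied once (one exchange) instead of once per step. The
    price is redundant work on halo cells that the neighbours also compute;
    halo values go stale one cell per step, so after k steps only the
    cells lo..hi-1 are still exact.
    """
    start, end = max(lo - k, 0), min(hi + k, len(u))
    local = u[start:end]
    for _ in range(k):
        local = step(local)
    return local[lo - start:hi - start]


u0 = [0.0] * 8 + [100.0] + [0.0] * 7   # hot spot in a 16-cell rod
k = 3                                  # time steps per halo exchange

# Reference: advance the whole rod k steps at once.
reference = u0
for _ in range(k):
    reference = step(reference)

# Tiled: two blocks, each advanced independently with a width-k halo.
blocks = [(0, 8), (8, 16)]
tiled = []
for lo, hi in blocks:
    tiled += advance_block(u0, lo, hi, k)

halo_cells = sum(min(hi + k, len(u0)) - max(lo - k, 0) - (hi - lo)
                 for lo, hi in blocks)
print(f"{k} steps per exchange, {halo_cells} halo cells copied in total")
```

Increasing `k` reduces how often blocks must exchange data but widens the halo, so the redundant work grows; that is the point at which, as the talk notes, a larger overlap stops paying off.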
Basically, the summary is that with co-optimization you can do better than by optimizing just hardware or just software. OK, and [unclear].
[unclear] It's really easy to get started; you can [unclear] the mailing list. [unclear]
[unclear] That is what we are looking at next. [unclear] The problem is that if we did it that way, it might slow down the simulator, and you would lose all the benefits of being a parallel simulator. [unclear]
[unclear] So we have validated [unclear], and you can see the result [unclear]. If you use different tools, you want them to give you aligned results, right? But that is very difficult to get. I think that's a broader problem, because different tools have different ideas [unclear]. [unclear] So that's how you get accuracy. [unclear]
[unclear] our plans to validate [unclear].