HPC Node Performance and Power Simulation with Sniper
Speech transcript
00:00
Yes, so [unclear]. I'm a PhD student at the university, and this is our research work. [unclear] is the visualization tool [unclear] performance of multi-core systems [unclear] the
00:46
case. So what is [unclear]? Sniper is a simulator that we developed at the university, and the main goal of a simulator is to figure out what the performance of a next-generation system will be [unclear], how an HPC workload is going to run on it [unclear]. Another thing you can do is hardware/software co-design. This is something where we're really lucky [unclear] Intel, and the idea there is that we can change the software and change the hardware at the same time, and then we can do better than if you only changed one without the other. [unclear] But one of the things that I think is [unclear]: why is my application performing this way? [unclear] that is very difficult [unclear], so we're going to go into that. So, working with
02:10
our research group, we try to design new processors [unclear]. The thing I want to talk about is optimizing through modeling and simulation [unclear]. Now we can use the detailed analysis of the application on our hardware, and we can see how it interacts with the [unclear]
02:44
[unclear] simulations [unclear]. It turns out that using these methods gives you really good [unclear] for optimization, but it is difficult to [unclear] hardware/software optimization [unclear] performance. The problem is that not all cache misses are alike. This is basically computer architecture 101: sometimes you have [unclear] loads, and sometimes [unclear] are not very important. [unclear] modern out-of-order processors overlap them, so you don't really know which ones matter for performance and which do not. So just because you have a cache simulator doesn't mean that you will understand what the performance will actually be. [unclear]
03:51
Complexity is also [unclear]. We have large changes happening [unclear], a lot of research [unclear], and now we want to optimize the efficiency of [unclear].
04:07
So what trend do we notice? [unclear] the number of cores per node is increasing. In 2001 we had the first multi-core processor [unclear], and the first x86 multi-core [unclear], then I
04:27
guess for 2011 [unclear] 10-core processors, and now we have 60+ with Knights Corner. Intel just recently announced the Knights Landing processors, and my guess would be that [unclear]. But we also see many different
04:50
architecture options. This is a typical processor configuration in a multiprocessor node: [unclear] node, and each socket has four cores sharing [unclear]. This is a typical configuration here. But what we also see are things
05:11
like this, which are much different from the typical processor [unclear]. Another thing that we see is that [unclear]
05:22
[unclear] these systems are becoming diverse. We have various processors and various architectures [unclear]. In this talk [unclear] something that simulators allow you [unclear]. I will talk about NUMA: basically, you have memory that doesn't always look like it's the same distance away, so some memory accesses take much longer. So now we need [unclear] solutions [unclear]. Our work originated from analytical models. What this means is understanding your hardware and having a formula that represents the performance [unclear]. But the current state of the art of models doesn't provide the level of detail that we need [unclear]: the applications are too complex. So we
06:38
propose fast parallel simulation [unclear]. I mentioned [unclear] so far, but today I will just focus on software optimization. Cycle-accurate simulation is too slow for exploring this [unclear]. There are two types [unclear] of simulation. Simulation means pretending that we're on a different machine. [unclear] cycle-accurate, very precise; [unclear] raising the abstraction a little bit, you use some approximation [unclear]. What we did is raise the level of abstraction to give very similar results, very accurate results, but not in that way. OK, so Sniper uses
07:28
a hybrid simulation [unclear]. We have analytical [unclear] models [unclear]. [unclear] the instrumentation tool; what you can do is sort of wrap it around your application [unclear]. [unclear] we scale with the number of cores, and you can download it right now. [unclear] It has got lots
08:09
of fun features [unclear] support [unclear] parallel [unclear], and we're going to have more things [unclear] the visualization [unclear]
08:28
OK, I have to say Sniper is perfect, but not for everybody. We are user-level, so this might not be the best match for workloads with significant OS involvement; that means databases [unclear]. It is not good if you're trying to simulate a web server. On top of this, we use a higher level of abstraction; this means that if you want to understand the fine details of a processor, you probably don't want to use Sniper. But for most people here [unclear] applications, it will most likely work for you. We are x86 only. It turns out that all of these limitations are OK for HPC, so that's why [unclear]. So the
09:20
history of Sniper: we released the first version in 2011, and [unclear] many releases since then, with a lot of features. [unclear] researchers [unclear]
09:42
OK, so that was [unclear]. The main feature [unclear] is the visualization in Sniper [unclear] understanding [unclear]. So when
09:56
[unclear] we worked closely with a master's student [unclear], and the first question [unclear]. So what we did was
10:17
the very first type of visualization we introduced: CPI stacks [unclear]. The problem is that modern out-of-order processors are very difficult to understand: [unclear] why is it slow, and understand what's going on. So Sniper has CPI stacks [unclear] to understand where all the cycles go. We have different components that represent different reasons why [unclear]: the base component, which represents [unclear], and then we have the branch predictor, the instruction caches, the other caches, and [unclear]. So this is a really good start. This is a single thread here, and it shows the components. You can see there is quite a large component that is [unclear]. So the next
11:27
step: well, why don't we look at all the threads? I have a lot of these here on the right, but basically we will focus on one component, which
11:39
is [unclear]. We see these [unclear] a very small component [unclear]. Over here the red component is [unclear], and [unclear] synchronization [unclear]. So it turns
12:02
out that [unclear]. So this really tells you that [unclear] can access the data quite easily, but because of the effects of [unclear] it is unable to [unclear]
12:22
[unclear] So, for example, you can compare different input sets and their scaling [unclear], in order to see how much time is spent in synchronization versus actually doing work. [unclear]
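The CPI stack idea from this part of the talk can be illustrated with a small calculation: the cycles of a run are attributed to causes, and dividing each share by the instruction count gives additive cycles-per-instruction components. This is only a minimal sketch and not Sniper's implementation; the helper names `cpi_stack` and `fractions`, the component names, and all cycle counts are hypothetical illustration values.

```python
def cpi_stack(cycles_by_component, instructions):
    """Return (per-component CPI contributions, total CPI)."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    stack = {name: cycles / instructions
             for name, cycles in cycles_by_component.items()}
    return stack, sum(stack.values())


def fractions(stack):
    """Return each component's share of the total CPI."""
    total = sum(stack.values())
    return {name: cpi / total for name, cpi in stack.items()}


# Hypothetical cycle counts for one thread (made up for illustration).
example_cycles = {
    "base": 400_000,     # cycles spent doing useful work
    "branch": 50_000,    # branch misprediction penalty
    "icache": 25_000,    # instruction cache misses
    "dcache": 325_000,   # data cache and memory stalls
    "sync": 200_000,     # waiting on other threads
}
example_instructions = 500_000

if __name__ == "__main__":
    stack, total = cpi_stack(example_cycles, example_instructions)
    shares = fractions(stack)
    for name in sorted(stack, key=stack.get, reverse=True):
        print(f"{name:8s} CPI={stack[name]:.3f} share={shares[name]:.1%}")
    print(f"total CPI={total:.3f}")
```

Comparing the `sync` share across thread counts or input sets is the kind of "synchronization versus actual work" comparison mentioned above.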
12:41
OK, so what I want to do is go to our current, most advanced visualization features. This is just a website that is automatically generated [unclear], and it contains a few different things. I was surprised at how difficult it was to get a simplified view. We started off with a lot of graphs and charts and things like that, but actually taking these away [unclear]. So we have different components here [unclear]: this represents time on the x-axis and a CPI stack [unclear]. On the left you have some options; on the right we have the different components [unclear] by color [unclear] synchronization with other threads. And then you have the performance of the system, in instructions per cycle; this is a metric that [unclear]
14:05
[unclear] So we've got the performance; what about energy? [unclear] we integrate [unclear], a tool that allows you to [unclear] model and see where your power goes. So in this case we now have a stack very similar to the ones before, but in this case looking at power, and where the power is going [unclear]. And we also have another
14:51
interesting feature [unclear]: a view of the performance [unclear] so that you will see which [unclear] are doing better or worse [unclear]. For example [unclear] performance because of off-chip accesses [unclear]. So first we
15:34
have all of the views of the application, but we also have a view of the system: what was the system doing? These are all microarchitectural structures [unclear]. We have the L1 cache, the L2 cache, the shared L3 here, and if you mouse over one of these components it will show you [unclear] the activity for them [unclear], and you can look at the different components [unclear]. So then we have a little more
16:21
experimental research work. The idea of this research is: how can we analyze the entire application in a more straightforward way, to understand what's going wrong? [unclear] On the x-axis is time and on the y-axis [unclear]. Normally you would expect some sort of linear relationship, where more instructions just take linearly more time. But what happens is that there are some outliers, and that means that you spend more time in these functions compared to all of the others; those are the ones that we want to spend some time on. I also want to touch on the roofline model, a very interesting model that was developed [unclear], and this is for high-performance computing. The roofline consists of two components. The maximum attainable performance that you can achieve on a node is plotted as basically the intersection of two lines: the memory bandwidth, which is this line, and the peak floating-point performance of your [unclear], which is the second line [unclear] here. Now your functions are plotted [unclear], and the closer we get to the top, to the floating-point or bandwidth number, is how close you are to [unclear]. So now you have an understanding of how well we do and whether there is room to grow here [unclear].
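The roofline bound mentioned here is commonly written as attainable performance = min(peak floating-point rate, memory bandwidth x operational intensity). The sketch below only illustrates that formula; the function names and the node parameters (100 GFLOP/s peak, 25 GB/s bandwidth) are hypothetical and do not come from the talk or from Sniper.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound for a kernel with the given operational intensity.

    intensity is in FLOP per byte moved to or from memory.
    """
    if intensity <= 0:
        raise ValueError("operational intensity must be positive")
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Intensity at which the bandwidth line meets the compute roof."""
    return peak_gflops / bandwidth_gbs


def roof_fraction(measured_gflops, intensity, peak_gflops, bandwidth_gbs):
    """How close a measured function is to its roofline bound (0 to 1)."""
    bound = attainable_gflops(intensity, peak_gflops, bandwidth_gbs)
    return measured_gflops / bound


# Hypothetical node parameters, for illustration only.
PEAK_GFLOPS = 100.0
BANDWIDTH_GBS = 25.0

if __name__ == "__main__":
    for intensity in (0.5, 4.0, 8.0):
        bound = attainable_gflops(intensity, PEAK_GFLOPS, BANDWIDTH_GBS)
        print(f"intensity={intensity} FLOP/byte -> bound={bound} GFLOP/s")
```

A function whose intensity lies left of the ridge point is limited by memory bandwidth; one to the right is limited by peak compute. The ratio returned by `roof_fraction` corresponds to the "room to grow" described above.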
18:28
So I want to touch on some of the research that we are working on: the horizontal optimization.
18:34
The main idea is: if you have [unclear] cores, for example, is there a way to do [unclear] optimization so that I have better performance results [unclear]
18:49
Say you've got lots of options: you've got small cores that run slower [unclear]. But basically what I'm saying is there's a large variety there, and it's getting even more complicated to understand. What we do is we use Sniper to understand the different [unclear]
19:19
So for this problem, what we did was [unclear]. What the computation does is basically heat transfer [unclear]. You want to go ahead in time a few steps: you don't want to compute just one iteration [unclear]. What that means is that you need extra data around the block that you are computing, in order to move forward in time without communicating [unclear]. Before, for every time step you had to communicate with your neighbors; now we're saying let's not communicate, we do extra computations instead [unclear]. [unclear] What we see here is that as we increase the overlap, which means we have to do more redundant computation, at some point [unclear] the performance [unclear].
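The trade of communication for redundant computation described in this segment can be sketched on a one-dimensional heat-diffusion stencil. The code below is an illustration under stated assumptions, not the code used in the research: the explicit update rule, the fixed boundary cells, the value of `alpha`, and all function names are hypothetical. Each block reads a halo of `halo` extra cells per side once, then advances `halo` time steps on its own; the halo cells are recomputed redundantly and discarded.

```python
def step(u, alpha):
    """One explicit heat-diffusion step; the two end cells are held fixed."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def run_global(u, alpha, steps):
    """Reference: advance the whole grid, one step at a time."""
    for _ in range(steps):
        u = step(u, alpha)
    return u


def run_block(u, lo, hi, alpha, steps):
    """Advance cells [lo, hi) by `steps` using only a `steps`-wide halo.

    Cells in the halo go stale by one cell per step, so after `steps`
    steps exactly the halo is invalid and the block itself is correct.
    """
    left = max(0, lo - steps)
    right = min(len(u), hi + steps)
    local = u[left:right]
    for _ in range(steps):
        local = step(local, alpha)
    return local[lo - left:hi - left]


def run_tiled(u, alpha, total_steps, halo, num_blocks):
    """Advance the grid block by block, exchanging data every `halo` steps."""
    n = len(u)
    bounds = [(b * n // num_blocks, (b + 1) * n // num_blocks)
              for b in range(num_blocks)]
    done = 0
    while done < total_steps:
        k = min(halo, total_steps - done)
        # One halo read per k time steps instead of one per time step.
        u = [x for lo, hi in bounds for x in run_block(u, lo, hi, alpha, k)]
        done += k
    return u


if __name__ == "__main__":
    grid = [0.0] * 16
    grid[5] = 100.0
    ref = run_global(grid, 0.25, 6)
    tiled = run_tiled(grid, 0.25, 6, halo=3, num_blocks=4)
    print(max(abs(a - b) for a, b in zip(ref, tiled)))
```

A wider halo means fewer exchanges but more cells recomputed per block, which is the overlap-versus-redundant-work trade-off the speaker describes.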
21:07
Basically, the summary is that [unclear] you can do better than optimizing just [unclear]. OK, and [unclear]
21:20
saying [unclear] it's really easy to get started: you can get [unclear] mailing list [unclear]
21:40
[unclear] What we were looking at next is [unclear]. [unclear] The problem is that if we did it that way, it might slow down the simulator, and we want all the benefits of being a parallel simulator [unclear]. [unclear] One of the questions was [unclear]. [unclear] So we have validated [unclear]. [unclear] I think if you use different tools, how do you align the results? That's very difficult [unclear]. I think that's a broader problem, because different tools have different ideas [unclear]. [unclear] accuracy [unclear]
24:17
that be really want right our plans to validate that all of this is just a theory and so that the the people that we have evaluated the degrees of the last thing the is that you of the the the the on node in the knowledge that we use the database that you know what I wanted but what the good
Metadata
Formal metadata
Title  HPC Node Performance and Power Simulation with Sniper
Series title  FOSDEM 2014
Author
Carlson, Trevor

License
CC Attribution 2.0 Belgium: You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they have specified.
DOI  10.5446/32550
Publisher  FOSDEM VZW
Year of publication  2014
Language  English
Content metadata
Subject area  Computer science
Abstract  Sniper is a performance modeling simulator. The goal of Sniper is to provide software developers with an easy way to analyze their applications. We provide both performance and energy/power analysis, as well as advanced visualization support. This talk will cover the basics of how to download Sniper and get started quickly, but more importantly show the benefits that simulating your application can provide. With per-function, detailed simulation analysis, CPI stacks over time, and energy stacks, software developers who would like to optimize their applications can now do so quite easily and with more insight compared to using the performance counter metrics typically available on machines today. Topics: downloading Sniper; using Sniper; visualization and power overview. The intended audience is both HPC and scientific software developers, but the talk is also applicable to software optimization in general.