
Python Profiling with Intel® VTune™ Amplifier

Transcript
Hi, I'm Shailen, a technical consulting engineer at Intel in the Developer Products Division; we are based in Munich, Germany. Today's focus will be on performance analysis of Python applications. There is no denying that Python is getting a lot of traction these days. If you look at what has been published, Python has grown in popularity over the last years, and in 2016 it remained the number-one most-used language. What is more surprising is that Python also remains the number-one programming language in hiring demand, so being proficient in Python is a great skill to have this decade.

When it comes to performance analysis, certain fields are driving the technologies of the future, and I would say those fields are mathematics and data science. To get my facts straight, I went to Stack Overflow, our favorite website when we have problems, and Stack Overflow shows that Python is indeed the most-used language in mathematics and data science. You may notice the percentages do not add up to 100; that is because the roughly 50,000 people who responded to the survey could choose several languages, but most of them, over 50%, chose Python, which is quite impressive.

Mathematics and data science in turn drive high-performance computing, or HPC, and fields like artificial intelligence, machine learning and deep learning, and everyone realizes that these fields are going to define the future. So we have worked really hard to release a software distribution which we call the Intel Distribution for Python. It comes out of the box with highly optimized libraries, giving you the ability to develop high-performance applications with Python, and we made it super easy to install: packages can be downloaded from Anaconda. The distribution ships optimized versions of libraries like NumPy, SciPy and scikit-learn, which at their base leverage Intel MKL, short for Math Kernel Library. A few words about MKL in case you have not heard of it: it is mainly assembly, super-optimized mathematical routines designed to make the most out of the Intel architecture.
However many cores you have, MKL makes use of them, and it uses the latest instruction-set architectures, for instance AVX or AVX-512; whatever you have, it makes the most of the vectorization out of the box, so you do not have to worry about this yourself when you use the Intel Distribution for Python.

Now, performance is really important, so how do we actually measure the performance of an application? Intel VTune Amplifier is a profiler that lets you find where the performance problems in your software are. It has been developed over many years, more than 15, and is still in active development; it gets improvements every day, with our engineers working on it constantly. What is great is that it comes with its own low-overhead sampling technology, which is unrivaled: no other profiler collects performance data with overhead as low as VTune Amplifier's. Thanks to those techniques, even while "big brother" is watching, there is no big impact on the performance of your real application.

With VTune Amplifier we also get precise line-level information. Some profilers allow that, but others only give you data at the function level, so you basically have to guess where the problem is inside a big function; VTune points you right to the source lines where the bottlenecks are. A bottleneck is, just like the neck of a bottle, the place where performance is capped; our goal is to find those areas, optimize them, and eventually increase the performance of the application. What is also great is that we can analyze not only the Python performance but also side languages and, if applicable, native code that your Python code is calling. Essentially you can analyze your whole system and get data not just about the Python binary and the Python code being run, but also about other modules that may be built in C or C++.
In the coming 10 to 15 minutes I will talk about why Python optimization is important and how we find those bottlenecks, give a very short overview of the various profilers available on the market, and then show a quick demo of what the GUI looks like and what you see in the tool, including a bit about mixed-mode profiling.

So why do we need Python optimization? Well, there is no denying Python is everywhere, and it is used in a lot of applications that need performance. Look at the web frameworks Django, TurboGears and Flask: all of these require that things be done really, really fast. There are build systems like SCons, which you may use in your company; we actually used it to build the Python package for VTune Amplifier and other tools across Intel. There are scientific calculations, and tools like FreeCAD, a free modeling application with large sections built in Python, and these require high performance as well. Parts of Linux tooling are made out of Python, and there are games, like Civilization IV and The Sims 4, that are partly Python-based; you want your game to be efficient and run fast.

So how do we measure performance? There are a couple of techniques.
The first is code examination: you open an editor and check the code. This can be very tedious if you do not own the code or if the code base is huge; how would you check everything? But it is one way. Another way is logging: you insert pieces of code into your Python script that print a timestamp here and there, so that at the end of a function you know how long it ran. This is also tedious, manual work. And then there is profiling, which is the approach VTune Amplifier takes: gather metrics from the system while the application is running, and at the end analyze all those metrics and make sense of the data. We are going to focus on CPU hotspot profiling: finding places where your code spends a lot of time on the CPU, or wastes time, for example a threaded application where one thread waits on a lock and does nothing, essentially stalling. Finding those issues and removing them is the way to go.
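The manual-logging approach described above can be sketched as a small decorator; the names here are illustrative, not from the talk:

```python
import functools
import time

def timed(func):
    """Print how long each call to `func` took: the 'insert timestamps
    by hand' approach, made slightly less tedious with a decorator."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.4f} s")
        return result
    return wrapper

@timed
def slow_sum(n):
    return sum(i * i for i in range(n))

total = slow_sum(100000)
```

This still only measures whole functions and requires editing the code, which is exactly the tedium profiling avoids.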
Now, there are a couple of types of profiling. There is event-based profiling, which collects data when certain events happen, for instance entering a function, exiting a function, loading a class, unloading a class; at those events we get performance data. There is instrumentation, where the target application is modified so that it basically profiles itself. And then there is statistical profiling, which is how VTune works: VTune is a statistical performance profiler. There are some caveats to bear in mind: as with any statistical method, the more data there is, that is, the longer the application runs, the more accurate it is. This is why I have written "approximated" on the slide, but I have also put "much less intrusive" in bold: with this statistical method we are able to get near-zero-overhead performance profiles of Python applications, and the longer your application runs, the better the results.
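As a toy illustration of the statistical idea (not how VTune itself is implemented), a background thread can periodically peek at which function the main thread is executing; functions that run longer collect proportionally more samples:

```python
import collections
import sys
import threading
import time

def sample_stacks(target_ident, counts, stop, interval=0.001):
    # Periodically inspect the target thread's current frame and count
    # which function it is in -- no changes to the profiled code.
    while not stop.is_set():
        frame = sys._current_frames().get(target_ident)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)

def hot():
    # busy-spin for ~0.2 s -- should dominate the samples
    t0 = time.perf_counter()
    while time.perf_counter() - t0 < 0.2:
        pass

def cold():
    # busy-spin for ~0.02 s
    t0 = time.perf_counter()
    while time.perf_counter() - t0 < 0.02:
        pass

counts = collections.Counter()
stop = threading.Event()
sampler = threading.Thread(
    target=sample_stacks,
    args=(threading.main_thread().ident, counts, stop))
sampler.start()
hot()
cold()
stop.set()
sampler.join()
print(counts.most_common(1)[0][0])  # the statistically hottest function
```

The sample counts are only approximate, but the ranking of where time is spent emerges without instrumenting the workload, which is the point of the "much less intrusive" claim.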
This slide is a short overview of the various profilers you may have come across; there may be others, but these are the most common ones. What is great about VTune Amplifier is that it comes with a rich, highly advanced, highly customizable GUI, with views that let you see quickly and visually where the problems are. Also nice is the line-level profiling: it points you not only to the function but to the exact source line where your problem is. Overhead is very important in an interpreted world, and ours is only about 1.1x on performance, which is a really low number compared to other line profilers; line_profiler itself, for example, has around a 10x performance hit, which can make it unusable. cProfile gives you data at the function level with relatively low overhead, but then again its granularity is very coarse; the same goes for other Python tools bundled in IDEs, again function-level and with a performance hit. Our tool works with basically every Python distribution you may be using, whether the Python supplied by your system or anything else, and obviously with the Intel Distribution for Python, which is built with ICC and supports 2.7.x and Python 3. It also supports remote collection, so you can sit at a Windows machine and remotely profile a Linux machine where your Python code is running, which is really great.
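For comparison, the standard-library cProfile mentioned above works at function granularity: it tells you which function is expensive, but not which line inside it. A minimal session looks like this:

```python
import cProfile
import io
import pstats

def work():
    # a deliberately non-trivial function to show up in the report
    return sum(i * i for i in range(100000))

pr = cProfile.Profile()
pr.enable()
work()
pr.disable()

# Render the top entries of the function-level report into a string.
s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats("cumulative").print_stats(5)
report = s.getvalue()
print("work" in report)  # → True
```

The report names `work` and its callees, but if `work` were 200 lines long, cProfile alone could not say which of those lines is the hotspot.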
You can also attach to a running process: if your Python program cannot be stopped, you just attach to its PID and get performance data. Analyzing performance is really simple, three basic steps: create a project in our tool, configure the various settings, run, and interpret the results.

I did a small test to show you how it works. I wrote a Python script that does something very, very simple; I hope it is not too small to read. The code is short, about 20 lines, but it does some heavy computation, so imagine seeing something like it inside a high-performance kernel. There is a small main script with two parts. One part uses multiprocessing to create two processes and calls a multiply function, which, as the name says, multiplies two matrices, a times b, and stores the result in c; the multiplication is written very badly, with three nested Python for-loops. If you are tempted to write it this way, don't; it is a really bad implementation. The other part does it out of the box with NumPy, which is the best way in Python, basically linear algebra done for you. I had already run the code on my Linux machine and collected the results in order to save time, and opened them in VTune here on Windows.
This is how it looks. On the summary page I have an overview of the time the application ran. There is also CPU time, which is basically the time summed per CPU core; here I see about 113 seconds against an elapsed time of 57 seconds, roughly two times 57, which looks good because this is a dual-core system: my code was actually quite parallelized. You can also see in the CPU usage histogram that my CPU concurrency was two, which is great; on a dual-core system both cores were occupied by the two processes. That does not mean everything was great, mind you, because the three nested loops are still not nice, and in this script I am comparing them against the performance of the NumPy code. One more thing: in the top hotspots, the tool has ordered the list of places where you need to spend time optimizing your code.
If I go into the bottom-up view, it has sorted all the various methods called in my Python script, and we can see that the aggregation of those two multiply calls contributed most of the time. Because the tool also collected the call stacks, I can drill down into how my method was called from my Python script: I double-click on a call-stack line and it automatically opens the source file and takes me to the source line where most of the time was spent. Here we can see that most of the time went into the matrix-multiplication line, at 26 percent of CPU usage.

Going back to the bottom-up view, the timeline shows how active my CPU was over the whole runtime of the application. You can see that for the two processes created by multiprocessing, the CPU was active in both while they were computing the matrix multiplication, and then at the very end of my script the fast NumPy multiplication ran; it shows up as a very tiny slice here. I can zoom in and filter by selection: this is the zoomed-in timeline, and there is a very tiny piece on the main thread, which is the final NumPy version. We can zoom in further, filter again, and zoom again; what this does is narrow the timeline down to that interval and show which methods were being called during it, so you have even more control over what you see. For this little part, for instance, a matrix-product routine was called; it lives in a shared object and is called from NumPy, so you can see the shared object and the call stack coming from NumPy.
Going back to my slides: you are also able to run mixed-mode analyses, basically getting performance information about your Python code and also from inside any native libraries your application calls, maybe C or C++. You get all of it; for instance, here one entry is a shared object, a native library, and the other one is the Python script itself.

So, in summary: tuning your application is obviously a good thing and everybody has to do it; there are ways to do it, and there are tools for it.
VTune Amplifier is a commercial product, but there are also ways to get it for free: it is free in the beta program, so if you sign up for the 2018 beta, which comes with more advanced profiling capabilities, for instance detailed information about threaded applications and also memory consumption, you get it at no cost. It is available in the 2018 beta version, free for testing and evaluation for a long period of time. It is also free for people in academia: students, professors, universities, anybody from academia. Only companies that work on real projects and generate money require a license; just a small word about that, since I am an engineer and do not usually talk about business, but I think it might be relevant for you.

You may get more information in the talk conducted by my colleague David, on infrastructure design patterns with Python, on Wednesday; but probably more relevant to this talk will be the workshop on Thursday, which is all about hands-on tuning of your application with our tools. Thank you very much for your attention.
Audience: Thanks for your talk. If I understand well, you can annotate the Python source, and also the C source, seeing line by line the time of execution. Would it be possible to annotate the Cython source directly, and not the C or C++ source that Cython generated?

Speaker: What do you mean by annotate? There is instrumentation, but tell me more about annotation in your case.

Audience: Just as we saw in the demo, where you can see the source lines and the time they took to execute, the cumulative time. With this kind of profiling, instead of showing the C source generated from the Cython, can we see directly the lines of Cython?

Speaker: Yes, you can actually see the lines of Cython directly.

Audience: And how does attaching to a process work?

Speaker: Let me repeat the question, since it was asked without a microphone: the application is already running, it has a process ID; how do we actually attach to it, by what mechanism? If it is running you know the PID, and if it has a PID you also know the name of the application; in the GUI you can provide either the name or the process ID, and the tool will attach to it.

Audience: One other question: if you have C extension modules, does the module need to be compiled with debug flags so that you can sample from it? What if you do not have access to that, if it is just a binary that came in a distribution?

Speaker: That is a very good question. In that case you would basically see frames with a hex code for functions whose names are unknown. Our Python binary, provided by the Intel distribution, is built with ICC with the debug flag, so you can see the method names being used right away. For an external library you would need -g at build time to get the debug information in. By the way, installing our distribution is just one of the ways: you can clone the GitHub repository, or, as the preferred way, install it with Anaconda; there is a conda channel for it.

Audience: You mentioned VTune is a statistical profiler, and we have seen some results for the matrix multiplication. I was wondering whether the results we saw came from running the code a number of times, say ten thousand times, and taking some statistic, or was that just one run?

Speaker: That is an excellent question. In this case it was run once; this is what you get right away. But in order to satisfy yourself that the data makes sense and is true, you can of course run it many times, for instance run the Python script many times. VTune Amplifier also comes with a command-line interface: one command line does the profiling, the results, everything, so you can script it and automate running your program many times with VTune wrapping your application. That is how you can plug it into your own build system or regression-testing system and gather data.
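The command-line workflow described here looks roughly like the following sketch; `my_script.py` and the PID are placeholders, and exact option spellings depend on your VTune Amplifier version, so check `amplxe-cl -help` before relying on them:

```shell
# Collect hotspot data while running the script under the profiler.
amplxe-cl -collect hotspots -result-dir r000hs -- python my_script.py

# Or attach to an already-running Python process by PID instead.
amplxe-cl -collect hotspots -target-pid 12345

# Print a hotspots report from the collected result directory.
amplxe-cl -report hotspots -result-dir r000hs
```

Because each step is a single command, wrapping it in a build or regression-test script is straightforward.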
Audience: If that is the case, is it quite slow to run this kind of analysis multiple times? How much time do you have to spend if you run your code a thousand times; do you have any metrics on that?

Speaker: That depends on the resolution of your analysis. In my case I ran an analysis with a sampling resolution of 10 ms, which is actually quite coarse. If you want more data, more resolution, you can lower that interval; but the lower the sampling interval, the more samples you take, the larger the data, potentially the larger the overhead, and the less certain your results could be. So it is a matter of playing around; in general, anything that runs longer than two to three seconds is good enough.

Audience: Can you attach a profiler to a running process, or do processes have to be built in a special way for that? Can I profile in production?

Speaker: I think that question was already answered: yes, you can attach to a running process.

Audience: Second question: you showed a source line where the time was attributed to a line that had two calls, nested like info(former(...)), so two function calls on one line. Can the tool decompose that line into the two function calls and show the time of each one, or does it just show the sum for that line?

Speaker: On that one source line it will aggregate and show you the whole time for both; though, personal opinion, I think writing it that way is bad practice for code readability anyway. One more thing: the line view associates time with the source line, but in the bottom-up view you will see the two functions separately. When you click on either function you will be taken to the same line, but you will know the time for each function.
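The attribution issue from that question can be seen with two hypothetical functions; `former` and `info` here are stand-ins for the names in the question, not real APIs:

```python
def former(x):
    return [v * 2 for v in x]

def info(x):
    return sum(x)

data = [1, 2, 3]

# Two calls on one line: a line-level profiler has to attribute the
# combined time of both calls to this single source line.
combined = info(former(data))

# Split across lines: each call sits on its own source line, so
# line-level timing can separate them (the function-level bottom-up
# view can always tell them apart either way).
tmp = former(data)
split = info(tmp)

print(combined == split)  # → True
```

Splitting the calls changes nothing semantically, but gives the profiler (and the reader) one operation per line.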
Audience: Which interpreter does your distribution use, and have you modified the interpreter to optimize it? Also, how is memory alignment done?

Speaker: Our interpreter has been built from scratch and compiled with ICC. There were some changes; I do not know in detail what changed, but they were minor changes in the interpreter. However, all the libraries doing heavy mathematics have been redesigned to make use of MKL, and that is the benefit of working with our distribution of Python: when you build HPC-style applications with Python, machine learning, deep learning, whether with scikit-learn or frameworks like TensorFlow or Caffe, or the computer-vision SDK from Intel that leverages the Python distribution, you get the performance out of the box. You do not have to be a math genius to code properly, or a software engineer with great skills in code optimization, to create high-performance software; it is done out of the box. By the way, it may already be lunchtime, so just one thing: if you have really interesting questions that you want answered, the workshop on exactly this topic on Thursday could be very useful for you.

Audience: On my machine I can profile locally, but suppose I have a cluster and want to measure performance over several machines; is that possible?

Speaker: Yes, it is possible. Are you using MPI?

Audience: Yes.

Speaker: OK, let me take that as the example. If you have a cluster with several nodes and your Python code is running on all of them, and you have the VTune Amplifier sampling driver on all those nodes, then with the MPI tooling you just run mpirun with amplxe-cl, the command-line interface of our tool, followed by your Python script, and it will do the hard work for you and get you the data. It is magic; it produces very interesting things. Thank you.

Metadata

Formal metadata

Title: Python Profiling with Intel® VTune™ Amplifier
Series title: EuroPython 2017
Author: Sobhee, Shailen
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use, change and reproduce the work or its content for any legal, non-commercial purpose, and distribute it and make it publicly accessible in unchanged or changed form, provided that you credit the author/rights holder in the manner specified and pass on the work or this content, including in changed form, only under the terms of this license.
DOI: 10.5446/33799
Publisher: EuroPython
Publication year: 2017
Language: English

Content metadata

Subject area: Computer science
Abstract: Python Profiling with Intel® VTune™ Amplifier [EuroPython 2017 - Talk - 2017-07-10 - PythonAnywhere Room] [Rimini, Italy] Python has grown in both significance and popularity in the last years, especially in the field of high performance computing and machine learning. When it comes to performance, there are numerous ways of profiling and measuring code performance, with each analysis tool having its own strengths and weaknesses. In this talk, we will introduce a rich GUI application (Intel® VTune™ Amplifier) which can be used to analyze the runtime performance of one's Python application, and fully understand where the performance bottlenecks are in one's code. With this application, one may also analyze the call stacks and get quick visual clues where one's Python application is spending time or wasting CPU cycles.
