NumPy: vectorize your brain
Speech transcript
00:04
Well, let's get started. My name is Ekaterina and I'm a PyCharm developer. How many of you know about PyCharm? This talk is not about PyCharm, so if you are interested, please visit our booth; we will be really happy to see you later. What I'm going to be talking about is vectorizing your brain with NumPy. This is actually a lecture taken from my machine learning course at the Academic University in Saint Petersburg. How many of you are using NumPy? I mean, how many of you are using it in your everyday development? Well, I'm afraid I will not tell you anything new today. As I already mentioned, this talk is taken from my machine learning course, and you might wonder why it was included in such a course in the first place. The simplest answer is this: I started my course with the simplest algorithm one can imagine, k-nearest neighbors. You are probably familiar with it. It is used in classification tasks, and the idea is to assign to an object the label which is most frequent among its k nearest neighbors. The assignment for this lecture
01:49
was to implement this algorithm and apply it to a [unclear] data set. None of my students actually used NumPy, and their code worked so slowly, I mean I had to wait such a long time to check the assignments, that my assistant and I decided to include an introductory NumPy lecture in the course. That is my motivation to speak about NumPy.
02:38
NumPy is the main tool used in all of data science, and what I want to do today is talk about how to use it efficiently and how to use it for data [unclear]. It is relatively easy, but you have to think in somewhat different ways about the code you are writing with NumPy in order to use it efficiently, so I'm going to go through some ideas that may be helpful. Unfortunately, when I was preparing this talk I found that I didn't have enough time to make a proper introduction to IPython, so I will assume for now that you are all familiar with its features, but I will explain the features I use during the talk. Let's get back to Python and talk about Python performance. The first thing a person learns about Python is that Python is fast: it is fast for developing and trying things out. But
03:59
unfortunately, the second thing you learn about Python is that Python is slow. Everybody knows that Python is slow, but do you know why? So let's write a simple function to compute the Euclidean distance. This is also taken from the first assignment: we need the Euclidean distance to find the nearest neighbors. First we get the number of iterations needed, then we accumulate the squared differences between the two points, and then we return the square root of the accumulated sum. I'm going to use the %timeit magic function included in the IPython
04:56
notebook. It allows you to measure your code and quickly get benchmarks for simple functions like this. The %timeit
05:10
function runs your code a number of times to make sure it reports the best result. If we use
05:18
it and call our Euclidean distance function, we find that it executes in 2.67 ms per loop. You might wonder whether that is fast or slow, so let's look at something for comparison. The best way is to compare it with a compiled language, so let's instead implement this exact function in C.
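The pure-Python function described above is not reproduced in the transcript, so the following is only a minimal sketch of what such a function could look like. The name `euclidean_distance`, the vector length of 1000, and the use of the standard `timeit` module (instead of the `%timeit` notebook magic) are assumptions made for illustration; the timings you see will differ from the 2.67 ms quoted in the talk.

```python
import math
import timeit


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length sequences, pure Python."""
    n = len(x)                      # number of iterations needed
    total = 0.0
    for i in range(n):
        diff = x[i] - y[i]
        total += diff * diff        # accumulate the squared difference
    return math.sqrt(total)         # square root of the accumulated sum


# Hypothetical inputs: the talk does not state the vector length.
x = [float(i) for i in range(1000)]
y = [float(i) + 1.0 for i in range(1000)]

print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0

# Roughly what %timeit does: repeat the measurement and report the best run.
runs = timeit.repeat(lambda: euclidean_distance(x, y), number=100, repeat=3)
print(f"best time per call: {min(runs) / 100 * 1e6:.1f} microseconds")
```

Taking the minimum over several repeats, rather than the mean, reduces the influence of unrelated activity on the machine.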
05:51
Here I'm using a magic extension for IPython that loads C code directly into Python, so we can use the same %timeit function. It's pretty awesome, and if you haven't checked it out, I think you should. If we time this C function, we find that it completes in 28 microseconds. So we see that the C code is a hundred times faster than Python. I'm sorry, it's true: Python is slow for this kind of task. What is the problem with this Python code? Nothing special, nothing difficult is done here: we just loop through the array and do some simple addition and multiplication. So let's take the next step. We want to find the bottlenecks, that is, to learn which part of our code is so slow. I'll use line_profiler, installed on my computer; it has a nice IPython magic command, and it shows us how much time is spent on each line of code.
07:22
Is there anything strange here? Well, it might
07:28
be kind of tricky if you haven't seen line_profiler output before, but the interesting thing here is that 38 percent of all the time is spent on one particular line [unclear]. So the question is why. To answer this
07:52
question we have to go back and look at the differences between languages. C and other languages like it are compiled, statically typed languages: you write the code, you hand it to a compiler, and it goes through your code and decides how it is going to be executed. The downside is that the compiler needs to know the variable types at compile time, which means you have to specify the types yourself. I actually really love C, it was my first language, but
08:35
it's far more cumbersome: you have to add all of this extra stuff, and you have to remember to declare all the relevant variables, and so on. Languages like Python, on the other hand,
08:54
are interpreted languages, so they are not compiled down to efficient machine code, which means the code
09:02
executes a little bit slower. But there are advantages as well. We all know that Python uses a dynamic type system, which makes programming easier: you don't have to specify the types yourself, and you don't have to write type annotations. My colleague Andreas is going to be talking about type
09:26
annotations and when they become useful, so please visit his talk tomorrow; it's going to be interesting. So, back to the dynamic nature of Python. With every Python operation there is a little bit of overhead for things like type checking. When you evaluate a plus b in Python, the interpreter has to check the type of a, then check the type of b, then find the proper code to execute, and then return the result. There is also reference counting: the interpreter has to increment a reference counter and then decrement the reference counter as references change [unclear]. That is why Python, compared with C, is somewhat slower to run but very quick to develop in, and that's why I use Python. So the
10:30
question is: what do we do in this situation?
10:37
That's where NumPy comes in. NumPy is basically designed to help us get the best of both worlds: we want the fast execution time of languages like C, and we want the fast development time of Python. So that is what I'm going to talk about.
11:00
Here are some ideas to make Python faster when you're working with numerical data. The first thing I'm going to talk about is ufuncs, and it is the simplest of these techniques.
11:19
Ufunc is just a short name for universal function.
11:23
This is basically a special type of function defined in the NumPy library, and it operates elementwise on arrays. The idea behind ufuncs is to combine the functionality [unclear] together in one. Let me show you an example. If you are a Python programmer who doesn't use NumPy and you want to do elementwise operations on a list, this is probably the best way to do it. We have a list of integers and we want to add 1 to each of its values. As a good Python programmer you would probably use a list comprehension: you write val + 1 for val in a and print out the result. This is the Pythonic way to do it. The NumPy way is to create a NumPy array; there are special array-creation functions for that. Then we simply write a + 1: you treat your array as if it were just a number, NumPy overloads the plus operator, and it produces the result. This is a binary ufunc, and a universal function combines loop and function. What we say here with a + 1 is: I want to loop through all of the elements of the array and add 1 to each of them. We have the same thing for multiplication and for other operations. Note that this is elementwise multiplication, not a matrix product [unclear]. Notice the difference here: we don't have any loop here.
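The slide code is not part of the transcript, so here is a small sketch of the comparison described above, under the assumption that the example used a short list of integers. The variable names are illustrative only.

```python
import numpy as np

# Pure Python: a list comprehension adds 1 to every value.
values = [1, 2, 3, 4, 5]
plus_one_list = [val + 1 for val in values]

# NumPy: the array is treated like a single number.
a = np.arange(1, 6)          # array creation function, gives [1 2 3 4 5]
plus_one_array = a + 1       # the overloaded + calls the binary ufunc np.add
squared = a * a              # elementwise multiplication, not a matrix product

print(plus_one_list)         # [2, 3, 4, 5, 6]
print(plus_one_array)        # [2 3 4 5 6]
print(squared)               # [ 1  4  9 16 25]
```

Writing `np.add(a, 1)` explicitly gives the same result as `a + 1`; the operator form is simply the more readable spelling.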
13:50
The loop is actually taking place in the internals of NumPy. And the question is: why do we care about this? So
14:03
let's take a look at the speed of this. First of all we create a large array with a lot of values. The double-percent %%timeit magic means: time everything in the cell. Here we time creating an array and adding 1 to each element of the array, and what we get is 110 microseconds. If we do the same in pure Python, we do it by hand: we create the list, loop over the length of the array, and add 1 to each element. Again we get about a 100-fold speedup. I should also point out that the NumPy code is much easier to type and understand; it is harder to get it wrong than the list comprehension. You might ask why NumPy is so much faster. What is the magic, what happens under the hood? The key here is that when you use NumPy functions, the loops are happening in compiled code [unclear].
15:37
You have compiled functions for common operations, and you can access these common operations from Python using high-level expressions. That's why it is so much faster. Does it make sense? OK.
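To make the speed comparison above concrete, here is a hedged sketch using the standard `timeit` module. The array size `N` and the function names are assumptions; the talk's figures (110 microseconds, roughly a 100-fold speedup) came from the speaker's machine and will not be reproduced exactly.

```python
import timeit

import numpy as np

N = 100_000  # hypothetical size; the talk does not give the exact number


def add_one_numpy():
    a = np.arange(N)
    return a + 1               # the loop runs in NumPy's compiled code


def add_one_python():
    a = list(range(N))
    for i in range(len(a)):    # the loop runs in the Python interpreter
        a[i] += 1
    return a


t_np = min(timeit.repeat(add_one_numpy, number=10, repeat=3)) / 10
t_py = min(timeit.repeat(add_one_python, number=10, repeat=3)) / 10
print(f"NumPy:       {t_np * 1e6:.0f} microseconds per call")
print(f"pure Python: {t_py * 1e6:.0f} microseconds per call")
```

Both functions include the cost of creating the data, mirroring the description that the whole cell was timed.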
16:04
It's really nice to have these functions, and there are many of them
16:08
built into NumPy: basically all arithmetic operations, comparisons, [unclear] operations and many others are implemented as these sorts of ufuncs, and there are a bunch more of them in NumPy [unclear]. The next thing I want to talk about is slicing and indexing. If you are used to lists in Python, you know that you can index a list with an integer to get a single value, and you can also index it with a slice to get multiple values. You can do absolutely the same with NumPy arrays. One interesting thing about NumPy slicing is that there is no memory overhead, unlike with plain Python lists: NumPy returns just a view of the array. So if you assign a slice to a new variable and change one value through that variable, the value is changed in the initial array as well, so please be aware of this. In multidimensional arrays you can access elements by row and column indexes. If we pass 0, 1, we are asking for row 0 and column 1. We can also use slicing on multidimensional arrays; in the last example here we get a submatrix. We can go further and combine slices and indexes: here we are asking for row number 0 and all columns, which is exactly the same as x[0].
NumPy also offers a lot of other fast and convenient ways to do all sorts of indexing, to access more complicated chunks of data. One of those is fancy indexing, which is basically passing a list of indexes to the array. If you want to access the zeroth and first elements, you put those indexes in a list and pass that list as the array index, and you get the result. Again, we don't have to write a loop over these indexes; we just pass them all together at once, and it's much quicker than looping over them. By the way, the thing about fancy indexing is that it doesn't return a view of the array as slicing did before; it returns a copy, so you have to be aware of this. You can see here that this assignment didn't change the value of the initial array x, unlike with slicing. NumPy also allows you to use boolean masks for indexing: instead of passing integers to choose values from x, you can pass a mask, and it will construct the array you are interested in. This might seem like, well, why would I need a thing like that? It becomes handy when you combine it with the simple ufuncs you saw earlier. If you look at the last example on the slide, we use x > 2 to construct a boolean array, then pass this array as the array index, and it extracts the matching elements. I myself use this technique mostly in data preparation steps.
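The indexing behaviors described above can be checked with a short sketch. The arrays and values here are assumptions chosen for illustration, not the ones from the slides.

```python
import numpy as np

x = np.arange(6)              # [0 1 2 3 4 5]

# Slicing returns a view: writing through it changes the original array.
view = x[1:4]
view[0] = 100                 # x is now [0 100 2 3 4 5]

# Fancy indexing (a list of indexes) returns a copy: x is left untouched.
picked = x[[0, 1]]
picked[0] = -1                # x[0] is still 0

# Multidimensional indexing and slicing.
m = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]
element = m[0, 1]                # row 0, column 1
first_row = m[0, :]              # row 0, all columns; same as m[0]
sub = m[:, 1:]                   # a submatrix, also a view

# Boolean mask built with a comparison ufunc, then used as an index.
mask = x > 2
selected = x[mask]

print(x)          # [  0 100   2   3   4   5]
print(selected)   # [100   3   4   5]
```

If a slice needs to be modified independently of the original, calling `.copy()` on it makes the separation explicit.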
21:00
For instance, when we are working with a
21:04
data array and we want to split the data into test and train sets. The easiest way to do this [unclear] is to create a mask with the length of the array, apply this mask to the array, and then apply the negated version of it as well. That is how this splitting can be achieved by my students. So instead of writing a loop over the list and appending an item to the result if some condition holds, it happens automatically, in one line of code, and it is much, much quicker than the by-hand Python version. The next thing I want to talk about is NumPy broadcasting. This is something very cool.
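Before moving on to broadcasting, the mask-based train/test split mentioned above can be sketched as follows. The transcript does not say how the mask was built, so the random 80/20 split, the seed, and the array contents are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the split is repeatable

data = np.arange(20).reshape(10, 2)      # hypothetical data set: 10 rows, 2 features

# One boolean per row; True means "goes to the training set" (about 80 percent).
mask = rng.random(len(data)) < 0.8

train = data[mask]                       # rows where the mask is True
test = data[~mask]                       # negated mask selects the remaining rows

print(train.shape, test.shape)
```

Every row lands in exactly one of the two sets, because the second index is the exact negation of the first.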
22:13
Broadcasting is one of the things that really makes NumPy powerful and lets you express very complicated operations with ease. Basically, broadcasting gives you a set of rules by which ufuncs operate on arrays of different sizes and dimensions. What this set of rules allows you to do is things like, for example, adding a scalar to an array. You can add a row to a matrix. You can do even crazier things: you can add a row to a column, and they expand to a two-dimensional matrix. The rules of broadcasting are pretty simple, but in some cases they are a little bit confusing, and it takes a while to wrap your mind around what's going on. But once you get this, you can do a huge number of operations on data sets really
23:23
efficiently using broadcasting. The first rule is that if the arrays' shapes differ in the number of dimensions, you left-pad the smaller shape with ones. Then you compare the dimensions, and if any dimension doesn't match, you broadcast, that is, expand, the dimension whose size equals 1. And if the dimensions don't match but neither of them is equal to 1, there is no way to broadcast them together and you get an error. Here is a quick example of how it works. We already saw adding a scalar to a vector when we spoke about ufuncs; we just didn't know then that it was broadcasting. Look at this example: we have a two-dimensional matrix and we are adding a one-dimensional vector. The first thing we do is left-pad the vector's shape with ones to make the number of dimensions match. Then we broadcast, stretching the vector to the shape of the whole matrix. So then we have two matrices of the same shape; we just add them together and get the result. You can think about it as copying the values in memory to match the dimensions, but actually there is no copying in memory: this is just an abstraction to think about it. There is no memory overhead; NumPy just makes this happen under the hood. What this allows you to do is, instead of writing a loop over the two arrays, you can express the operation using broadcasting and get a much faster and also much cleaner computation, so you don't have to worry about loops. I have been using addition here for illustration, but it works for any binary ufunc.
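The broadcasting rules above can be traced on small arrays. The shapes used here, (2, 3) and (3,), are assumptions for illustration; the explicit loop is included only to show what the broadcast expression replaces.

```python
import numpy as np

M = np.ones((2, 3))           # two-dimensional matrix, shape (2, 3)
v = np.arange(3)              # one-dimensional vector, shape (3,)

# Rule 1: left-pad the smaller shape with ones: (3,) becomes (1, 3).
# Rule 2: the size-1 dimension is stretched: (1, 3) behaves like (2, 3).
result = M + v                # [[1. 2. 3.], [1. 2. 3.]]

# The same computation written as explicit Python loops.
looped = np.empty_like(M)
for i in range(M.shape[0]):
    for j in range(M.shape[1]):
        looped[i, j] = M[i, j] + v[j]

# A row plus a column expands to a full two-dimensional grid.
row = np.arange(3)                     # shape (3,)
col = np.arange(3).reshape(3, 1)       # shape (3, 1)
grid = row + col                       # shape (3, 3), grid[i, j] == i + j

print(result.shape, grid.shape)        # (2, 3) (3, 3)
```

No stretched copy of `v` is ever materialized; the stretching is only a way to reason about which elements are paired.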
26:08
One more nice feature of NumPy that you might not have seen before: we have a [unclear]-by-two matrix here and a one-dimensional array. What will happen if we add these two together, according to the broadcasting rules? Well,
26:29
we get an error, because our shapes
26:33
don't match: the array has a different length [unclear], so there is no way to match those together. We can left-pad the array's shape with ones, but then we just can't expand it to match the matrix shape. So
26:55
here comes np.newaxis. What it lets you do is add a new axis, and you can add the axis wherever you want. It is very useful when you want to tell your arrays how to broadcast in the way you wanted. Does it still make sense? I ask because in my lectures at the university most of my students were lost at this point. Once again, broadcasting doesn't use additional memory; it doesn't actually allocate anything. The last element for today is NumPy aggregations.
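Before turning to aggregations, here is a sketch of the failing addition and the `np.newaxis` fix described above. The shapes (3, 2) and (3,) are assumptions, since the transcript does not preserve the exact sizes from the slide.

```python
import numpy as np

M = np.ones((3, 2))           # hypothetical 3-by-2 matrix
v = np.arange(3)              # shape (3,): one value per row of M

# Left-padding gives (1, 3); the trailing dimensions are 2 and 3,
# neither is 1, so broadcasting fails.
failed = False
try:
    M + v
except ValueError as err:
    failed = True
    print("broadcast failed:", err)

# np.newaxis inserts a size-1 axis where we ask for it: (3,) becomes (3, 1),
# which broadcasts against (3, 2) by stretching along the columns.
column = v[:, np.newaxis]
result = M + column

print(column.shape, result.shape)   # (3, 1) (3, 2)
```

`v.reshape(3, 1)` would produce the same column here; `np.newaxis` has the advantage that the length does not need to be spelled out.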
27:48
Aggregations are functions which summarize the values in an array. As an
27:57
example I have the mean function here, and NumPy has a bunch of aggregations for things like minimum, maximum and sum. Again, this is something where, if you were writing it by hand, you would have to write a Python loop over the array and do it yourself, but it is much faster to do it using NumPy. One more thing about NumPy aggregations is that they can also work on multidimensional arrays. If you want the mean value of the entire array, you do x.mean(), and if you want the mean value of the columns or rows, you pass the axis argument. There are a lot of aggregations available in NumPy, and you should get familiar with them if you are going to do large-scale data
29:12
analysis. The good thing about them is that they all have the same call signature, so you can pass the axis argument to all of them.
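A short sketch of the aggregations and the shared `axis` argument mentioned above; the 2-by-3 array is an assumption chosen so the results are easy to verify by hand.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

overall_mean = x.mean()          # mean of the entire array: 3.5
column_means = x.mean(axis=0)    # collapse the rows: one mean per column
row_means = x.mean(axis=1)       # collapse the columns: one mean per row

# Other aggregations accept the same axis argument.
column_min = x.min(axis=0)
row_max = x.max(axis=1)
column_sum = x.sum(axis=0)

print(overall_mean)    # 3.5
print(column_means)    # [2.5 3.5 4.5]
print(row_means)       # [2. 5.]
```

A useful way to remember the convention: `axis` names the dimension that disappears from the result's shape.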
29:27
So, a quick summary: writing Python is fast, but Python loops in particular are slow. If you're looping over a large dataset, the best way to do it is to use NumPy and try some of these techniques. The very last thing I want to show you is an example of how this can be used to implement a meaningful algorithm. We will be using k-means here. I believe all of you know this algorithm, so this is just a quick reminder
30:14
of how it works. You select k points at random as cluster centers, assign objects to their closest cluster centers according to Euclidean distance, then calculate the centroids as the mean of all of the objects in each cluster, and then you repeat steps 2, 3 and 4 [unclear]. Here we just generate some synthetic data to work with, and here
30:50
is the visualization of this data: we have a bunch of points floating in
30:56
space, and we want to compute clusters for each point here. Basically, what we're going to do is compute the Euclidean distance, and here we've got a vectorized version of it. So here, in just 5 lines of code, is the k-means algorithm implemented line by line as it was written before. I just had to look at the definition of the algorithm, and I managed to translate it line by line [unclear], and this makes me really excited.
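The speaker's five-line implementation is not preserved in the transcript. The following is an independent sketch of k-means written with the techniques from the talk (broadcasting, `np.newaxis`, boolean masks, aggregations). The function names, the fixed iteration count, the seed, and the synthetic two-blob data are all assumptions.

```python
import numpy as np


def assign_labels(points, centers):
    """Index of the closest center for every point (Euclidean distance)."""
    # (n, 1, d) - (1, k, d) broadcasts to (n, k, d): all pairwise differences.
    diff = points[:, np.newaxis, :] - centers[np.newaxis, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))      # shape (n, k)
    return dist.argmin(axis=1)


def kmeans(points, k, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct points at random as the initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign every point to its closest center.
        labels = assign_labels(points, centers)
        # Step 3: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return centers, assign_labels(points, centers)


# Synthetic data: two well-separated blobs of 50 points each.
rng = np.random.default_rng(42)
points = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
                    rng.normal(5.0, 0.1, size=(50, 2))])

centers, labels = kmeans(points, k=2)
print(centers)
```

A fixed number of iterations keeps the sketch short; a fuller version would stop once the centers no longer move between iterations.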
31:55
I think I'm just out of time, so I'm going to leave you with this. If you are interested in these slides, you can go to my [unclear] account; I'll post a link to the slides. I want to thank you for listening, and I hope this was
32:10
helpful. Enjoy the rest of the conference. Thank you.
32:26
[unclear] Do we have any questions? No questions, really?
32:48
Have you ever compared NumPy performance with PyPy? For example, if any of your students refuses to use NumPy but you still need to check the assignment, you could just run it on PyPy
33:06
as well. [unclear] With PyPy and just-in-time compilation [unclear], sometimes it's faster and sometimes it isn't, so the idea is that there is still a lot of work to be done.
33:42
[unclear] How easy is it to write your own universal function?
33:57
It's pretty easy [unclear]: you can write a universal function yourself, and it will work like a built-in one. OK.
34:16
Thank you for coming.
Metadata
Formal metadata
Title  NumPy: vectorize your brain
Series title  EuroPython 2015
Part  119
Number of parts  173
Author
Tuzova, Ekaterina

License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You may use and modify the work or its content for any legal and non-commercial purpose, and reproduce, distribute and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify and pass on the work or this content, including in modified form, only under the terms of this license.
DOI  10.5446/20104 
Publisher  EuroPython
Release year  2015
Language  English
Production place  Bilbao, Euskadi, Spain
Content metadata
Subject area  Computer science
Abstract  Ekaterina Tuzova  NumPy: vectorize your brain NumPy is the fundamental Python package for scientific computing. However, being efficient with NumPy might require slightly changing how you write Python code. I’m going to show you the basic idioms essential for fast numerical computations in Python with NumPy. We'll see why Python loops are slow and why vectorizing these operations with NumPy can often be good. Topics covered in this talk will be array creation, broadcasting, universal functions, aggregations, slicing and indexing. Even if you're not using NumPy you'll benefit from this talk. 
Keywords
EuroPython Conference EP 2015 EuroPython 2015 