
# NumPy: vectorize your brain


### Speech transcript

00:04

Well, let me get started. My name is Ekaterina, and I'm a PyCharm developer. How many of you know about PyCharm? This talk is not about PyCharm, so if you are interested in PyCharm, please come to our booth; we will be really happy to see you later. What I'm going to be talking about is "vectorize your brain" with NumPy. This is actually a lecture taken from the machine learning course I teach at the Academic University in St. Petersburg. How many of you are using NumPy? I mean, how many of you are using it in your everyday development? Well, I'm afraid I will not tell you anything new today. As I already mentioned, this talk is taken from my machine learning course, and you might wonder why it was included in such a course in the first place. The simple answer is this: in my course I have the simplest algorithm one can imagine, k-nearest neighbors. You are probably familiar with it. It is used in classification tasks, and the idea is to assign to an object the label that is most frequent among its k nearest neighbors. The assignment for this lecture

01:49

was to implement this algorithm and apply it to a data set. That was actually the first assignment, and none of my students used NumPy, so their code ran for hours. I mean, I had to wait for such a long time to check the assignments on my system that I decided to include an introduction to NumPy in the course. That is my motivation to speak about NumPy here.
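The k-nearest-neighbors rule described above (give an object the label that is most frequent among its k nearest neighbors) can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the course assignment or the speaker's solution. `knn_predict` is a hypothetical helper name and the toy data are made up. Labels are assumed to be non-negative integers, which `np.bincount` requires, and ties in the vote go to the smallest label.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Hypothetical helper: classify each row of X_test by majority vote
    among its k nearest training points (Euclidean distance)."""
    # Pairwise squared distances via broadcasting: (n_test, 1, d) - (1, n_train, d)
    diff = X_test[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)           # shape (n_test, n_train)
    # Indices of the k closest training points for each test point
    nearest = np.argsort(dist2, axis=1)[:, :k]
    neighbor_labels = y_train[nearest]           # shape (n_test, k)
    # Majority vote; bincount(...).argmax() picks the smallest label on ties
    return np.array([np.bincount(row).argmax() for row in neighbor_labels])

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.2, 0.1], [4.8, 5.2]])
print(knn_predict(X_train, y_train, X_test, k=3))  # [0 1]
```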

02:38

Nowadays NumPy is the main tool used in all of data science, and what I want to do today is talk about how to use it efficiently for data science. It's relatively easy, but you have to think about your code in a somewhat different way when you are writing NumPy in order to use it efficiently, so I'm going to go through some ideas that may be helpful. Unfortunately, when I was preparing this talk, I found that I didn't have enough time to make a proper introduction to IPython, so I'll assume for now that you are all familiar with its features, but I will explain the features I use during the talk. Let's get back to Python and talk about Python performance. The first thing a person learns about Python is that Python is fast: it's fast for developing and trying things out. But

03:59

unfortunately, the second thing you learn about Python is that it is slow. Everybody says that Python is slow, but do you know why? Let's write a simple function to compute Euclidean distance. This is actually also taken from the first assignment: we need this Euclidean distance to find the nearest neighbors. We loop over the coordinates, accumulate the squared difference between the two points, and then return the square root of the accumulated sum. I'm going to use the %timeit magic function included in IPython and the IPython

04:56

Notebook. It allows you to time your code and quickly get benchmarks for simple functions like this. %timeit runs the

05:10

function in a loop and repeats that a couple of times to make sure it reports the best result. If we use

05:18

%timeit and call our Euclidean distance function, we find that it executes in 2.67 ms per loop. You might wonder: is that already slow? Let's look at it in comparison, and the best way to compare it is against a lower-level language. Suppose we instead implemented this exact function in C

05:51

and used the Cython magic extension in IPython to load it directly into Python. Then we can use the same %timeit function, and it's pretty awesome; if you haven't checked it out, I think you should. Timing this C function, it completes in 28 microseconds, so the C code is about a hundred times faster than Python. So it's true: Python is slow for this kind of task. And what is the problem with the Python code? Nothing special, nothing difficult is done here: we just loop through the array and do some simple addition and multiplication. So let's take the next step and find the bottlenecks, that is, learn which part of our code is slow. I'll use line_profiler, which is installed on my computer. It has a nice IPython magic command that shows how many times each line was executed and how much time is spent on each line of code.
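Here is a rough version of the comparison above that runs as a plain script. It assumes vectors of 1,000 random floats, because the talk does not give the input size. The standard `timeit` module stands in for IPython's `%timeit`, and a vectorized NumPy expression stands in for the C version. The function names are hypothetical, and absolute timings will differ from the 2.67 ms and 28 µs quoted in the talk.

```python
import math
import timeit

import numpy as np

def euclidean_distance_loop(x, y):
    """Pure-Python version: one interpreted iteration per coordinate."""
    total = 0.0
    for i in range(len(x)):
        diff = x[i] - y[i]
        total += diff * diff
    return math.sqrt(total)

def euclidean_distance_numpy(x, y):
    """Vectorized version: subtraction, squaring and summing run in compiled loops."""
    return np.sqrt(np.sum((x - y) ** 2))

rng = np.random.default_rng(0)
a_np, b_np = rng.random(1000), rng.random(1000)
a_list, b_list = a_np.tolist(), b_np.tolist()

# Both versions compute the same value, up to floating-point rounding
assert math.isclose(euclidean_distance_loop(a_list, b_list),
                    euclidean_distance_numpy(a_np, b_np))

# Rough timings; the absolute numbers depend entirely on the machine
n = 1000
t_loop = timeit.timeit(lambda: euclidean_distance_loop(a_list, b_list), number=n)
t_numpy = timeit.timeit(lambda: euclidean_distance_numpy(a_np, b_np), number=n)
print(f"loop:  {t_loop / n * 1e6:.1f} µs per call")
print(f"numpy: {t_numpy / n * 1e6:.1f} µs per call")
```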

07:22

Is there anything strange here? Well, it might

07:28

be kind of tricky if you haven't seen line_profiler output before, but the interesting thing here is that we spend 38 percent of the overall time on the loop line. So the question is why, and to answer this

07:52

question we have to go back and look at the differences between languages. C and similar languages are compiled and statically typed: you write the code, and a compiler goes through it and decides how it's going to be executed. The downside is that the compiler needs to know variable types at compile time, which means you have to specify the types yourself. Actually, I really love C, and it was my first language, but

08:35

it's far more cumbersome: you have to add all this extra stuff, and you have to remember to declare variables and so on. Python and languages like it, on the other hand,

08:54

are interpreted languages, so they aren't compiled down to efficient machine code, which means Python

09:02

executes a little bit slower, but there are advantages as well. We all know that Python uses a dynamic type system, which makes programming easy: you don't have to specify the types yourself, and you don't have to write type annotations. My colleague Andreas is going to be talking about type

09:26

annotations and when they become useful, so please visit his talk tomorrow; it's going to be interesting. Because of Python's dynamic nature, every simple Python operation carries a little bit of overhead for things like type checking. When you write a + b, the interpreter has to check the type of a, then check the type of b, then find the proper code to execute, and then return the result. There is also reference counting: the interpreter has to increment a reference counter and later decrement it when the value goes out of scope. So languages like Python are somewhat slower than C, but it is very quick to write code in them, and that's why we use Python. So the

10:30

question is: what do we do in this situation? And

10:37

that's where NumPy comes in. NumPy is basically designed to help us get the best of both worlds: we want the fast execution time of languages like C, and we want the fast development time of Python. So I'm going to talk through

11:00

some ideas to make Python faster when you're working with numerical data. The first thing I'm going to talk about is ufuncs, and it's the simplest opportunity.

11:19

"Ufunc" is just the name for a universal function,

11:23

and it is basically a special type of function defined in the NumPy library that operates element-wise. Behind the scenes, it combines a loop and the operation itself in one. Let me show you an example. Say you are a Python programmer who doesn't use NumPy, and you want to do element-wise operations on a list. We have a list a of integers, and you want to add 1 to each of its values. As a Python programmer, you'd probably use a list comprehension, so you naturally write x + 1 for x in a and print out the result. That's the Python way to do it. The NumPy way is to use ufuncs: NumPy has special functions, and here we call np.add and get the same result. Or, even simpler, you just convert your list to a NumPy array; NumPy overloads the plus operator, and it produces the same result. So the plus here is a binary ufunc: it combines a loop with the operation. What it says is that when I write a + 1, I want to loop through all of the elements of the array and add 1 to each of them. The same works for multiplication and the other arithmetic operators; note that a * b is element-wise multiplication, not the matrix product. There are also many more mathematical functions available as ufuncs. And note the difference: we don't have any loops here.
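Here is a minimal sketch of the ufunc examples described above. The list `a` and its values are assumptions, since the slide contents are not in the transcript. The sketch contrasts the list comprehension with `np.add` and the overloaded `+` operator. It also shows that `*` is element-wise while `np.dot` gives the dot product.

```python
import numpy as np

a = [1, 2, 3, 4]

# Plain Python: the list comprehension loops in the interpreter
plus_one_list = [x + 1 for x in a]

# NumPy: np.add is a ufunc, so the loop runs in compiled code
arr = np.array(a)
plus_one_ufunc = np.add(arr, 1)

# The + operator on an ndarray dispatches to the same ufunc
plus_one_op = arr + 1

print(plus_one_list)                 # [2, 3, 4, 5]
print(plus_one_ufunc)                # [2 3 4 5]
print(plus_one_op)                   # [2 3 4 5]
print(isinstance(np.add, np.ufunc))  # True

# * is element-wise multiplication, not the dot product
b = np.array([10, 20, 30, 40])
print(arr * b)                       # [ 10  40  90 160]
print(np.dot(arr, b))                # 300
```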

13:50

The loop actually takes place in the internals of NumPy. And the question is: why do we care about this? So

14:03

let's take a look at the speed. First of all, we create a large array with a lot of elements. %%timeit, with two percent signs, times everything in the cell, so we time creating the array and adding 1 to each of its elements, and with NumPy we get 110 microseconds. If we do the same in pure Python by hand, creating a list, looping over its length and adding 1 to each element, we again see that NumPy gives about a hundredfold speed-up. I should also point out that the NumPy code is much easier to type and understand; it's harder to get wrong than the list comprehension. You might ask why NumPy is so much faster: what is the magic, what happens under the hood? The key is that in NumPy functions the loops happen in compiled code, so you

15:37

have compiled functions for common operations, and you can access them from Python using high-level expressions. That's why it is so much faster. Does that make sense? OK. Well, it's really

16:04

nice: there are many of these functions,

16:08

and it's worth looking into NumPy. Basically all arithmetic operations, comparisons and bitwise operations are available in NumPy as ufuncs, and there are a bunch of others that you can find in the NumPy documentation. The next thing I want to talk about is slicing and indexing. If you are used to lists in Python, you know that you can index them with an integer to get a single value, and you can also use slices to get multiple elements. You can do absolutely the same with one-dimensional NumPy arrays. One interesting thing about NumPy slicing is that, unlike with plain Python lists, there is no memory overhead: a slice is a view. So if you assign a slice to a new variable and change one value in it, the value is also changed in the initial array, so please be aware of this. In multidimensional arrays you can access elements by row and column using comma-separated indices. If we pass 0, 1, we are asking for row 0 and column 1, and it's very easy. We can also use slicing on multidimensional arrays, as in the last example here, with the same semantics. We can go further and combine slices and indices: here we are asking for row 0 and all columns, which is exactly the same as x[0]. NumPy actually offers a lot of other fast and convenient ways to do all sorts of indexing and to get more complicated chunks of data. One of those is fancy indexing, which is basically passing a list of indices to the array. So if you want to access the zeroth and first elements, we just put those indices in a list, pass it as the array index, and get them back. Again, we don't have to write loops over these indices; you get all the elements at once, and it's much quicker than looping over them by hand. By the way, the thing about fancy indexing is that it doesn't return a view of the array; in most cases it returns a copy, so you have to be aware of this. You can see here that this assignment didn't change the value of the initial array x. NumPy also allows you to use boolean masks as indices: instead of passing integers to choose values from x, you can pass a mask, and NumPy will construct the array you are interested in. This might make you ask why you would ever need something like that, but it becomes handy when you combine it with the simple ufuncs you saw earlier. If you look at the last example on the slide, we use x > 2 to construct a boolean mask, pass this mask as the array index, and get the matching elements of x. I use this technique mostly in data preparation steps.
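The view/copy behaviour and the indexing styles described above can be sketched with small made-up arrays. The last lines apply a boolean mask to the data-preparation use mentioned at the end of the paragraph: a train/test split. The 80/20 ratio and the random seed are arbitrary assumptions.

```python
import numpy as np

x = np.arange(10)

# Basic slicing returns a view: no data is copied
view = x[2:5]
view[0] = 100
print(x[2])            # 100: the original array changed too

# Python lists copy on slicing, so the original list is untouched
lst = list(range(10))
lst_slice = lst[2:5]
lst_slice[0] = 100
print(lst[2])          # 2

# Multidimensional indexing: [row, column]
m = np.arange(12).reshape(3, 4)
print(m[0, 1])                        # 1
print(m[1:, :2])                      # rows 1-2, columns 0-1
print(np.array_equal(m[0, :], m[0]))  # True

# Fancy indexing with a list of indices returns a copy
picked = m[[0, 2]]
picked[0, 0] = -1
print(m[0, 0])         # 0: the original is unchanged

# Boolean mask built from a ufunc comparison
y = np.array([1, 5, 2, 7, 3])
print(y[y > 2])        # [5 7 3]

# Train/test split with a random boolean mask and its negation
rng = np.random.default_rng(42)
data = np.arange(20)
train_mask = rng.random(len(data)) < 0.8
train, test = data[train_mask], data[~train_mask]
print(len(train) + len(test) == len(data))  # True
```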

21:00

For instance, when we have

21:04

an array and want to split it into train and test sets, the easiest way in NumPy is to create a mask with the length of the array, apply this mask to the array to get one part, and apply the negated mask to get the other part. That's how this splitting can be done by my students. So instead of writing a loop over the list and appending each element to the result if some condition holds, this happens automatically, in one line of code, and it is much, much quicker than the by-hand Python version. The next thing I want to talk about is NumPy broadcasting, and this is something very cool

22:13

about NumPy. Broadcasting is one of the things that really makes NumPy powerful, and it allows you to express very complicated operations concisely. Broadcasting is a set of rules by which ufuncs operate on arrays of different sizes and dimensions. What these rules allow you to do is things like adding a scalar to an array or adding a row to a matrix, and you can do even crazier things: you can add a row to a column, and the result expands to a two-dimensional matrix. The rules of broadcasting are pretty simple, but they are a little bit confusing at first, and it takes a while to wrap your mind around what's going on. But once you get it, you can do a huge amount of computation really

23:23

efficiently using broadcasting. The first rule is that if the array shapes differ, the smaller shape is padded with ones on the left. Then you compare the dimensions: if a dimension doesn't match and one of the sizes equals 1, that dimension is broadcast, kind of expanded. If the dimensions don't match and neither of them is equal to 1, there is no way to combine them, and you get an error. So here's a quick example of how it works. We already saw adding a scalar to a vector when we spoke about ufuncs; we didn't mention it then, but that was broadcasting. Look at this example: we have a matrix and we are adding a vector to it. The first thing we do is pad the vector's shape with ones on the left to make the number of dimensions match, and then we broadcast it, stretching it to the shape of the whole matrix. Then we have two arrays of the same shape, we just add them together, and we get the result. We can think about it like an accordion, stretching memory to match the dimensions, but there is actually no copying of memory; this is just an abstraction to think about it. There is no memory overhead, and NumPy handles all of this under the hood. So what this allows you to do is things like this: instead of writing loops over two arrays, you can express the operation using broadcasting, and you get much faster computations and much cleaner code, because you don't have to worry about loops. I used addition here for the illustration, but it works for any binary ufunc. One more nice feature of NumPy that

26:08

you might have seen before: here we have a matrix and an array, and what will happen if we add these two together according to the broadcasting rules? Well,

26:29

we get an error, because our shapes

26:33

don't match, so there is no way to combine them. We can pad the array's shape with ones on the left, but then we still can't expand it to match the matrix.
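The broadcasting rules from the talk can be written out as a small function and checked against NumPy's own `np.broadcast_shapes`, which is available in NumPy 1.20 and later. `broadcast_shape` is a hypothetical helper for illustration. The (3, 2) matrix and length-3 vector are assumed stand-ins for the arrays on the slide, which the transcript does not record.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper implementing the broadcasting rules described in the talk."""
    # Rule 1: pad the shorter shape with ones on the left
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        # Rule 2: a dimension of size 1 is stretched to match the other one
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            # Rule 3: sizes differ and neither is 1, so broadcasting fails
            raise ValueError(f"shapes {shape_a} and {shape_b} are not broadcastable")
    return tuple(result)

print(broadcast_shape((3, 4), (4,)))      # (3, 4): matrix + row vector
print(broadcast_shape((3, 1), (1, 4)))    # (3, 4): column + row
print(np.broadcast_shapes((3, 4), (4,)))  # (3, 4), NumPy's own check

# Assumed slide example: (3, 2) vs (3,) pads to (1, 3); 2 != 3 and neither is 1
M = np.ones((3, 2))
v = np.arange(3)
try:
    M + v
except ValueError as err:
    print("error:", err)
```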

26:55

So here comes np.newaxis. It adds a new axis, and you can add an axis wherever you want. It's very useful for reshaping your arrays so that they broadcast the way you want. Does it still make sense? I ask because in my lectures at the university, most of my students were lost at this point. But once again: broadcasting doesn't use additional memory; it doesn't actually allocate anything. So the last topic for today is NumPy aggregations.
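This sketch shows the `np.newaxis` fix, using the same assumed (3, 2) matrix and length-3 vector. Turning the vector into a (3, 1) column makes the shapes compatible, and a column plus a row broadcasts to a 2-D grid. The last lines use `np.broadcast_to` to show that the stretched axis has a stride of 0, which means the data are not copied.

```python
import numpy as np

M = np.ones((3, 2))
v = np.arange(3)            # shape (3,): M + v would raise a ValueError

# np.newaxis inserts an axis of length 1: (3,) -> (3, 1)
col = v[:, np.newaxis]
print(col.shape)            # (3, 1)
print(M + col)              # row i of M gets v[i] added
# [[1. 1.]
#  [2. 2.]
#  [3. 3.]]

# Column + row broadcasts to a 2-D grid: (3, 1) + (4,) -> (3, 4)
row = np.arange(4)
grid = col + row
print(grid.shape)           # (3, 4)

# broadcast_to returns a read-only view; the repeated axis has stride 0
stretched = np.broadcast_to(row, (3, 4))
print(stretched.strides[0])  # 0: every "row" reuses the same memory
```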

27:48

Aggregations are functions that summarize the values. There is sum, for

27:57

example. As with ufuncs, NumPy has many aggregations for things like minimum, maximum, sum and so on. Again, if you wrote this yourself, you would have to write a little Python loop over the array, but it's much faster to do it with NumPy functions. One more thing about aggregations: they work on multidimensional arrays too. If you want the mean value of the entire array, you call x.mean(); if you want the mean value of each column, you pass the axis argument, and so on. There are a lot of aggregations available in NumPy, and you should look them up in the documentation if you are going to do some large-scale data

29:12

analysis. The good thing about them is that they all have the same call signature, so you can pass the axis parameter to all of them.
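A small sketch of the aggregations and the shared `axis` argument described above, on a made-up 2 × 3 array.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])

# Whole-array aggregations
print(x.sum())            # 21
print(x.min(), x.max())   # 1 6
print(x.mean())           # 3.5

# The axis argument selects the dimension that is collapsed
print(x.mean(axis=0))     # column means: [2.5 3.5 4.5]
print(x.sum(axis=1))      # row sums: [ 6 15]

# The same keyword works for the function forms of the aggregations
print(np.max(x, axis=0))  # [4 5 6]
```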

29:27

In quick summary: writing Python is fast, but Python loops in particular are slow, and if you're looping over a large dataset, the best way to do it is to use NumPy and try some of these techniques. The very last thing I want to show you is an example of how NumPy can be used to implement a meaningful algorithm. We will be using k-means here. I believe all of you know this algorithm, so this is just a quick reminder

30:14

of how it works. You select k points at random as cluster centers, assign objects to their closest cluster centers according to Euclidean distance, then calculate the new centroids as the mean of all the objects in each cluster, and then repeat steps 2 and 3. Here we just generate some synthetic data to work with, and here it

30:50

is, the visualization of this data: we have a bunch of points floating in

30:56

space, and we want to find a cluster for each point here. Basically, we're going to compute Euclidean distances, and here we've got a vectorized version of that. So here is k-means in just five lines of code, implemented line by line as the algorithm was written before. I looked at the algorithm's definition and just managed to translate it line by line, so this can be done in Python with NumPy, and this makes me really excited.
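The talk's five-line k-means is not reproduced in the transcript, so what follows is a longer sketch of the same four steps, with the distance computation vectorized via broadcasting. `assign_labels` and `kmeans` are hypothetical names, and the synthetic blob data are assumptions. An empty cluster simply keeps its previous center, and the loop stops when the centers stop moving (checked with `np.allclose`).

```python
import numpy as np

def assign_labels(X, centers):
    """Index of the nearest center for every point (Euclidean distance)."""
    # (n, 1, d) - (1, k, d) -> (n, k, d); summing the last axis gives (n, k)
    diff = X[:, np.newaxis, :] - centers[np.newaxis, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means sketch; hypothetical helper, not the code shown in the talk."""
    rng = np.random.default_rng(seed)
    # Step 1: choose k distinct data points at random as initial centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign every point to its closest center
        labels = assign_labels(X, centers)
        # Step 3: move each center to the mean of its points
        # (a cluster that ends up empty keeps its previous center)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 4: repeat steps 2 and 3 until the centers stop moving
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign_labels(X, centers)

# Synthetic data: three Gaussian blobs in 2-D
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in ([0, 0], [5, 5], [0, 5])])
centers, labels = kmeans(X, k=3)
print(np.round(centers, 2))
print(np.bincount(labels))  # number of points per cluster
```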

31:55

I think I'm just out of time, so I'm going to leave you with this. If you are interested in this, you can find more in my account, and I'll post a link to the slides. I want to thank you for listening, and I hope this was

32:10

helpful. Enjoy the rest of the conference. Well, thank you.

32:26

[Partly inaudible.] Do you have some questions?

32:48

Have you ever compared NumPy performance with PyPy? For example, if one of your students refuses to use NumPy but you still need to check the assignment, you could just run it on PyPy

33:06

as well. There are just-in-time comparisons, but they are not part of this talk. Sometimes NumPy is faster and sometimes PyPy is, so the idea is that there is still a lot of work to be done.

33:42

[Partly inaudible.] ... Is it easy to write your own universal function? It is

33:57

perfectly easy: you can write the function yourself, and then it works like a built-in one. OK.
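The answer does not say which mechanism was meant. One way NumPy offers to turn a plain Python function into a real ufunc is `np.frompyfunc(func, nin, nout)`, sketched below with a hypothetical `clipped_diff` function. The resulting ufunc broadcasts like a built-in one. However, it returns an object array and still calls Python once per element, so it does not get the compiled-loop speed discussed in the talk.

```python
import numpy as np

def clipped_diff(a, b):
    """Hypothetical scalar function: the difference a - b, floored at zero."""
    return a - b if a > b else 0

# Wrap it as a genuine ufunc with 2 inputs and 1 output
clipped_diff_ufunc = np.frompyfunc(clipped_diff, 2, 1)
print(isinstance(clipped_diff_ufunc, np.ufunc))  # True

x = np.array([5, 1, 7])
y = np.array([2, 3, 7])
result = clipped_diff_ufunc(x, y)                # element-wise, like np.subtract
print(result.astype(int))                        # [3 0 0]
print(clipped_diff_ufunc(x, 2).astype(int))      # broadcasting a scalar: [3 0 5]
```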

34:16

Thank you for coming.


### Metadata

#### Formal metadata

Title | NumPy: vectorize your brain |

Series title | EuroPython 2015 |

Part | 119 |

Number of parts | 173 |

Author | Tuzova, Ekaterina |

License |
CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You may use, modify, and reproduce, distribute and make publicly available the work or content, in unchanged or modified form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner specified by them and share the work or content, including in modified form, only under the terms of this license |

DOI | 10.5446/20104 |

Publisher | EuroPython |

Year of publication | 2015 |

Language | English |

Production place | Bilbao, Euskadi, Spain |

#### Technical metadata

Duration | 34:21 |

#### Content metadata

Subject | Computer Science |

Abstract | Ekaterina Tuzova - NumPy: vectorize your brain NumPy is the fundamental Python package for scientific computing. However, being efficient with NumPy might require slightly changing how you write Python code. I’m going to show you the basic idioms essential for fast numerical computations in Python with NumPy. We'll see why Python loops are slow and why vectorizing these operations with NumPy can often be good. Topics covered in this talk will be array creation, broadcasting, universal functions, aggregations, slicing and indexing. Even if you're not using NumPy you'll benefit from this talk. |

Keywords |
EuroPython Conference EP 2015 EuroPython 2015 |