NumPy: vectorize your brain

Formal Metadata

Title: NumPy: vectorize your brain
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Production Place: Bilbao, Euskadi, Spain

Content Metadata

Ekaterina Tuzova - NumPy: vectorize your brain
NumPy is the fundamental Python package for scientific computing. However, being efficient with NumPy might require slightly changing how you write Python code. I’m going to show you the basic idioms essential for fast numerical computations in Python with NumPy. We'll see why Python loops are slow and why vectorizing these operations with NumPy can often be good. Topics covered in this talk will be array creation, broadcasting, universal functions, aggregations, slicing and indexing. Even if you're not using NumPy you'll benefit from this talk.
Keywords: EuroPython Conference, EP 2015, EuroPython 2015
Well, let me get started. My name is Ekaterina and I am a PyCharm developer. How many of you know about PyCharm? This talk is not about PyCharm, though, so if you are interested in it, please visit our booth; we will be really happy to see you later. What I am going to talk about is vectorizing your brain with NumPy. This is actually a lecture taken from my machine learning course at the Academic University in Saint Petersburg. How many of you are using NumPy? I mean, how many of you are using it in your everyday development? Well, I see that I need to warn you: I may not tell you anything new today.
As I already mentioned, this talk is taken from my machine learning course, and you might wonder why it was included in such a course in the first place. Here is the simplest answer. I start my course with the simplest algorithm one can imagine, k-nearest neighbors. You are probably familiar with it: it is used in classification tasks, and the idea is to assign to an object the label which is most frequent among its k nearest neighbors. The assignment for this lecture was to implement this algorithm and apply it to a data set [unclear]. None of my students actually used NumPy, and their code ran for hours; I waited such a long time to check the assignments that my assistant and I decided to include an introductory NumPy lecture in the course. That is my motivation to speak about NumPy at this conference.
NumPy is the main tool used in data science, and what I want to do today is talk about how to use it efficiently for data science. It is relatively easy, but you have to think in a somewhat different way about your code when you are writing NumPy in order to use it efficiently, so I am going to go through some ideas that may be helpful. Unfortunately, when I was preparing this talk I found that I did not have enough time for a proper introduction to IPython, so I will assume for now that you are all familiar with its features, but I will explain the features I use during the talk.
Let's get back to Python and talk about Python performance. The first thing a person learns about Python is that Python is fast: it is fast for developing and trying things out, but
unfortunately, the second thing you learn about Python, and everybody tells you this, is that Python is slow. But do you know why? Let's write a simple function to compute Euclidean distance. This is also taken from the first assignment: we need Euclidean distance to find the nearest neighbors. On the first line we get the number of iterations needed, then we accumulate the squared difference between the two points, and then we return the square root of the accumulated number.
I am going to use the %timeit magic function included in IPython and the IPython notebook. It allows you to time your code and quickly get benchmarks for simple functions like this one. %timeit runs the function a number of times to make sure it reports the best result. If we use standard Python and call our Euclidean distance function, we find that it executes in 2.67 ms per loop. You might wonder: is that fast or is it slow? Let's look at something for comparison, and the best way to compare is against a compiled language. So suppose we instead implement this exact function in C.
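The slide code is not part of the transcript, so the following is only a minimal sketch of the pure-Python distance function described above. The name `euclidean_py` and the test points are assumptions made for illustration, and `timeit.repeat` from the standard library stands in for IPython's `%timeit`.

```python
import math
import timeit


def euclidean_py(x, y):
    """Euclidean distance between two equal-length sequences, pure Python."""
    n = len(x)                    # number of iterations needed
    total = 0.0
    for i in range(n):
        diff = x[i] - y[i]
        total += diff * diff      # accumulate the squared difference
    return math.sqrt(total)       # square root of the accumulated sum


# Hypothetical test data: two points with 10,000 coordinates each,
# differing by exactly 1.0 in every coordinate.
x = [float(i) for i in range(10000)]
y = [float(i) + 1.0 for i in range(10000)]

distance = euclidean_py(x, y)

# Run the call several times and keep the best (smallest) timing,
# which is roughly what %timeit reports.
best = min(timeit.repeat(lambda: euclidean_py(x, y), number=10, repeat=3)) / 10
print("distance = %.1f, best time per call = %.6f s" % (distance, best))
```

The absolute timing depends on the machine and the input size, so it will not match the 2.67 ms quoted in the talk.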
Here I use the Cython magic extension for IPython to load C code directly into Python, so that we can use the same %timeit function. It is pretty awesome, and if you haven't tried it, I think you should. If we time this C function, we find that it completes in 28 microseconds. So we see that the C code is about a hundred times faster than Python. I'm sorry, but it's true: Python is slow for this kind of task. What is the problem with this Python code? Nothing special, nothing difficult is done here: we just run through the arrays and do some simple addition and multiplication.
So let's take the next step and find the bottlenecks: we want to learn which part of our code is so slow. I'll use line_profiler, which is installed on my computer. It has a nice IPython magic command that shows how much time is spent on each line of code. Is there anything strange here? Well, it might be kind of tricky if you haven't seen line_profiler output before, but the interesting thing here is that we spend 38 percent of all the time on the line [unclear]. So the question is why, and to answer this
question we have to go back and look at the differences between languages. C and other such languages are compiled, statically typed languages: you write the code, you hand it to a compiler, and it goes through your code and decides how it is going to be executed. The downside is that the compiler needs to know variable types at compile time, which means you have to specify the types yourself. Actually I really love C, and it was my first language, but it is far more cumbersome: you have to add all of this extra stuff, and you have to remember to declare the relevant variables, and so on.
Python and similar languages, on the other hand, are interpreted languages, so they don't compile down to efficient machine code, which means the code executes a little bit slower, but there are advantages as well. We all know that Python uses a dynamic type system, which makes programming easier: you don't have to specify the types yourself, and you don't have to write type annotations. My colleague Andreas is going to be talking about type annotations and when they become useful, so please visit his talk tomorrow; it's going to be interesting.
So, back to the dynamic nature of Python: in each Python operation there is a little bit of overhead for things like type checking. When you write a + b in Python, the interpreter has to check the type of a, then check the type of b, then find the proper code to execute, and then return the result. There is also reference counting: the interpreter has to increment a reference counter and then decrement it as references change [unclear]. I like Python, and not only I: it is somewhat slower, but it is very quick to develop and to prototype code in. Well, that's why I use Python.
So the question is: what do we do in this situation? That's where NumPy comes in. NumPy is basically designed to help us get the best of both worlds: we want the fast execution time of languages like C, and we want the fast development time of Python. So I'm going to talk about that, and
here are some ideas to make Python faster when you are working with numerical data. The first thing I'm going to talk about is ufuncs, and it is the simplest optimization. Ufunc is just a short name for universal function, and this is basically a special type of function defined in the NumPy library that operates element-wise on arrays. The idea behind ufuncs is to combine the function and the loop together in one.
Let me show you an example. Say you are a Python programmer who doesn't use NumPy and you want to do an element-wise operation on a list. We have a list of integers a, and you want to add 1 to each of its values. As a good Python programmer you would probably use a list comprehension, so you write x + 1 for x in a and print out the result. This is the Pythonic way to do it. The NumPy way is to create the same data as a NumPy array, using one of the special array creation functions, here arange. Then we write something like a + 1, where a is our array and 1 is just a number. NumPy overloads the plus operator, and this produces the same result as the list comprehension we just saw.
This is a binary ufunc, and a universal function comes with the loop functionality built in. What a + 1 says here is: loop through all of the elements of the array and add 1 to each of them. We have the same thing for multiplication and for many other operations. Note that this is element-wise multiplication, not the matrix product [unclear]. Notice the difference: we don't have any loop here. The loop has actually moved; it takes place in the internals of NumPy. The question is: why do we care about this?
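The slide contents are not in the transcript, so here is a minimal sketch of the comparison described above, with made-up values. It uses only standard NumPy calls (`np.arange`, `np.add`, `np.dot`); all variable names are assumptions.

```python
import numpy as np

# Pure Python: element-wise "add 1" with a list comprehension.
a_list = [1, 2, 3, 4, 5]
plus_one_list = [x + 1 for x in a_list]

# NumPy: the same data as an array, created with arange.
a = np.arange(1, 6)        # array([1, 2, 3, 4, 5])
plus_one = a + 1           # the + operator dispatches to the np.add ufunc
same = np.add(a, 1)        # calling the binary ufunc explicitly

# * is element-wise multiplication, not a matrix or inner product.
squares = a * a            # array([ 1,  4,  9, 16, 25])
inner = np.dot(a, a)       # inner product: 1 + 4 + 9 + 16 + 25

print(plus_one_list, plus_one, squares, inner)
```

There is no explicit loop in the NumPy version; the iteration over elements happens inside the ufunc.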
Let's take a look at the speed of these approaches. First of all we create a large array with a lot of values. The double-percent %%timeit magic means: time everything in the cell. Here we time creating an array and adding 1 to each element of the array, and what we get from NumPy is 110 microseconds. If we do the same in pure Python, we do it by hand: we create the list, then loop over the length of the list and add 1 to each element. Again we get a 100x speedup. I should also point out that the NumPy code is much easier to type and understand; it is harder to get it wrong than with a loop or a list comprehension.
You might ask why Python with NumPy is so much faster. What is the magic, what happens under the hood? The key point here is that when you use NumPy functions, the loops are happening in compiled code [unclear]. You have compiled functions for common operations, and you access these common operations from Python using a high-level expression. That is why it is so much faster. Does that make sense? OK.
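The timing comparison above can be reproduced outside IPython roughly as follows. This is a sketch under assumptions: the array size `N` and the function names are made up, and `timeit.repeat` replaces the `%%timeit` cell magic, so the numbers will differ from the 110 microseconds quoted in the talk.

```python
import timeit

import numpy as np

N = 100000  # hypothetical size; the size used on the slide is not in the transcript


def add_one_python():
    """Create a list and add 1 to each element with an explicit loop."""
    a = list(range(N))
    for i in range(len(a)):
        a[i] += 1
    return a


def add_one_numpy():
    """Create an array and add 1 to each element with a ufunc."""
    a = np.arange(N)
    return a + 1


# Best of three runs, five calls per run, reported per call.
t_py = min(timeit.repeat(add_one_python, number=5, repeat=3)) / 5
t_np = min(timeit.repeat(add_one_numpy, number=5, repeat=3)) / 5
print("pure Python: %.6f s, NumPy: %.6f s" % (t_py, t_np))
```

Both functions compute the same values; only the place where the loop runs differs (the interpreter versus compiled code inside NumPy).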
Well, these functions are really nice, and there are many of them available in NumPy. Basically all arithmetic operations and comparison operators are overloaded in NumPy to use this sort of ufunc, and there are a bunch of other mathematical functions available in NumPy as well.
The next thing I want to talk about is slicing and indexing. If you are used to lists in Python, you know that you can index a list with an integer to get a single value, and you can also index it with a slice to get multiple values. You can do absolutely the same with NumPy arrays. One interesting thing about NumPy slicing is that there is no memory overhead, unlike with plain Python lists: NumPy returns just a view of the array. So if you assign a slice to a new variable and change a value in that view, the value is changed in the initial array too, so please be aware of this.
In multidimensional arrays you can access elements by row and column using comma-separated indexes. So if we pass 0, 1 we are asking for row 0 and column 1, and the value there is 1. We can also use slicing on multidimensional arrays; in the last example here we get a sub-matrix. We can go further and combine slices and indexes together: here we are asking for row number 0 and all columns, and this is exactly the same as x[0].
NumPy actually offers a lot of other fast and convenient ways to do all sorts of indexing, to index more complicated chunks of data. One of those is fancy indexing, which is basically passing a list of indexes to the array. So if we want to access the zeroth and first elements of the array, we just put those indexes in a list and pass the list as the array index, and we get the elements we asked for. Again, we don't have to write a loop over these indexes; you just get them all together at once, and it is much quicker than looping over them by hand. By the way, one thing to know about fancy indexing is that it doesn't return a view of the array as slicing did; it returns a copy. You have to be aware of this: you can see here that this assignment didn't change the value of the initial array x, unlike with a slice.
NumPy also allows you to use boolean masks as an index. So instead of passing integers to choose values from x, you can pass a mask, and it will construct the array you are interested in. This might seem like, well, why would I need a thing like that? But it becomes handy when you combine it with the simple ufuncs you saw earlier. If you look at the last example on the slide, we use x > 2 to construct the boolean array, then we just pass this array as the array index, and it selects the elements for us. I use this technique mostly in data preparation steps.
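The indexing behaviours described above (views from slices, copies from fancy indexing, boolean masks) can be checked with a short sketch. The arrays and values are made up for illustration and are not the ones on the slides.

```python
import numpy as np

# Multidimensional indexing and slicing.
x = np.arange(9).reshape(3, 3)   # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
element = x[0, 1]                # row 0, column 1
sub = x[:2, :2]                  # top-left 2x2 sub-matrix
row0 = x[0, :]                   # row 0, all columns; same as x[0]

# A slice is a view: writing through it changes the original array.
v = np.arange(5)
view = v[1:3]
view[0] = 100                    # v is now [0, 100, 2, 3, 4]

# Fancy indexing (a list of indexes) returns a copy: the original is untouched.
w = np.arange(5)
picked = w[[0, 1]]
picked[0] = 100                  # w is still [0, 1, 2, 3, 4]

# Boolean mask built with a comparison ufunc, then used as an index.
z = np.arange(5)
mask = z > 2                     # [False, False, False, True, True]
selected = z[mask]               # elements of z where the mask is True

print(element, sub, row0, v, w, selected)
```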
For instance, when we are looking at a data array and we want to split it into test and train sets, the way to do this in NumPy [unclear] is to create a boolean mask of the length of the array, apply this mask to the array to get one set, and apply the negated version of it to get the other. That is how the split can be achieved by my students. So instead of writing a loop over the list and appending an element to the result if some condition is true, it all happens automatically, in one line of code, and it is much, much quicker than the by-hand Python version. The next thing I want to talk about is NumPy broadcasting, and this is something very cool.
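As an illustration of the train/test split with a boolean mask mentioned above, here is a minimal sketch. The data set, the fixed random seed and the 70/30 proportion are assumptions for the example; the transcript does not say how the mask was built.

```python
import numpy as np

rng = np.random.RandomState(0)         # fixed seed, only to make the sketch repeatable

# Hypothetical data set: 10 samples with 2 features each.
data = np.arange(20).reshape(10, 2)

# Boolean mask with one entry per sample: True means "goes to the train set".
mask = rng.rand(len(data)) < 0.7

train = data[mask]                     # rows where the mask is True
test = data[~mask]                     # rows where the negated mask is True

print(len(train), "train rows,", len(test), "test rows")
```

Every row ends up in exactly one of the two sets, and no explicit loop over the rows is written.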
Broadcasting is one of the things that really makes NumPy powerful and lets you express very complicated operations with ease. What broadcasting gives you is a set of rules by which ufuncs operate on arrays of different sizes and dimensions. What this set of rules allows you to do is things like, for example, adding a scalar to an array. You can add a row to a matrix, and you can do even crazier things: you can add a row to a column, and they will expand to a two-dimensional matrix. The rules of broadcasting are pretty simple, but at first they are a little bit confusing, and it takes a while to wrap your mind around what is going on. But once you get it, you can do a huge amount of operations really efficiently using broadcasting.
The first rule is that if the arrays' shapes differ in length, you left-pad the smaller shape with ones. Then you compare the dimensions, and if any dimension doesn't match, you broadcast, that is, expand, the dimension whose size is equal to 1. And if the dimensions don't match but neither of them is equal to 1, there is no way to broadcast them together and you get an error.
Here is a quick example of how it works. We already saw adding a scalar to a vector when we spoke about ufuncs; we did not mention then that it was broadcasting. Look at this example: we have a matrix and we are adding a vector of matching length. The first thing we do here is left-pad the vector's shape with ones to make the number of dimensions match. Then we broadcast it, that is, stretch the vector over the whole matrix, so that we have two matrices of the same shape, and then we just add them together and get the result. We can think about it as copying memory to match the dimensions, but actually there is no copying of memory; this is just an abstraction to think with. So there is no memory overhead; NumPy just makes this happen under the hood.
What this allows you to do is, instead of writing nested loops over two arrays by hand, to express the operation using broadcasting, and you get much faster computations and also much cleaner code, so you don't have to worry about loops. I am using addition here for illustration, but it works for any binary function.
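The broadcasting cases listed above can be sketched as follows. The shapes are made up, since the ones on the slide are not recoverable from the transcript. `np.broadcast_to` is used only to show that the "stretched" vector is a view with a zero stride rather than a copy.

```python
import numpy as np

v = np.arange(3)                  # shape (3,)

# Scalar + vector: the scalar is broadcast to every element.
scalar_added = v + 5              # [5, 6, 7]

# Matrix + vector: (3, 3) + (3,) -> (3, 3) + (1, 3) -> (3, 3) + (3, 3).
M = np.ones((3, 3))
result = M + v                    # every row is [1, 2, 3]

# Row + column: (3,) + (3, 1) -> (1, 3) + (3, 1) -> (3, 3).
row = np.arange(3)
col = np.arange(3).reshape(3, 1)
grid = row + col                  # grid[i, j] == i + j

# The stretching is only an abstraction: no data is copied.
stretched = np.broadcast_to(v, M.shape)
print(stretched.strides)          # first stride is 0: all rows share the same memory

print(scalar_added, result, grid)
```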
There is one more nice feature of NumPy that you might not have seen before. We have a matrix and a vector here [unclear], and what will happen if we add these two together according to the broadcasting rules? Well, we get an error, because our shapes don't match [unclear]; there is no way to broadcast those together. We can left-pad the array's shape with ones, but then we just can't expand it to match the matrix shape.
So here comes np.newaxis. What it lets you do is add a new axis, and you can add the axis wherever you want. It is very useful when you want to tell your array how to broadcast in the way you want. Does this still make sense? I ask because in my lectures at the university most of my students were lost at this point. Once again, broadcasting doesn't use additional memory; it doesn't actually allocate anything. The last element of today is NumPy aggregations.
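Going back to the shape mismatch and `np.newaxis` described above, here is a minimal sketch. The shapes `(3, 2)` and `(3,)` are assumptions chosen to trigger the error; the shapes on the slide are not recoverable from the transcript.

```python
import numpy as np

M = np.ones((3, 2))               # shape (3, 2)
v = np.arange(3)                  # shape (3,)

# (3,) is left-padded to (1, 3); comparing with (3, 2) gives 2 vs 3 in the
# last dimension, and neither is 1, so broadcasting fails.
try:
    M + v
    failed = False
except ValueError:
    failed = True

# np.newaxis inserts an axis of length 1 where we ask for it:
# v[:, np.newaxis] has shape (3, 1), which broadcasts against (3, 2).
column = v[:, np.newaxis]
result = M + column               # row i of M gets v[i] added to it

print(failed, column.shape, result)
```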
Aggregations are functions which summarize the values of an array. There is sum, for example, and NumPy has a bunch of other aggregations for things like minimum, maximum and mean. Again, if you were writing this by hand you would have to write a little Python loop over the array and do it yourself, but it is much faster to do it using NumPy aggregations.
One more nice thing about NumPy aggregations is that they work on multidimensional arrays. If you want the mean value of the entire array, you do x.mean(). If you want the mean value of the columns or rows, you pass the axis argument, and you get the mean of each column, and so on. There are a lot of aggregations available in NumPy, and you should get familiar with them if you are going to do some large-scale data analysis. The good thing about them is that all of them have the same call signature, so you can pass the axis argument to all of them.
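A short sketch of the aggregations and the shared `axis` argument described above, using a made-up 2x3 array:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)    # [[0, 1, 2], [3, 4, 5]]

total = x.sum()                   # sum over the entire array
overall_mean = x.mean()           # mean over the entire array

# axis=0 collapses the rows, giving one value per column;
# axis=1 collapses the columns, giving one value per row.
col_means = x.mean(axis=0)        # [1.5, 2.5, 3.5]
row_means = x.mean(axis=1)        # [1.0, 4.0]

# Other aggregations accept the same axis argument.
col_max = x.max(axis=0)           # [3, 4, 5]
row_min = x.min(axis=1)           # [0, 3]

print(total, overall_mean, col_means, row_means, col_max, row_min)
```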
So, a quick summary: writing Python is fast, Python loops in particular are slow, and if you are looping over a large data set, the best way to do it is to use NumPy and try some of these techniques.
The very last little thing that I want to show you is an example of how this can be used to implement a meaningful algorithm. We will be using k-means here, and I believe all of you know this algorithm, so this is just a quick reminder of how it works: you select k points at random as cluster centers, assign objects to their closest cluster centers according to Euclidean distance, then calculate the new centroids as the mean of all of the objects in each cluster, and then you repeat steps 2, 3 and 4. Here we just generate some synthetic data to work with, and here is the visualization of this data. We have a bunch of points floating in space, and we want to compute clusters for each point here.
Basically what we are going to do is compute Euclidean distance, and here we have the vectorized version of it. So here, in just five lines of code, the k-means algorithm is implemented line by line as it was written before: I took the steps of the definition in words and managed to translate them line by line. This cannot be achieved in pure Python, but it can with NumPy [unclear], and this makes me really excited.
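The five-line implementation from the slide is not in the transcript. The following is an independent sketch of the same steps using broadcasting and aggregations; the function name `kmeans`, the fixed number of iterations, the seeds and the synthetic two-blob data are all assumptions made for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=10, seed=0):
    """Plain k-means with a fixed number of iterations (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    # Step 1: select k distinct data points at random as initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: squared Euclidean distance from every point to every center.
        # Shapes: (n, 1, d) - (1, k, d) -> (n, k, d), summed over d -> (n, k).
        diff = X[:, np.newaxis, :] - centers[np.newaxis, :, :]
        dists = (diff ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)      # index of the closest center
        # Step 3: each center becomes the mean of the points assigned to it
        # (a center with no points keeps its previous position).
        centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    # labels come from the last assignment step, made before the final update.
    return centers, labels


# Hypothetical synthetic data: two well-separated blobs of 50 points each.
data_rng = np.random.RandomState(42)
blob_a = data_rng.randn(50, 2)
blob_b = data_rng.randn(50, 2) + np.array([10.0, 10.0])
X = np.vstack([blob_a, blob_b])

centers, labels = kmeans(X, k=2)
print(centers)
```

The distance computation has no explicit loop over points or centers: `np.newaxis` lines the two arrays up so that broadcasting produces all pairwise differences at once.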
I think I am just out of time, so I'm going to leave you with this. If you are interested in these slides, you can go to my [unclear] account, where I'll post a link to the slides. I want to thank you for listening, and I hope this was helpful. Enjoy [unclear] the rest of the conference. Well, thank you.
[unclear] Do we have some questions? No questions, really?
Have you ever compared NumPy performance with PyPy? For example, if one of your students refuses to use NumPy but you still need to check the assignment, you can just run it on PyPy [unclear] as well.
As for PyPy and NumPy [unclear]: with just-in-time compilation [unclear], sometimes it is faster, sometimes it is not. There is still a lot of work to be done there.
[unclear] Is it easy to write your own universal function?
It is pretty easy [unclear]: you can write a universal function yourself, and then it works like a built-in one. OK.
Thank you for coming.