NumPy: vectorize your brain

Formal Metadata

Title NumPy: vectorize your brain
Title of Series EuroPython 2015
Part Number 119
Number of Parts 173
Author Tuzova, Ekaterina
License CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
DOI 10.5446/20104
Publisher EuroPython
Release Date 2015
Language English
Production Place Bilbao, Euskadi, Spain

Content Metadata

Subject Area Computer Science
Abstract Ekaterina Tuzova - NumPy: vectorize your brain NumPy is the fundamental Python package for scientific computing. However, being efficient with NumPy might require slightly changing how you write Python code. I’m going to show you the basic idioms essential for fast numerical computations in Python with NumPy. We'll see why Python loops are slow and why vectorizing these operations with NumPy can often be good. Topics covered in this talk will be array creation, broadcasting, universal functions, aggregations, slicing and indexing. Even if you're not using NumPy you'll benefit from this talk.
Keywords EuroPython Conference
EP 2015
EuroPython 2015
Transcript
Well, let me get started. My name is Ekaterina, and I am a PyCharm developer. How many of you know about PyCharm? This talk is not about PyCharm, so if you are interested in it, please visit our booth; we will be really happy to see you later. What I am going to be talking about is vectorizing your brain with NumPy. This is actually a lecture taken from my machine learning course at the Academic University in Saint Petersburg. How many of you are using NumPy? I mean, how many of you are using it in your everyday development? Well, I am afraid that I will not tell you anything new today.
As I already mentioned, this talk comes from my machine learning course, and you might wonder why it was included in such a course in the first place. The answer is simple. I started my course with the simplest algorithm one can imagine, k-nearest neighbors. You are probably familiar with this algorithm. It is used in classification tasks, and the idea is to assign to an object the label that is most frequent among its k nearest neighbors. The assignment for this lecture was to implement this algorithm and apply it to a dataset. None of my students actually used NumPy, and their code ran so slowly that I could not wait that long to check the assignments. So I decided to include an introductory NumPy lecture in the course, and that is my motivation to speak about NumPy here.
NumPy is the main tool used in data science, and what I want to do today is talk about how to use it efficiently. It is relatively easy, but you have to think in a somewhat different way about your code when you are writing NumPy in order to use it efficiently, so I am going to go through some ideas that may be helpful. Unfortunately, when I was preparing this talk I found that I did not have enough time to make a proper introduction to IPython, so I will assume for now that you are all familiar with its features, but I will explain the features used during the talk. Let's get back to Python and talk about Python performance. The first thing a person learns about Python is that Python is fast: it is fast for developing and trying things out.
Unfortunately, the second thing you learn about Python is that it is slow. Everybody says that Python is slow, but do you know why? Let's write a simple function to compute the Euclidean distance. This is also taken from the first assignment: we need the Euclidean distance to find the nearest neighbors. First we get the number of iterations needed, then we accumulate the squared differences between the two points, and then we return the square root of the accumulated sum.
I am going to use the %timeit magic function included in IPython and the IPython notebook. It allows you to measure your code and quickly get benchmarks for simple functions like this. %timeit calls the function a number of times to make sure it reports the best result. If we use %timeit and call our Euclidean distance function, we find that it executes in 2.67 ms per loop. You might wonder: is that fast, or is that slow? Let's look at something for comparison, and the best way is to compare it with a compiled language.
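The function described above can be sketched roughly as follows. This is a minimal reconstruction from the spoken description, not the speaker's actual slide code: the name euclidean_distance, the input size and the use of the standard-library timeit module (as a stand-in for the IPython %timeit magic) are assumptions.

```python
import math
import timeit


def euclidean_distance(x, y):
    """Pure-Python Euclidean distance between two equal-length sequences."""
    n = len(x)                # number of iterations needed
    acc = 0.0
    for i in range(n):        # accumulate the squared differences
        diff = x[i] - y[i]
        acc += diff * diff
    return math.sqrt(acc)     # square root of the accumulated sum


# Hypothetical input: two points with 10,000 coordinates each.
x = [float(i) for i in range(10_000)]
y = [float(i + 1) for i in range(10_000)]

print(euclidean_distance(x, y))  # every difference is 1, so sqrt(10000) = 100.0

# Rough equivalent of %timeit: best of 3 repeats, 20 calls per repeat.
best = min(timeit.repeat(lambda: euclidean_distance(x, y), number=20, repeat=3))
print(f"{best / 20 * 1e3:.3f} ms per call")
```

The measured time depends on the machine and the input size, so it will not necessarily match the 2.67 ms quoted in the talk.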
So suppose we instead implement this exact function in C. I am using the Cython magic extension here to load the compiled code directly into Python, so we can use the same %timeit function. Cython is pretty awesome, and if you have not checked it out, I think you should. We time this function and find that it completes in 28 microseconds, so the C code is about a hundred times faster than Python. I am sorry, but it is true: Python is slow for this kind of task.
What is the problem with this Python code? Nothing special and nothing difficult is done here: we just loop through the array and do some simple addition and multiplication. So let's take the next step and find the bottlenecks; we want to learn which part of our code is so slow. I use line_profiler, which is installed on my computer. It has a nice IPython magic command that shows how much time is spent on each line of code. Is there anything strange here? Well, the output might be kind of tricky to read if you have not seen it before, but the interesting thing is that 38 per cent of all the time is spent on a single line [unclear]. So the question is: why?
To answer this question we have to go back and look at the differences between languages. C and other such languages are compiled and statically typed. You write the code, and a compiler goes through your code and decides how it is going to be executed. The downside is that the compiler needs to know the variable types at compile time, which means you have to specify the types yourself. Actually, I really love C, and it was my first language, but it is far more cumbersome: you have to add all of this extra stuff, you have to remember to declare all the variables, and so on.
Python and similar languages, on the other hand, are interpreted, so they do not compile down to efficient machine code, which means the code executes a little bit slower. But there are advantages as well. We all know that Python uses a dynamic type system, which makes programming easier: you do not have to specify the types yourself, and you do not have to write type annotations. My colleague Andreas is going to be talking about type annotations and when they become useful, so please visit his talk tomorrow; it is going to be interesting.
Because of the dynamic nature of Python, every Python operation carries a little bit of overhead for things like type checking. When you compute a plus b, the interpreter has to check the type of a, then check the type of b, then find the proper code to execute, and then return the result. There is also reference counting: the interpreter has to increment a reference counter and later decrement it [unclear]. Nonetheless, I like Python because, as I said, it is somewhat slower to run but very quick for developing and writing code, and that is why I use Python.
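The run-time dispatch described above can be seen from Python itself. The sketch below is only an illustration (the function name add is hypothetical): the same expression a + b runs different code depending on the types the interpreter finds when the call happens, and sys.getrefcount exposes the reference counter that CPython maintains.

```python
import sys


def add(a, b):
    # Nothing here says what a and b are; the interpreter inspects their
    # types on every call to find the right implementation of "+".
    return a + b


print(add(1, 2))            # integer addition
print(add(1.5, 2))          # float addition
print(add("Num", "Py"))     # string concatenation
print(add([1], [2]))        # list concatenation

# CPython also keeps a reference count for every object; the exact number
# is an implementation detail, so no expected value is given here.
value = object()
print(sys.getrefcount(value))
```

A statically typed compiled language resolves which addition to run once, at compile time; here that decision is repeated for every single operation inside a loop.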
So the question is: what do we do in this situation? That is where NumPy comes in. NumPy is basically designed to help us get the best of both worlds: we want the fast execution time of languages like C, and we want the fast development time of Python.
Here are some ideas for making Python faster when you are working with numerical data. The first thing I am going to talk about is ufuncs, and it is the simplest one. Ufunc is just a short name for universal function.
A ufunc is basically a special type of function defined in the NumPy library, and it operates element-wise. The idea behind it is to combine the loop and the functionality in one. Let me show you an example. If you are a Python programmer who does not use NumPy and you want to do an element-wise operation on a list, this is probably the best way to do it. We have a list of integers, and we want to add 1 to each of its elements. As a good Python programmer you would probably use a list comprehension: val plus 1 for each val in a, and then print out the result. That is the Pythonic way to do it.
The NumPy way looks different. We create a NumPy array (there are special creation functions for this), and then we write something like a plus 1. You treat your array as if it were just a number; NumPy overloads the plus operator and produces the result. This is a binary ufunc, and a universal function combines the loop and the functionality. What a plus 1 says is: I want to loop through all of the elements of the array and add 1 to each of them. We have the same thing for multiplication and for the other operations. Note that this is element-wise multiplication, not the matrix product [unclear]. And note the difference here: we do not have any loop in our code.
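The comparison described above can be sketched as follows. The variable names and values are assumptions, since the slide itself is not part of the transcript.

```python
import numpy as np

a = [1, 2, 3, 4, 5]

# Pure Python: an explicit loop written as a list comprehension.
print([val + 1 for val in a])   # [2, 3, 4, 5, 6]

# NumPy: the + operator is overloaded and calls the binary ufunc np.add,
# which loops over the elements in compiled code.
arr = np.array(a)
print(arr + 1)                  # [2 3 4 5 6]
print(np.add(arr, 1))           # the same ufunc called by name

# Multiplication is element-wise as well: 1, 4, 9, 16, 25.
print(arr * arr)

# The inner (matrix-style) product is a separate operation.
print(np.dot(arr, arr))         # 55
```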
The loop actually takes place in the internals of NumPy. The question is: why do we care about this? Let's take a look at the speed. First of all we create a large array with a lot of values. The double-percent %%timeit function means: time everything in the cell. Here we time adding 1 to each element of the array, and with NumPy we get 110 microseconds. If we do the same in pure Python, we do it by hand: we loop over the length of the array and add 1 to each element. Again we get about a hundredfold speedup. I should also point out that the NumPy code is much easier to type and understand, and it is harder to get wrong than the list comprehension.
You might ask why NumPy is so much faster. What is the magic? What happens under the hood? The answer is that when you use NumPy functions, the loops happen in compiled code. NumPy has compiled functions for the common operations, and you access these common operations from Python using a high-level expression. That is why it is so much faster. Does that make sense? OK.
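A rough way to reproduce this measurement outside IPython is the standard-library timeit module. The array size and repeat counts below are assumptions; the talk does not state the size used on the slide, so the absolute numbers and the exact ratio will differ.

```python
import timeit

import numpy as np

n = 100_000                      # hypothetical size of the "large array"
values = list(range(n))
arr = np.arange(n)


def python_loop():
    # The interpreter handles every element one by one.
    return [v + 1 for v in values]


def numpy_ufunc():
    # One call; the loop over the elements runs in compiled code.
    return arr + 1


t_py = min(timeit.repeat(python_loop, number=10, repeat=3)) / 10
t_np = min(timeit.repeat(numpy_ufunc, number=10, repeat=3)) / 10
print(f"list comprehension: {t_py * 1e6:.0f} microseconds per call")
print(f"NumPy ufunc:        {t_np * 1e6:.0f} microseconds per call")
```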
It is really nice to have these functions, and there are many of them built into NumPy. Basically all arithmetic operations, comparisons and [unclear] are implemented as ufuncs, and there is a whole bunch of them in NumPy.
The next thing I want to talk about is slicing and indexing. If you are used to lists in Python, you know that you can index a list with an integer to get a single value, and you can index it with a slice to get multiple values. You can do exactly the same with NumPy arrays. One interesting thing about NumPy slicing is that there is no memory overhead, unlike with plain Python lists: NumPy just returns a view of the array. So if you assign a slice to a new variable and change a single value in it, that value is changed in the initial array as well. Please be aware of this.
In multidimensional arrays you can access elements by row and column indexes. If you pass 0 comma 1, you are asking for row 0 and column 1. You can also use slicing on multidimensional arrays; in the last example here we get a submatrix. We can go further and combine slices and indexes: here we are asking for row number 0 and all columns, which is exactly the same as x[0].
NumPy also offers a lot of other fast and convenient ways to do all sorts of indexing, so that you can index more complicated chunks of data. One of those is fancy indexing, which is basically passing a list of indexes to the array. If you want to access the zeroth and first elements of an array, you just put those indexes in a list and pass the list as the array index. Again, you do not have to write a loop over these indexes; you just pass them all together at once, and it is much quicker than looping over them in Python. By the way, the thing about fancy indexing is that it does not return a view of the array as slicing did before; it returns a copy, so you have to be aware of this. You can see here that this assignment did not change the value of the initial array x, unlike the slice example.
NumPy also allows you to use boolean masks as an index. Instead of passing integers to choose values from x, you can pass a mask, and NumPy will construct the array you are interested in. This might make you ask: why would I need something like this? It becomes handy when you combine it with the simple ufuncs you saw earlier. If you look at the last example on the slide, we use x greater than 2 to construct a boolean array, then pass this array as the array index, and it extracts the matching elements. I use this technique mostly in data preparation steps.
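The indexing behaviours described above can be checked with a short sketch. The array contents and variable names are assumptions, not the values from the slides.

```python
import numpy as np

# Basic slicing returns a view: writing through it changes the original.
x = np.arange(6)
view = x[1:4]
view[0] = 100
print(x[1])                               # 100

# Multidimensional indexing and slicing.
m = np.arange(12).reshape(3, 4)
print(m[0, 1])                            # row 0, column 1 -> 1
print(m[:2, 1:3])                         # a 2x2 submatrix
print(np.array_equal(m[0, :], m[0]))      # True

# Fancy indexing (a list of indexes) returns a copy.
y = np.arange(6)
picked = y[[0, 1]]
picked[0] = -1
print(y[0])                               # still 0

# Boolean mask built with a comparison ufunc.
mask = y > 2
print(y[mask])                            # [3 4 5]
```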
For instance, suppose we are working with an array and want to split the data into test and train sets. The way to do this is to create a mask of the same length as the array, apply this mask to the array, and then apply the negated version of it to the same array. So instead of writing a loop over the list and appending an element to the result if some condition is met, it happens automatically, in one line of code, and it is much, much quicker than the by-hand Python version.
The next thing I want to talk about is NumPy broadcasting.
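The split described above can be sketched as follows. The data, the 70/30 proportion and the fixed random seed are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(20).reshape(10, 2)       # 10 hypothetical samples, 2 features

# One boolean per sample: True goes to the train set, False to the test set.
train_mask = rng.random(len(data)) < 0.7

train = data[train_mask]                  # rows where the mask is True
test = data[~train_mask]                  # rows where the negated mask is True

print(len(train), len(test))
```

Because the mask is drawn at random, the split is only approximately 70/30; every sample still lands in exactly one of the two sets.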
This is something very cool about NumPy: broadcasting is one of the things that really makes NumPy powerful and lets you express very complicated operations with ease. What broadcasting gives you is a set of rules by which ufuncs operate on arrays of different sizes and dimensions. This set of rules allows you to do things like adding a scalar to an array, or adding a row to a matrix. You can do even crazier things: you can add a row to a column, and the result expands to a two-dimensional matrix. The rules of broadcasting are pretty simple but sometimes a little bit confusing, and it takes a while to wrap your mind around what is going on. But once you get it, you can do a huge amount of work really efficiently using broadcasting.
The first rule is: if the array shapes differ in length, left-pad the smaller shape with ones. Then you compare the dimensions, and if any dimension does not match, you broadcast, that is, expand, the dimension whose size equals 1. And if the dimensions do not match but neither of them equals 1, there is no way to broadcast them together, and you get an error.
Here is a quick example of how it works. We already saw the example of adding a scalar to a vector when we spoke about ufuncs; we just did not mention that it was broadcasting. Look at this example: we have a matrix and we are adding a vector to it. The first thing we do is left-pad the vector shape with ones to make the number of dimensions match. Then we broadcast the vector, stretching it over the whole matrix, so that we have two matrices of the same shape, and then we just add them together and get the result. You can think about it as copying memory to match the dimensions, but actually there is no copying of memory; this is just an abstraction to think with. There is no memory overhead, and NumPy just acts as if this were happening under the hood.
What this allows you to do is the following: instead of writing loops in Python to combine two arrays, you can express the operation using broadcasting, and then you get much faster computations and also much cleaner code, so you do not have to worry about loops. I showed addition here for illustration, but it works for any binary ufunc.
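The rules above can be traced on small arrays. The shapes below are assumptions chosen for illustration; the explicit double loop is included only to show what the broadcast expression is equivalent to.

```python
import numpy as np

matrix = np.ones((3, 3))
vector = np.arange(3)                 # shape (3,)

# Rule 1: (3,) is left-padded to (1, 3).
# Rule 2: the dimension of size 1 is stretched to 3, giving (3, 3).
result = matrix + vector
print(result.shape)                   # (3, 3)

# A row plus a column expands to a two-dimensional matrix.
row = np.arange(3)                    # shape (3,)
col = np.arange(3).reshape(3, 1)      # shape (3, 1)
print((row + col).shape)              # (3, 3)

# The same matrix + vector computation written with explicit Python loops.
expected = np.empty_like(matrix)
for i in range(3):
    for j in range(3):
        expected[i, j] = matrix[i, j] + vector[j]
print(np.array_equal(result, expected))   # True
```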
Here is one more nice feature of NumPy that you might not have seen before. We have a matrix and an array here. What will happen if we add these two together according to the broadcasting rules? Well, we get a ValueError, because our shapes do not match and there is no way to match them together. We can left-pad the array shape with ones, but then we still cannot expand it to match the matrix shape.
Here comes np.newaxis. We add a new axis here, and you can insert a new axis wherever you want. It is very useful when you need to tell your arrays how to broadcast in the way you want. Does it still make sense? I ask because in my lectures at the university most of my students were lost at this point. Once again, broadcasting does not use additional memory; it does not actually allocate anything.
The last topic for today is NumPy aggregations.
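A minimal sketch of the failing addition and the np.newaxis fix follows. The shapes (3, 2) and (3,) are assumptions, since the actual shapes on the slide are not recoverable from the transcript.

```python
import numpy as np

matrix = np.ones((3, 2))
v = np.arange(3)                      # shape (3,)

# (3,) is left-padded to (1, 3); the trailing dimensions 3 and 2 differ
# and neither is 1, so broadcasting fails.
try:
    matrix + v
except ValueError as exc:
    print("broadcast failed:", exc)

# Inserting a new axis turns v into a column of shape (3, 1),
# which broadcasts against (3, 2).
column = v[:, np.newaxis]
result = matrix + column
print(column.shape, result.shape)     # (3, 1) (3, 2)

# The reshaped array is a view on the same memory, not a copy.
print(np.shares_memory(v, column))    # True
```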
NumPy has aggregation functions, which summarize the data. NumPy has a bunch of aggregations for things like minimum, maximum and sum. Again, if you were writing this by hand, you would have to write a little Python loop over the array and do it yourself, but it is much faster to do this using NumPy aggregations. One more thing about NumPy aggregations is that they can work on multidimensional arrays. If you want the mean value of the entire array, you call x.mean(). If you want the mean value of the columns or of the rows, you pass the axis argument, so you get the mean of each column, and so on. There are a lot of aggregations available in NumPy, and you should get familiar with them if you are going to do some large-scale data analysis. The good thing about them is that all of them have the same call signature, so you can pass the axis argument to all of them.
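The axis argument described above can be illustrated on a small array. The values are assumptions chosen so that the results are easy to verify by hand.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])

print(x.mean())          # mean of the entire array: 3.5
print(x.mean(axis=0))    # mean of each column: 2.5, 3.5, 4.5
print(x.mean(axis=1))    # mean of each row: 2.0, 5.0

# Other aggregations share the same call signature.
print(x.min(axis=0))     # column minima: 1, 2, 3
print(x.max(axis=1))     # row maxima: 3, 6
print(x.sum(axis=0))     # column sums: 5, 7, 9
```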
So, a quick summary: writing Python is fast; Python loops in particular are slow; and if you are looping over a large dataset, the best thing to do is to use NumPy and try some of these techniques.
The very last thing I want to show you is an example of how all of this can be used to implement a meaningful algorithm. We will be using k-means here. I believe all of you know this algorithm, so this is just a quick reminder of how it works: you select k points at random as cluster centers; you assign objects to their closest cluster center according to Euclidean distance; then you calculate the centroid, the mean of all of the objects in each cluster; and then you repeat steps 2 to 3 [unclear].
Here we just generate some synthetic data to work with, and here is the visualization of this data. We have a bunch of points floating in space, and we want to compute a cluster for each point. Basically, we are going to compute the Euclidean distance, and here we have a vectorized version of it. So here, in just five lines of code, is the algorithm implemented line by line as it was written before: I looked at the definition and just managed to translate it line by line [unclear], and this makes me really excited.
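The steps listed above can be sketched with broadcasting and aggregations. This is an illustrative reconstruction under stated assumptions, not the five-line version shown on the slide: the function names, the fixed number of iterations, the empty-cluster guard and the synthetic data are all assumptions. Squared distances are used because they give the same nearest center as Euclidean distances.

```python
import numpy as np


def squared_distances(points, centers):
    """Squared Euclidean distance from every point to every center.

    points has shape (n, d), centers has shape (k, d), the result (n, k).
    """
    # (n, 1, d) - (1, k, d) broadcasts to (n, k, d); no Python loop needed.
    diff = points[:, np.newaxis, :] - centers[np.newaxis, :, :]
    return (diff ** 2).sum(axis=2)


def kmeans(points, k, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct points at random as the initial centers
    # (fancy indexing returns a copy, so points itself is never modified).
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its closest center.
        labels = squared_distances(points, centers).argmin(axis=1)
        # Step 3: move each center to the mean of its cluster.
        for j in range(k):
            members = points[labels == j]     # boolean mask
            if len(members):                  # keep the old center if empty
                centers[j] = members.mean(axis=0)
    labels = squared_distances(points, centers).argmin(axis=1)
    return centers, labels


# Synthetic data (an assumption): two Gaussian blobs in the plane.
rng = np.random.default_rng(42)
points = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
                    rng.normal(10.0, 1.0, size=(50, 2))])

centers, labels = kmeans(points, k=2)
print(centers)
print(np.bincount(labels))
```

The only remaining Python loops run over iterations and over the k clusters, not over the data points, which is where the bulk of the work is.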
I think I am just out of time, so I am going to leave you with this. If you are interested in the slides, you can go to my [unclear] account, where I will post a link to them. I want to thank you for listening. I hope this was helpful, and enjoy the rest of the conference. Thank you.
[unclear] Do we have some questions?
Question: Have you ever compared NumPy performance with PyPy? For example, if any of your students refuses to use NumPy but you still need to check the assignment, you could just run it on PyPy [unclear].
Answer: [unclear] Just-in-time compilation is outside the scope of this talk. Sometimes it is faster, sometimes [unclear]. So the idea is that there is still a lot of work to be done there.
Question: How easy is it to write your own universal function?
Answer: It is pretty easy. [unclear] You can write a universal function yourself, and it works like a built-in one.
OK, thank you for coming.