
Numba, a JIT compiler for fast numerical code



Formal Metadata

Title Numba, a JIT compiler for fast numerical code
Title of Series EuroPython 2015
Part Number 62
Number of Parts 173
Author Pitrou, Antoine
License CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
DOI 10.5446/20081
Publisher EuroPython
Release Date 2015
Language English
Production Place Bilbao, Euskadi, Spain

Content Metadata

Subject Area Computer Science
Abstract Antoine Pitrou - Numba, a JIT compiler for fast numerical code This talk will be a general introduction to Numba. Numba is an open source just-in-time Python compiler that allows you to speed up numerical algorithms for which fast linear algebra (i.e. Numpy array operations) is not enough. It has backends for the CPU and for NVidia GPUs. After the talk, the audience should be able to understand for which use cases Numba is adequate, what level of performance to expect, and have a general notion of its inner working. A bit of familiarity with scientific computing and/or Numpy is recommended for optimal understanding, but the talk should otherwise be accessible to the average Python programmer. It should also be of interest to people who are curious about attempts at high-performance Python.
Keywords EuroPython Conference
EP 2015
EuroPython 2015
... you add the Numba jit decorator on it, and it only optimizes that function, or those functions if you have many of them. The nice thing with that is that it allows us to relax semantics: we are not bound to the usual Python semantics, and we can choose to relax them, which helps quite a bit in order to optimize the code. Also, the rest of the code around it, all your classes, metaclasses and so on, can still use all kinds of complicated things that Numba doesn't support. That's not a problem, because they are evaluated in the regular Python interpreter and Numba doesn't care.
As I said, it is specialized. Right now it's specialized for number crunching, and it's really tightly integrated with NumPy arrays. The NumPy array is the dominant data type in scientific computing; it has a lot of features and we try to support a bunch of them. We are trying to extend the range of things that we support, but right now it's mostly specialized for number crunching.
The main target is the CPU. We officially support x86 and x86-64, and LLVM provides us with support for those architectures. We also have a target for NVIDIA GPUs using CUDA. This means you write Python code and it can be executed on the GPU, but with a limited feature set, because there are some restrictions on what you can do on a GPU. There isn't a real runtime: you wouldn't be able to do memory allocation, or it would be quite slow, and the allocated memory would be in the GPU's global memory, which is not very fast. We also have potential support for other architectures thanks to LLVM. One of my colleagues tried Numba on the Raspberry Pi and it actually works, but we don't support it officially; I think LLVM takes several hours to compile there. We have some support going on for HSA, which is something from AMD; it means Heterogeneous System Architecture. It's an architecture for what they call APUs.
The goal is to blend the programming model between CPUs and GPUs: you write one implementation and it can run on either side, supposedly even simultaneously on the GPU and the CPU, and supposedly there is memory sharing and so on.
Let's talk a bit about the architecture. Numba, if you compare it to other JITs, is quite straightforward; it's not very exciting. It works one function at a time. That is a constraint which we are going to relax, because we need to in order to support recursion, but right now it's one function at a time. It starts from the Python bytecode, so we don't have a parser; we just use the bytecode emitted by CPython. We have a compilation and analysis chain which transforms it slowly, through various steps, into LLVM IR, which is the important part. LLVM IR is LLVM's internal representation. It's a kind of, let's say, portable assembly, and it allows you to specify a lot of things. The difference with C, for example, is that LLVM lets us specify some behaviours in a very fine-grained way. For example, we can specify whether signed overflow on integers is well defined or undefined, and if you have undefined behaviour on signed integer overflow, it enables LLVM to do some optimizations. After the LLVM IR is emitted, everything is delegated to LLVM, and LLVM itself does all the low-level optimizations and also executes the function. On top of that, we also generate some Python-facing wrappers, because each function gets a low-level implementation which takes native types, and you have to marshal those from and to Python objects.
This is the compilation pipeline. There are two entry points, which you can see where the arrows are. The first entry point is the Python bytecode itself. Then we have an analysis chain: first the bytecode is analyzed and we build the control flow graph and the data flow graph, and we produce something which is called Numba IR, Numba's intermediate representation. It is about as high-level as bytecode, but a bit different: it's not a stack machine, it's based on values. The second entry point is when the function is actually called: we record the types of the argument values and we do type inference over those values, trying to propagate the types across the function. I'm going to talk about Numba types just after this. It's a bit more complicated than just mapping Python classes to Numba types, because we have more granular types in Numba than in Python. After the type inference pass there is a pass which deals with rewriting the IR; it's an optional pass which does some optimizations. The next pass is lowering. Lowering is a term from the LLVM jargon; it means that you take a high-level language, which here is Numba IR, and you rewrite it to something very low-level, which in this case is LLVM IR. Then we hand everything to LLVM and to the LLVM JIT, which produces machine code, and we execute it. There is a small rectangle in the diagram, the cache, which is greyed out because it's not implemented yet, but ideally we will be able to cache the machine code or the LLVM IR in order to have faster compilation times.
Now, Numba types. Numba's type system is more granular and more precise than the Python type system. We have several integer types depending on their bit width and on their signedness. We have single-precision and double-precision floating-point types. Tuples have multiple subtypes, which means that you don't have a single tuple type: you have a different tuple type for every kind of parameter, because tuples are typed based on their elements' Numba types. So you have one tuple type, for example, for a pair of int32 and int64, another for a pair of float64 and int32, and so on. And then there are NumPy arrays themselves.
They are a very important part of Numba and of scientific computing, and they are typed according to their dimensionality and to their contiguousness.
The lowering pass takes the type-inferred Numba IR and transforms it into LLVM IR. This is a very straightforward and not very exciting part, but it has a lot of code, because we implement all the functions, all the operators, the math functions and so on.
If we are careful enough, what we generate allows LLVM to inline and to do other optimizations.
So what is supported? Numba supports a growing but still small subset of the Python language. On the syntax front it supports quite a bit, but not all of it. It supports all control flow constructs. It supports raising exceptions, but not catching them. It supports calling other compiled functions. We have recent support for generators, but only the simple kind of generators: not those to which you can send values, not coroutines, just the syntactic sugar for iterators with the yield keyword. What don't we support? Well, all the rest. We don't support exception catching, we don't support context managers, we don't support comprehensions, and actually we don't support lists, sets and dicts, and we don't support yield from.
As for the built-in types and functions, we have support for most types which are useful for scientific computing: all the numeric types (integers, floats and so on), and tuples and None, which are quite basic. We have support for the buffer protocol, which means you can index over bytes, bytearrays, memoryviews and everything which supports the buffer protocol, which also includes, for example, memory-mapped files using the mmap module. We have support for a bunch of built-in functions, and support for most operators, but of course only on the types that we support, so mostly the numeric types.
We are able to optimize several of the standard library modules, mostly those which are specialized for numerics: cmath and math, of course. We have support for random number generation; we actually use the same algorithm as CPython, the Mersenne Twister, except that we have a separate state. We have support for ctypes, which means you can call raw C functions from Numba code. That is a cheap way of accessing C libraries, and it generates very fast code because it calls them from the native context. Similarly we support cffi, which is a replacement for ctypes.
And mostly we support NumPy, or rather a subset of NumPy. We support most kinds of arrays, with any dimensionality from 0 to N. We support arrays of various dtypes: scalar arrays of numbers and so on, structured arrays, and arrays with subarrays. The only thing we don't support, and won't support any time soon I think, is arrays containing Python objects, because the whole point of Numba is to generate native code which doesn't go through the CPython API. We have recently added support for constructors, so we can do memory allocation, allocate arrays from compiled functions. We support various operations on arrays, such as iterating, indexing and slicing, and various kinds of iterators, such as .flat and some more exotic ones. We have support for reductions: products, sums and so on. Among the scalar types, datetime64 and timedelta64 are supported; these are, I think, little-known types which allow you to do all kinds of computations on dates and times. And numpy.random is supported in the same way that we support the random module.
Limitations. Apart from what we don't support in terms of syntax and in terms of types, we don't support recursion. That's because we're compiling one function at a time, and we have to rework things to change that. We can't compile classes, and again that's because we compile one function at a time, so we don't have a way of specifying a structure and several methods operating on a user-defined type. The other limitation is that type inference really has to succeed: if the type inference pass fails to infer a type for a given variable, then the whole compilation fails. Ideally we would have a way to say "this is a Python object, but the rest has been inferred", but right now that is not possible. Actually, when type inference fails, Numba goes into a mode called object mode, which is not very interesting as far as performance is concerned.
As I said, being opt-in allows us to relax the semantics. As you have understood, perhaps, Numba has fixed-size integers, up to 64 bits. So if you have an addition between integers and the result overflows, you just get a truncated result; you don't get an OverflowError or anything. We also take the liberty of freezing the global and outer variables: we consider them constants, which makes it much easier to compile and allows us to generate more optimized code. For example, if you have math.pi, usually math.pi won't change, so it's only fair to consider it a constant. But of course, if in your module you have a global variable whose value changes, you won't see the new value in a compiled function.
We don't have any frame introspection. Basically we don't have any debugging features right now, neither from the C level nor from the Python level. This is something we are going to work on, because we want to expose the names of the JIT-compiled functions so that you can use GDB and get nice backtraces.
So, how do you use it? Basically, the main entry point is the jit decorator. It's very simple: you have a function and you just put the jit decorator on it, and hopefully Numba will be able to compile it. The default way is not to pass any argument to the jit decorator, and with that we lazily compile the function. This means that we wait for the function to be called, we do the type inference at that point and generate the native code, so the function you are calling is compiled on the fly. There's another way, which is to manually specialize on argument types. Let's say you really know you want some 32-bit integers, or you want single-precision floats instead of double-precision floats: you are able to pass an explicit signature to jit. But this is not the recommended way; it's mostly for us to test.
There is an option to release the GIL, which is quite easy for us since we are not calling into the CPython API and we are generating native code. Just pass nogil=True and the GIL will be released. The GIL is the global interpreter lock, for those who don't know it: it's a lock which constrains Python execution to a single thread. If you remove the GIL, you can run your function or your functions from several threads and have parallel execution on several cores. But of course you have no protection from race conditions, so you are in the same position as a C or C++ programmer, who has to be careful about not having several threads accessing the same data and mutating it. For example, instead of having your own thread pool, you can use concurrent.futures on Python 3.
Another feature is the vectorize decorator. NumPy has something called a universal function, or ufunc.
A universal function is best explained with an example. Take the plus operator between arrays, which is a shortcut for the np.add function. np.add is basically doing an element-wise operation on all the elements of its inputs, and the way it's implemented is, in general, a loop around the element-wise operation. The nice thing with universal functions is that you get several additional features. One is something called broadcasting in NumPy: if you are adding, for example, a scalar and an array, the scalar will be added to each element of the array, so the lower-dimensional argument is broadcast to the higher-dimensional argument. This is handled automatically by the ufunc framework, and the implementer doesn't have to care about it. It also gives you some reduction methods for free, such as reduce and accumulate.
NumPy comes with a fixed set of universal functions: addition, multiplication, square root and so on. Traditionally, if you wanted to add a universal function of your own, you had to write it in C, using a specific C API provided by NumPy, compile it against the right NumPy version, and then you would get your universal function. That's not very convenient for users. Using Numba, you can write the element-wise function in pure Python, put the vectorize decorator on it, and it will generate a ufunc.
Yet another, more sophisticated feature of NumPy is the generalized universal function. This is an extension of the idea of a universal function. A universal function works on one element at a time; it doesn't see the neighbours or the rest of the array. A generalized universal function can see the whole array, and you have to specify exactly what the layout of the inputs is. It allows for more sophisticated functions, such as a moving average. Numba allows you to write a generalized universal function using the guvectorize decorator.
Here is an example: it's called the Ising model. It's something which here is used mainly for benchmarking, but it's inspired by some physics models. The basic idea is that you have a two-dimensional grid of Boolean states, or rather binary states: you can think of each one as having a value of plus 1 or minus 1. It starts from a random state, typically, and at each iteration you make each element vary based upon its neighbours. At the end it's supposed to converge towards something which is quite stable.
This animation was generated with Numba. If you look at the code, you have an inner function which processes one element per iteration and updates it based on its neighbours' values. It's a bunch of operations: it takes the neighbours' values and combines them with the current value of the element, and it takes a decision based on that and on a random number. The outer loop is just looping over the whole array and updating all of its elements. The outer function does one iteration, and if you want to make the model converge you have to call it a number of times.
If you measure that, you get about a 100 times speed-up from Numba, which is less than what you get with Fortran, but still within range: in this case it's about twice as slow. We know why, actually: array indexing in Python is more sophisticated. One of the main reasons is that Python allows negative indexing: if you have a negative index, you index from the end, so you have to have a runtime check of the negativeness of each index, and in some cases LLVM isn't able to optimize it out.
Besides that, we have CUDA support, as I said. The main idea is that we don't try to hide the CUDA programming model. The CUDA programming model is based on the notion of a grid of threads: you have blocks of threads and you have a grid of blocks. The GPU executes all those threads in parallel, but you have to tell the GPU what the topology of threads is. Besides that, there are two types of functions. There are kernel functions, which are called from the CPU. A kernel function is not able to return a value: you pass it some input and output arrays, which are marshalled automatically by Numba to the GPU, and you write the results into the output arrays. And there is something called device functions, which are functions called from the GPU, on the GPU, so these ones can return values.
When using the CUDA support in Numba, you have a limited range of features because, as I said, you don't have a large runtime available on the GPU. It also requires the programmer not only to have some knowledge of CUDA and of how the GPU works, but also to have some intuition of how to optimize the code for execution on a GPU, because usually you are not arranging your algorithm in the same way as you would on the CPU, except in trivial cases. Here is an example.
It's a trivial example; it's just to show you how it works. We're trying to compute the cosine of an array. Using the cuda.jit decorator, we have a function which takes two arguments: the first argument is the input array, and the second argument is the output array. It's not a general convention, it's just a choice here for the example. The idea is that each GPU thread will compute one value of the array: it takes one element of the input array, computes the cosine and puts it in the output array. So the first thing is that you compute the index of the current thread using one of the CUDA functions, and then you just call math.cos on the input and write it to the output. That is the definition. When you want to call it, the decorator defines the kernel function, and then you have to instantiate the kernel. Instantiating the kernel means that you define the grid topology. This is the launch configuration here: it's a tuple between square brackets, where the first element is the number of blocks in the grid, I think, and the second is the number of threads in each block. So you define the topology based on the length of the output array, and you call the function with the topology and then the input and the output. In this example the cosine is computed on the GPU, but that's not very important, because you wouldn't use a GPU just to compute a cosine; you would do something more complex.
If you want to install Numba: since it's open source, you can compile it from scratch if you want. But you have to compile LLVM, and a specific version of it, because LLVM has backwards-incompatible changes in each release. The current version of Numba requires LLVM 3.6, so you would have to fetch LLVM 3.6 and build it, or get binary development packages if you can, and then you have to compile llvmlite with a sufficiently recent C++ compiler, which is not trivial at all. So we really recommend using conda, which is Continuum's own package manager. It's an open-source package manager, and it comes with a ready-made distribution of binary packages called Anaconda. If you have it, you just type conda install numba and you have it.
To wrap up: you can find the documentation on the web, and of course we have a GitHub account with the code and the issue tracker. You are welcome to come to the numba-users mailing list, either as a user or as a potential contributor. I must also mention that Numba is commercially supported by Continuum Analytics, so if you want to buy consulting, enhancements or support for some architectures, you can write to sales@continuum.io. And lastly there is NumbaPro, which is a proprietary extension to Numba. It provides bindings to some specialized libraries for the GPU and to various specialized scientific libraries, and I think it also has extensions to parallelize code more easily.
So that's it, if you have any questions.
First, it sounded like you support only a subset of the platforms that LLVM supports. Why is it that you don't just have the same platform list as LLVM?
You are saying that we support a subset of what LLVM supports?
Do you support everything that LLVM supports, or do you only support the mainstream architectures?
Yes, it's a matter of validation, because ideally it works with who knows what.
I was also wondering: a couple of years ago there was an attempt to marry CPython and LLVM together, called Unladen Swallow. Nothing ultimately came of it and Unladen Swallow died, but I was wondering whether the work they had done was helpful at all in the development of Numba.
I don't think so, or not directly. At the time they said that they had helped LLVM improve its support for JIT compilers, so perhaps we indirectly benefited. But we didn't take anything from them, because we use our own wrapper around LLVM, llvmlite, and Numba itself is written in Python. A big difference with Unladen Swallow is that in Unladen Swallow everything was in C++, which I think is necessary if you want to compile very fast, but it's also much less flexible. Python allows us to experiment and to develop quickly.
I have three questions. First, is Numba compiling in a separate thread, or in the same thread, so that you have to wait for the compilation to finish before the function executes?
Well, what would you do anyway? It's lazy compiling, so it's compiling when you call the function, and you must wait for it.
Of course, but I mean that some JITs compile in a separate thread and just continue with the slow version until it's done.
Right. No, we are working in a single thread.
Do you have any support for storing the compiled code?
Not right now. As I said, we want to support caching, but not yet.
So that's what you meant with the cache. I think that, like PyPy, which has the problem that it can't store the compiled version, you have to redo the compilation every time you run the code, so it would be more efficient if you did it just once and stored it in a cache.
That's just another thing to add.
OK. And third, how do you do error handling, since you said you don't have any way to catch exceptions?
Yes, we have a way to raise exceptions. If you raise an exception from Numba code, you can catch it when it goes outside of the Numba code. So you can communicate errors to the user, but you can't handle them inside the Numba function.
It seems simple to work with, and it focuses a lot on NumPy. Do you plan to support pandas as well?
Not yet. We mostly support NumPy right now, so every kind of pure Python code which relies on NumPy may perhaps be accelerated, if it intersects with the subset of things we support, but we don't have direct support for anything other than NumPy right now. I suppose that someday we will want to support pandas.
We have no more time for questions. Thank you.
Computer animation

