
Scientist meets web dev: how Python became the language of data

Speech transcript
[Moderator] Our keynote speaker for the PyData track today — I think most of you know him already. He is one of the core maintainers and contributors to the scientific Python stack. Please welcome Gaël.

[Gaël] Is the screen working, is the mic working, are the slides working? Good.
Thank you everybody for coming, and thanks to the organisers and to Alex for the introduction. I think we all agree that EuroPython is pretty cool, right? The event really is cool — I hope you got coffee this morning, I did. What I'd like to do in this talk is to address the very diverse community that we have here, so this talk tries to be a reflection on what we have in common, which is Python. I'll be talking about things you may not understand, which is my science, and things that I don't understand, which is web development — I don't know how I get into these horrible situations. Anyhow, I did at some point a PhD in quantum physics, so I think I'm qualified as a scientist.
These days I do computational science for neuroscience. What we try to do is to link neural activity — the firing of the neurons, basically — to thoughts and cognition: what you would do when you drive a car, for instance. The way we do this is with brain imaging, and specifically we pitch it as a machine learning problem; that is what I do, and of course we develop open source software for it. If you want to try it, you can actually predict things like visual stimuli from recordings of brain activity using this open source software and open data — you can go online, it's all there — but I won't be talking about that today.
Along the way we created a machine learning library known as scikit-learn. The point I want to make here is that we built it with many people, of course, not only in my lab. It was a huge success: we suddenly became cool, because of data science. The fairly cool thing is that these days Python is the go-to language for data science, so I'd like to think a bit about how that happened. We built scikit-learn, others built other tools, but these were all built on the solid foundation that Python is really giving us. So, to set the picture:
scientists have a reputation of being a bit different in the Python community — you might say that we come from another planet — and web developers are very different. Most scientists do not know the web developer's toolbox, and I have seen the kinds of discussions the two groups have about it. A few differences, for instance: web developers worry about strings, we worry about numbers and arrays; web developers care about databases, we think in terms of arrays of numbers; you might think of object-oriented programming, but no — for us flow control is usually good enough, and we deal with arrays. So there is a bit of a culture gap. Alright, let's do something together.
How about we sort the EuroPython website? There are too many abstracts — 205, I can't read them all — and they are hugely varied: they go from OpenStack to making 10 million dollars with a startup (that's why I chose this slide). The way we'll do this is with a bit of web scraping to get the data from the website — I could have asked the conference organizers, but this is more fun — then a bit of text analysis, then some data science, and we'll end up with topics. The thing I like about this example is that it walks us through a good part of the whole Python stack; that's why I like it. I'm using things like urllib and BeautifulSoup, but also scikit-learn, matplotlib and wordcloud.
The first thing that we're going to do is crawl the website. Our goal is to get the schedule, follow the links from the schedule to retrieve the list of talk titles at EuroPython, then crawl those pages and retrieve the abstracts. We did this using BeautifulSoup — if you've never used it, it's an HTML parsing library that lets you do matching on the document object model tree of an HTML page. It's really awesome; scientists would never have developed it.
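A minimal sketch of what this crawling step could look like (not the code from the talk; the schedule URL, the link pattern and the `class="abstract"` selector are assumptions to adapt to the real markup):

```python
import requests
from bs4 import BeautifulSoup

SCHEDULE_URL = "https://ep2016.europython.eu/en/events/sessions/"  # assumed URL

soup = BeautifulSoup(requests.get(SCHEDULE_URL).text, "html.parser")

# Collect the link of every talk page referenced from the schedule.
talk_links = sorted({a["href"] for a in soup.find_all("a", href=True)
                     if "/conference/talks/" in a["href"]})        # assumed pattern

abstracts = []
for url in talk_links:
    page = BeautifulSoup(requests.get(url).text, "html.parser")
    title = page.find("h1").get_text(strip=True)
    body = page.find("div", class_="abstract")                     # assumed class
    abstracts.append((title, body.get_text(" ", strip=True) if body else ""))
```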
Then we're going to vectorize the text. The idea is that a text is a bunch of words (or characters), so for each document we count how many times each word appears and put this in a table; we call this the term frequency for each document. Here we have a term frequency vector describing my abstract, and you can see that the most common word is "a" and that "Python" is also very common. Maybe that's not a very good description, because some of these terms occur all over the documents. What we can do is weight the frequency of a term in the document by how rare the term is over the whole corpus: we call this TF-IDF, term frequency–inverse document frequency, and you can do it with scikit-learn using the TfidfVectorizer.
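The vectorization step as a small sketch (`abstracts` is assumed to be the list of (title, text) pairs built by the crawl above):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

texts = [text for _, text in abstracts]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(texts)        # sparse document-term matrix
terms = vectorizer.get_feature_names_out()     # one vocabulary entry per column
print(tfidf.shape)                             # (n_documents, n_terms)
```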
OK, so now I feel a bit more in my comfort zone: we've gone from text, which I don't understand, to vectors of numbers, which feels better. If we vectorize all the documents we get a matrix — an array giving the terms in the documents, the term-document matrix. This can be represented as a sparse matrix, because most terms are present in very few documents, so we can use the SciPy stack and its sparse matrices. The good news is that the scientific community — not even just the scientific Python community — has developed lots of fast operations for sparse matrices. So we're doing text mining with tools that were developed by people who solve partial differential equations, things like that. Then we can extract topics.
What we're going to do here is matrix factorization: we take this term-document matrix and factorize it into two matrices, one that gives the loadings of documents on what we're going to call topics, and another that gives the loadings of topics on terms. So the first matrix tells me how much each document loads on the different topics, and the second matrix tells me how much each term loads on the different topics — this is a matrix factorization, so once again I'm back in things I know as a computational scientist. In text mining we often do this with non-negativity constraints, because a term being negatively loaded on a topic might or might not mean anything. We can do this in scikit-learn with sklearn.decomposition.NMF, non-negative matrix factorization — that's where the magic happens. We run this and we get word clouds.
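A sketch of the factorization with scikit-learn (the number of topics and of displayed words are arbitrary choices here):

```python
from sklearn.decomposition import NMF

n_topics = 10
nmf = NMF(n_components=n_topics, random_state=0)
doc_topic = nmf.fit_transform(tfidf)   # loadings of documents on topics
topic_term = nmf.components_           # loadings of topics on terms

# Show the highest-loading terms of each topic, word-cloud style but in text.
for k, topic in enumerate(topic_term):
    top = topic.argsort()[::-1][:8]
    print(f"topic {k}:", ", ".join(terms[i] for i in top))
```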
Here is a representation of the first topic — and what is it about? It's about the Python language, good news. The second topic is about data science and machine learning, and the third is another one you can see here. Then we can look at all the topics, and there is a bunch of different things: you have async, a topic about the community and conference organization, the Internet of Things, best practices, and one not shown here which is the talks in Spanish. But since Python is not only a numerical language, we can also output a website from this using a template engine, and if you make it look nice you get a reasonably usable website. It's on the web, you can have a look at it, and there is a link to the code that generates all of this, so you can run it if you're interested.
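One possible way to render such a page with a template engine — a sketch with Jinja2, where the template string and the output file name are made up for illustration:

```python
from jinja2 import Template

page = Template("""
<html><body>
{% for topic in topics %}
  <h2>Topic {{ loop.index }}</h2>
  <p>{{ topic | join(", ") }}</p>
{% endfor %}
</body></html>
""")

# Top words of each topic, reusing the NMF results from above.
topics = [[terms[i] for i in row.argsort()[::-1][:8]] for row in topic_term]
with open("topics.html", "w") as f:
    f.write(page.render(topics=topics))
```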
Now you want to try it: OK, but you pip install scikit-learn and it complains that NumPy isn't installed, or you pip install NumPy and it wants a C compiler — and now you're starting to get angry at me, right? This goes back to the fact that we're different. Historically we've had a lot of problems with people who don't have Fortran compilers — "why do you even need Fortran?" Well, Fortran has given us really, really fast libraries: think of BLAS, the implementation of matrix operations — with a Fortran-optimized BLAS you can get a factor of 70 difference. A factor of 70! So packaging has historically been a major roadblock for scientific Python, and the reason is that we rely on a lot of compiled code and shared libraries. We've been hitting problems like libraries not being there, or ABI compatibility issues. The good news is that there has been a huge amount of progress, for two reasons. The first one is wheels, and specifically, recently, manylinux wheels: the idea is that you rely only on a conservative core set of libraries, so the problem I showed shouldn't happen any more — it should just work; you can try it and tell me. The other reason is a thing called OpenBLAS, which is linear algebra not written in Fortran, so that's good news too. By the way, Fortran is a very modern language that is super performant, because it allows automatic vectorization, which C cannot do because it has different semantics — so don't think of Fortran as something from the seventies.
So yes, we're different, but if we work together we can get really nice things. For instance, I hope you can take this example and get text mining running on any of your own websites — it should be easy to do. It's magic, but you can use it. All right, now let me help you think a bit more like a scientist, and talk about how we code and what it's mostly about.
We really love NumPy. It's numerical Python: matrix operations, array operations. The reason we love NumPy is speed. Try, for instance, to compute the product of term frequencies times inverse document frequencies on a hundred thousand terms. We can do this with a list comprehension and it takes about 6 milliseconds; 6 ms may not sound like a lot, but when I run a non-negative matrix factorization algorithm I do these things many, many times, and a hundred thousand terms is actually not that big — so it adds up. Now if we do the same thing with NumPy — the code looks only slightly different — we get 70 microseconds, almost a factor of a hundred speed-up. Another thing that we really like is that, once you're used to it, it's actually much more readable: array computing requires learning, but once you've learned it, it is extremely readable. Compare `tf * idf` to the list comprehension that computes tf times idf.
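A rough way to reproduce the comparison (exact numbers depend on the machine; `tf` and `idf` are just random vectors here):

```python
import timeit
import numpy as np

n = 100_000
tf = np.random.rand(n)
idf = np.random.rand(n)
tf_list, idf_list = tf.tolist(), idf.tolist()

t_list = timeit.timeit(lambda: [t * i for t, i in zip(tf_list, idf_list)],
                       number=100) / 100
t_numpy = timeit.timeit(lambda: tf * idf, number=100) / 100
print(f"list comprehension: {t_list * 1e3:.2f} ms, numpy: {t_numpy * 1e6:.1f} µs")
```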
It's important to realize that, to us, arrays are really nothing but pointers. What defines an array is a memory address, a data type, a shape and strides. The shape and the strides tell you how you can move through the array: basically you move through the array from one element to another by computing offsets from the pointer. So what an array represents is regular data laid out in a structured way. This is really important, because it matches the memory model of just about every numerical library, whether it's written in C, C++ or Fortran — arrays are a lingua franca across compiled languages. For me, the real value of NumPy is that it has this memory model. So let's look a bit at why it's faster.
One thing is that you're not doing type checking during the operation: you resolve the dynamic types once, before the computation, to decide what `tf * idf` will do, and then it's compiled code that runs the operation. But maybe most importantly, you're using direct, regular, sequential memory access: you're just grabbing your data. There's no pointer dereferencing — well, there's one, but after that you're done — you're just grabbing chunks of data from RAM or from the cache, and that's really fast. And then your CPU, or the underlying kernel library, can implement things like vector operations using vectorized instructions. That's what really makes NumPy fast: bypassing the type checking is part of it, but it's not all of it. Right, so that's much faster, that's cool. Now let's look at this graph.
As the number of elements grows, at some point we suddenly get a factor of two in compute time per element. Do you have an idea what this may be due to? It's the cache: 10^5 elements is approximately the size of the CPU cache — these are probably float64, so 8 bytes each. The problem is that memory is much slower than the CPU, so your goal when you do fast calculations is to get things into the CPU as fast as possible, and here you're starting to fall out of the cache. That's bad news for the computation — but it gets even worse.
If we do a slightly more complex operation, `tf * idf - 1`, then the cost per element starts increasing even sooner. So what's going on here? If we look at what's happening, NumPy computes `tf * idf` and creates an array that we don't see — let's call it a temporary array — and then it subtracts 1 from this temporary. So we're moving things in and out of the cache a lot; we have pretty bad cache utilization. This is just the way NumPy's computing model works. We can profile this and see that there is a huge cost to subtracting that 1 in the computation. But if we play a trick — those of you who know NumPy will see it — things get slightly better by using an in-place operation for the second step. The idea is that we reuse the allocation of the temporary array instead of allocating arrays twice. If we do this it gets much faster, and the reason is that we've become much better at cache locality.
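The trick in code — a small sketch of the naive expression next to the in-place variant:

```python
import numpy as np

tf = np.random.rand(100_000)
idf = np.random.rand(100_000)

# Naive: `tf * idf` allocates a hidden temporary, then `- 1` walks it again.
out = tf * idf - 1

# In-place: keep one output buffer and subtract 1 without a second temporary.
out = np.multiply(tf, idf)
out -= 1
```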
If we add the in-place version to our graph, the cost still goes up with the number of elements, but because it avoids an extra allocation it is faster. So what we have here is really a compilation problem: we want to go from one expression to another, equivalent, expression. We want to do things like removing or reducing temporaries, or we might want to chunk operations — if I can turn this into two loops over pieces of data of the right size, then everything stays in cache. numexpr is a package that can do this for you using string expressions: for example, you call numexpr's evaluate on the string "tf * idf - 1".
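With numexpr (a separate package) the whole expression is handed over as a string and evaluated in cache-sized chunks; a minimal sketch:

```python
import numexpr as ne
import numpy as np

tf = np.random.rand(100_000)
idf = np.random.rand(100_000)

result = ne.evaluate("tf * idf - 1")   # tf and idf are looked up in the calling frame
```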
Without us being clever — numexpr is clever for us — you get the speed-up: the same speed as the in-place version. Numba is basically a just-in-time compiler, a compiler that does these kinds of things via bytecode inspection. Another approach is a package built around lazy arrays, which builds an expression but doesn't evaluate it until you ask for the result — basically going around the normal Python evaluation. And I like to point out that this is actually not a problem specific to scientific computing: it's a similar problem to things like grouping and aggregating in SQL queries — I'm talking about things like joins.
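The just-in-time route, sketched with numba (assumed to be installed): the compiled loop fuses the multiply and the subtraction, so no temporary array is created.

```python
import numpy as np
from numba import njit

@njit
def tf_idf_minus_one(tf, idf):
    out = np.empty_like(tf)
    for i in range(tf.shape[0]):   # one fused pass over the data
        out[i] = tf[i] * idf[i] - 1.0
    return out

result = tf_idf_minus_one(np.random.rand(100_000), np.random.rand(100_000))
```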
Just to summarize: think of the size of the chunks you feed to your CPU. If they're too small, you pay the overhead of Python and the overhead of dispatching each operation; if they're too big, you fall out of cache. The optimum lies in the middle — although we'd really like to be out on the right, because that's where Big Data is, that's where the magic and the money are (some people will want to take a picture of this slide). So that's this part. Now, what if we need flow control?
For instance, we don't want to divide by the IDF when it is zero. I told you we don't use flow control, so instead we write an expression: the expression basically says "where the IDF is zero" — which returns an array of booleans — and there I set the TF to zero. That way we don't need flow control. Or suppose we're looking at ages in a population and I want to compare the mean age of males and females: I can select the age array with a gender mask and say, where gender equals male, compute the mean, and subtract the mean where gender equals female.
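The two examples above, written as array expressions rather than flow control (with made-up data):

```python
import numpy as np

# "Where idf is zero, set tf to zero" -- no if/else needed.
tf = np.random.rand(6)
idf = np.array([0.3, 0.0, 1.2, 0.0, 0.8, 0.5])
tf[idf == 0] = 0

# Difference between the mean age of males and females via a boolean mask.
age = np.array([23, 31, 45, 29, 52, 38])
is_male = np.array([True, False, True, True, False, False])
diff = age[is_male].mean() - age[~is_male].mean()
```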
Now this is really starting to look like a database query: we're really starting to do selections. So what did the scientific Python world come up with for this? There's a library called pandas that is really something in between arrays and a database. It has taken off hugely in the community because it is fantastic for these queries and this kind of data massaging; for purely numerical algorithms it's maybe less well adapted, because there we fall back to NumPy anyhow.
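The same kind of query in pandas, on illustrative data:

```python
import pandas as pd

people = pd.DataFrame({"age": [23, 31, 45, 29, 52, 38],
                       "gender": ["m", "f", "m", "m", "f", "f"]})
print(people.groupby("gender")["age"].mean())   # mean age per gender
```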
OK, so what does this tell me? You're not really writing plain Python: you're writing beautiful Python code that sits on top of lots of Fortran, C and C++ routines, and that gives you speed, but it gives you installation problems. Then I realized that most web development is also some beautiful Python code sitting on top of services — like a database — that could be written in C++, Java, Erlang, God knows what, Node.js, and that gives you deployment problems. We have compilation problems, you have deployment problems: we're not that different, right? We're struggling with similar things, instantiated in different ways.
These days I like to think that NumPy is, for the scientist, the equivalent of the database — that's where I'm going with this. Numerics, as we've seen, are really efficient when we apply them to regular, in-memory data; but applying operation after operation creates cache misses for bigger arrays, so we need to fight to remove temporaries, and maybe to chunk the data. And if we do queries, they will only be really efficient if we can use indexes and trees — typically databases do that — and we're going to need that to do group-by operations. All of these are compilation problems, but compilation of something that looks like a query. So one option is to express computation in a query language — that's a bit what numexpr does — but I really hate domain-specific languages: each time I try to use SQL, because I'm not an expert at it, I get it wrong and I get annoyed. The other problem is that NumPy is extremely expressive — the things you can do with NumPy or related tools are extremely varied — so I don't think that's a good way to go. I like Python, and I want to keep writing Python.
One approach is to hack, and a really cool example is Pony ORM — if you're a web developer you probably know it better than me. What Pony ORM does is compile Python generators to optimized SQL queries: you write something that looks like a Python generator expression, and it does bytecode inspection — AST-based inspection, I believe — grabs the AST, builds a SQL query out of it, and optimizes it, compiling the groupings. That's really surprising, and really cool.
I'd also like to draw your attention to something that's happening a lot in the Big Data world: Spark. It's a rising star; it lives in the JVM world, basically on top of the Hadoop ecosystem, and it combines two things: a distributed store — some form of database-like storage — and a computing model. By putting them together it lets you do distributed computing in a reasonably efficient way. Now the thing is that NumPy is actually much faster when the data fits in RAM, and the reason is that we represent data as contiguous arrays, whereas the Java world goes through a lot of references. So if we want to scale up, maybe we have to do operations on chunks: split the data and then, in parallel or in series — it doesn't matter — compute things on arrays that fit in RAM, or even in cache.
Now this is great for certain computing patterns — what people call extract-transform-load, for instance — but if you're doing multivariate statistics, which is what machine learning is about, you're combining information from all over the data (by the way, it's the interaction between the terms "machine" and "learning" that makes the topic). The compute graphs that you get are horrible, and it means that out-of-core operations — which is basically what we're doing when we chunk data — are not efficient; there is no data locality. So one approach is algorithm development, which is what I do, so I'm happy: the idea is to use online algorithms. Basically you don't use the same algorithm; you use an algorithm that works on a stream, so you start changing the algorithms themselves. If you've heard of deep learning: the number one algorithm used in deep learning is stochastic gradient descent, and that's how it works — that's how people can apply deep learning, which is extremely computationally expensive, to huge datasets.
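A sketch of the "algorithm that works on a stream" idea, using scikit-learn's stochastic gradient descent classifier on synthetic chunks (not the speaker's code):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier()
classes = np.array([0, 1])

for _ in range(10):                      # pretend each iteration is one chunk of a stream
    X_chunk = np.random.rand(100, 20)
    y_chunk = np.random.randint(0, 2, size=100)
    clf.partial_fit(X_chunk, y_chunk, classes=classes)   # update without seeing all data
```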
So, back to data science. I've shown you how we can go from the term-document matrix to a factorization, and then there's magic: there is an algorithm — I did not discuss how it works — which is imported from a library. What the people behind such libraries do is take horrible papers full of matrix expressions and, drinking a lot of coffee, turn them into code; it is really hard, by the way. People have been asking me why we still use code that was written forty or twenty years ago, in Fortran: it's because writing stable numerical code is extremely hard, and no better code has been written so far. The reason that scikit-learn and NumPy and the rest have been able to do all this is the high-level syntax of Python and everything I've presented here. The reason all this matters is that it reduces our cognitive load and lets us focus on the algorithms.
All right, let's talk a bit about something other than the numerics: let's talk about the future, and about what's going to make Python great again. I think we've seen recently that data flow and computation graphs are crucial. You can have simple data-parallel problems, you can have messy compute graphs, you can have online algorithms — and data-flow engines are popping up everywhere. For instance, maybe you've heard of dask: dask is a pure-Python task-graph compiler. It represents a set of function calls on chunks as a graph of computations, and then uses a dynamic scheduler on this graph to do parallel and distributed computing.
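A tiny sketch of the task-graph idea with dask (assumed to be installed; the functions are placeholders): calls are recorded lazily, then a scheduler executes the graph.

```python
from dask import delayed

@delayed
def load(i):
    return list(range(i * 3, i * 3 + 3))

@delayed
def total(chunk):
    return sum(chunk)

graph = delayed(sum)([total(load(i)) for i in range(4)])  # nothing has run yet
print(graph.compute())                                    # the scheduler runs the graph
```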
It's really nice — except that it's basically static, which means I add things to my graph and then run it. Another tool that people use in deep learning is Theano; people probably don't realize that it has an expression analyzer inside, builds a graph of operations, and optimizes it. TensorFlow is a somewhat similar library by Google for deep learning; it also builds a graph of operations. So graphs of operations are appearing in many, many different libraries, and I believe that Python should really shine here: the key is reflection, it can be some form of metaprogramming, and the recent async developments matter too, because I think the future is parallel and distributed computing.
As Nathaniel Smith, who is a NumPy developer, said: Python is the best numerical language because it's not a numerical language. I believe this is extremely true. But we do have a bit of a problem here: the API is really challenging, because this is all about algorithm design, and we can't really do what you web people have done with something like Django, where there is basically an inversion of control and you're no longer writing imperative code as you normally would — you buy into a framework. I don't believe we can write really complex algorithms that way; it's just too much cognitive overload. But maybe that's just an API design problem that will eventually be solved.
In terms of ingredients for our future data flows, I think distributed computation and runtime analysis are the important things, and for this I think CPython is central — let me advocate a bit for CPython here, and I do mean CPython, not just Python. The number one thing for us is the ability to debug in a high-level way, which means I can debug things like numerical instability in my algorithm; that's really hard to do when something blows up somewhere in terms of numerical precision, and Python is fantastic for it. I can do interactive work, which is how most data scientists work. It already enables us — and will enable us more — to do code analysis, which is going to be really important for being efficient. And it gives us persistence, which is extremely important for parallel computing: when you're doing parallel and distributed computing you need to move data, you need to move objects around between different computers, and you need to move code — and for that you need serialization.
We've been relying on pickle: distributed computing has relied hugely on pickle. The idea is that it's used to distribute the code to be run between the different workers, but we can also use it to serialize intermediate results — that's one way of doing computation on data where all the intermediate results might not fit in RAM, and it can be done very easily with Python. Another thing that we do is to use pickle to get a deep hash — in the sense of a cryptographic hash — of any data structure. It's really nice because it lets you see whether things have changed or not, and so avoid recomputation.
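One way to get such a deep hash, by combining pickle with hashlib — a sketch only; joblib.hash provides something similar, with better handling of NumPy arrays:

```python
import hashlib
import pickle

def deep_hash(obj):
    """Cryptographic hash of any picklable data structure."""
    return hashlib.sha1(pickle.dumps(obj)).hexdigest()

print(deep_hash({"tf": [1, 2, 3], "idf": (0.1, 0.2)}))
```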
The problem is that pickle, as implemented in the standard library, is actually quite limited: there's no support for lambdas, for instance. These are not fundamental limitations, they're trade-offs. There are variants — cloudpickle and dill — and I must say that I would really like one of those two, or at least ideas from them, to go into the standard library, because not being able to pickle everything is hugely limiting for distributed computing. I realize we're never going to be able to pickle absolutely everything, and I also realize that I can personally write code that always pickles, because that's what I do — but when I hand this to a less advanced user, at some point he will write code that doesn't pickle. For me, by the way, this is more important than the GIL; that may be surprising, but when you get to know distributed computing well, these things — data exchange, basically — are the real problem.
What we have is a small library that we call joblib, which gives us ingredients for this kind of data-flow computing. One thing it provides is a very simple parallel-computing syntax, which is basically syntactic sugar for parallel for loops; under the hood it uses threading or multiprocessing, or just about any backend you can plug in — you can plug your own backend into it. It also provides fast persistence — basically a flavour of pickle that does clever things for NumPy arrays — and it gives primitives for out-of-core computation. The reason I'm pointing it out is that it has a very non-invasive syntax and paradigm. With a library like joblib we can write our algorithms, and it is actually used inside scikit-learn, even though you may not know it.
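The parallel-for-loop syntactic sugar in its simplest form (a standard joblib usage example, not specific to scikit-learn):

```python
from math import sqrt
from joblib import Parallel, delayed

results = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
print(results)   # [0.0, 1.0, 2.0, ...]
```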
It was originally designed to be fast on NumPy arrays, and it's getting more and more of an extendable backend system. I'm looking forward to a world where we can use things like Celery to distribute computation from scikit-learn — a merger with more web-development-style tooling. I don't know if it's a good or a bad idea, but I'd like to try.
So, I think CPython is great as it is, and one of the reasons it's great is that it's simple — which is exactly what a lot of people have been criticizing it for. For instance, the Java world tells us that they have software transactional memory and that it's really cool; it would be nice for Python, but personally I've never really needed it. Interestingly, Java has recently gained a malloc-like facility to allocate memory directly. We'd really like better garbage collection, we really would, but just about every C extension relies on reference counting, and the reason is that reference counting is actually very easy to manipulate when you're not sitting inside the VM. Basically, CPython is something that I can manipulate without being inside it, which means it's really great for connecting to compiled languages. Talking to people at this conference, many people actually use this: many people use libraries that have been developed in another language.
I'd like to draw your attention to Cython — who here knows Cython? Who uses Cython? Cython really gives us the best of C and Python: you can add types for speed, and it does this so well that when you add types to a NumPy array it basically becomes a float pointer, so the loops you write on it are super fast. You can also use it to bind external libraries, and it's surprisingly easy. The good thing is that suddenly you're working with C libraries, you're working with C-like code, without any of the dangling-pointer problems that are, for me, the number one problem of those languages. So I see Cython as an annotation layer between Python and the C world, and it's a really fantastic tool. By the way, I think everybody should be writing extensions using Cython, because it is an abstraction over the CPython library — the CPython API. For instance, you can write code that's very readable and that works with both Python 2 and Python 3, even though there have been a lot of changes in the CPython API. It's also good for the CPython developers, because they would like to change things in the CPython API, and if everybody writes Cython they will be able to, because Cython will do the impedance matching. OK, so:
scientists can work with web developers, and we really do like each other — I believe there's no irony here. I really enjoy the people who are not doing science in the Python community: first, they teach me things, and second, they make the tools that I like and use. So I'd like our tools to be useful for you as well.
I'd like to point out that scikit-learn is actually a really easy machine-learning library: it has a very simple syntax. Basically you import an object — the magic box that will do classification, recognition of things — you instantiate it, and then you give it data. The data is basically matrices; we only do matrices, so you have to figure out how to convert your own data into matrices. Then you call fit, and then you call predict.
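The pattern in code, with random matrices standing in for real data and an arbitrary choice of estimator:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(100, 5)          # one row per sample, one column per feature
y = np.random.randint(0, 2, 100)    # labels

clf = RandomForestClassifier()      # the "magic box"
clf.fit(X, y)                       # learn from the data matrix
print(clf.predict(X[:3]))           # predict labels for new rows
```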
To many people, one of the successes of scikit-learn is this encapsulation: people have really loved the fact that the classifier is some black box they can use without fully understanding it. That's another thing Python gives us: a really cool object model that lets us do object-oriented programming without crazy class diagrams.
Another thing we've used hugely is what people call documentation-driven development — there was a talk about this — to try to make the API as simple as possible. What I'm trying to get at is that we're trying to give you a higher-level, simple API to reduce your cognitive load, just like Python and NumPy reduce our cognitive load when we implement these algorithms. We all do different things here, and we can all benefit from each other, but we can only do that if we're really careful to reduce each other's cognitive load on the things the other does not understand — I think that's extremely important.
It's important to be didactic outside of one's own community, and Python is actually really good at this: the Django documentation is known for being really excellent, and Python worries about syntax being beautiful. To do this we need things like avoiding jargon — machine learning is really bad, it's full of jargon; in scikit-learn we try not to have too much of it — and avoiding requiring prior knowledge. For instance, students who are applied-maths students and who learn about matrices — I have to tell you, they don't care about Unicode, not even the French ones.
The first recommendation I have for people who do API design is: build your documentation on top of very simple examples — examples that run. One thing that we do is use sphinx-gallery, which basically uses Sphinx to build our documentation by running all the examples. That means the examples must run, they must run fast, and they must be small enough to run. I think this has helped a lot with the documentation, but also with the design itself. All right, to wrap up:
I think Python is great because of the interaction between people like scientists and people who are not scientists, whether they are web developers or DevOps people or anything else.
Have I been censored? What was I saying? Well, anyhow: the Python language is the perfect tool to hide many low-level concepts — whether it's arrays, or things like trees and strings — so that you can manipulate them with high-level code. Personally I think, and it's a personal opinion, that this has been achieved in the recent history of Python by not changing hugely. When you look at how people are using it, at some point they are always pointing at something low-level. Very often dynamism and reflexivity are crucial, because they enable metaprogramming and debugging; but we also find that we need compilation for speed. So there is this tension between dynamism and compilation, and I have the feeling it exists everywhere — it's also in web development, where you talk about compiling SQL queries. I'm extremely excited about the paths that Victor Stinner is pushing forward, like the guards on internal structures that allow checking at runtime for modifications — so that any hack we do on the code can be invalidated if the environment changes — or the work on function specialization. Finally, I think that Python has gained, and will gain, hugely from the data science world and the things developed there, as it has gained from DevOps and the web; and I think data science can also give back other things, like knowledge engineering and AI, which are growing hugely. And just in case you haven't noticed, data science is disrupting just about every job, including the ones you're doing — so it's cool that there is data science in Python, right? That's all I have, thank you very much.
[Moderator] Thank you, Gaël, for these insights into a slightly different world. Questions? Raise your hands and I'll bring the mic around.

[Audience] One specific question: how well has the scientific world adapted to Python 3?

[Gaël] Just a few years ago most of the scientific stack wasn't on Python 3; these days you can use pretty much any of the big scientific Python packages on Python 3. The biggest cost of Python 3 for us was the change of the CPython API, so people in niche applications still have code that doesn't run on Python 3 because of that, but all the main libraries have long since been ported, and everything I do runs on Python 3 and 2.
[Audience] A question about PyPy — what are your thoughts on it?

[Gaël] I know a bit about it. To give a little background, my brother studied language theory, so we have crazy discussions all the time. Part of what I wanted to convey in this talk is that it's not only about the type checking, it's about the memory model. And by the way, I think PyPy has progressed hugely in this sense: it is no longer trying to say "I'm going to control the memory for everything", which historically was a big problem for us. I could not believe that PyPy would be useful for scientific computing, because for a long time the angle of PyPy was things like software transactional memory — which is really cool, by the way, but it would cost us a lot in our world. The other thing is that we're not going to get rid of the compiled code, because there is so much history in making those algorithms really good, and it's extremely hard. But I do believe that what the PyPy world is doing — a lot of analysis of the code — is extremely, extremely useful. Thank you.
[Audience] You keep referring to "your world" and "our world" — is the division really that clear-cut?

[Gaël] For me, no. I've got personal friends in all the communities and I use all kinds of different tools. But I'm afraid there is a division, and I'd like to think that it's fueled by the different trade-offs. I don't want it, and I don't think it's useful. But when you hear about things like conda — which is sort of a package manager for Python and other things — the reason it was created was basically pip. The way I see it, it was created because the scientific crowd was unable to explain the struggles they were having with the standard packaging tools in Python, and just went off and did their own stuff. The good thing is that some people then sat down and worked on it, and now I believe pip should work fine for us. But that's one example of the division; I think it exists, and I think we need to fight it, because I really believe in our values: the fact that we're diverse and that we're able to work together. Great question.
[Audience] Where do you see the scientific Python world in five to seven years? Will people have moved on to other languages, or to new tools?

[Gaël] In the scientific world? We are an extremely vibrant community. To give you some background: when we started scikit-learn, seven years ago, everybody would walk up to us and say we were crazy — everybody does machine learning in Matlab. Seven years down the line, nobody mentions that any more. What about R? R as a language is, to me, a horrible language, but in terms of libraries — and I told you, numerical algorithms are really hard — R has a crazy amount of them, and for statistics R is the reference. But the value of data analysis is not only the numerics, it's in combining things, and I think we have an edge there. As for Matlab, I think we're eating at it slowly, and they're fighting back — I'm getting e-mails on a monthly basis inviting me to training sessions to see how well it works — but the fact that they're pouring money into fighting this is telling me something. Maybe it will take a bit more time in the scientific world. A strong contender would be Julia: it's a typed language that is able to do type inference and compile extremely fast code. And I really don't like it — I mean, it's a fantastic language, maybe the best language — I don't like it because it's a numerical language: the whole community is a numerical community, and I think that is going to be its own limit.
[Audience] Thanks for the fantastic talk. scikit-learn is only one of the libraries in the scikit family — there is also scikit-image and others. What is your relationship with the scikit family?

[Gaël] It's very historical. Back around 2008 there used to be, in the SciPy world, these "scikits" namespace packages — namespace packages were one of my nightmares — and that's how it all started, as a way to grow things alongside SciPy, because SciPy was getting too big. Then we got rid of the namespace package: it used to be called scikits.learn and it became scikit-learn, where "scikit" means scientific toolkit. So it's very historical; as for the relationship, we are friends with friends.

[Moderator] OK, one last question.
[Audience] One comment and one short question. One specific thing about pandas is that it goes beyond NumPy: it's where people come when they struggle with NumPy-specific limitations. If you want something database-like, heterogeneous columns and so on, pandas can actually do that; it sits on top of NumPy, it's more like glue. So I'm not really sure why we should want something in the standard library that does that.

[Gaël] I completely agree. The comment is that pandas is more than NumPy — and it is, by the way — but historically it has not been marketed like this; I have heard people advised to skip NumPy and go straight to pandas. The other thing is that I haven't seen much work flow from pandas back to NumPy — I'm not even talking about contributing code back, I'm talking about explaining what was learned, which is extremely important. I'm not going to call for a "from pandas import *" statement, but I would like us to push things forward: to either improve NumPy or to push improvements upstream. What I mean is that we need one place where we can tell everybody "go and get your stuff", we need that place to be good, and we need to work together. In a sense pandas has created, maybe, an island, showing that you can do things better; but you need to go all the way back and get the improvements into the wider Python ecosystem, because then everything benefits.
[Moderator] We have one more thing to announce, so please don't run away — after you've given him a fantastic, enthusiastic round of applause. Thank you.

Metadata

Formal metadata

Title: Scientist meets web dev: how Python became the language of data
Series title: EuroPython 2016
Part: 168
Number of parts: 169
Author: Varoquaux, Gaël
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use, modify and reproduce the work or its content for any legal and non-commercial purpose, and distribute and make it publicly available in unchanged or changed form, provided that you credit the author/rights holder in the manner specified by them and pass on the work or this content, including in modified form, only under the terms of this license.
DOI: 10.5446/21144
Publisher: EuroPython
Publication year: 2016
Language: English

Content metadata

Subject area: Computer science
Abstract Gaël Varoquaux - Scientist meets web dev: how Python became the language of data Data science is a hot topic and Python has emerged as an ideal language for it. Its strength for data analysis come from the cultural mix between the scientific Python community, and more conventional software usage, such as web development or system administration. I'll show how and why Python is a easy and powerful tool for data science. ----- Python started as a scripting language, but now it is the new trend everywhere and in particular for data science, the latest rage of computing. It didn't get there by chance: tools and concepts built by nerdy scientists and geek sysadmins provide foundations for what is said to be the sexiest job: data scientist. In my talk I'll give a personal perspective, historical and technical, on the progress of the scientific Python ecosystem, from numerical physics to data mining. What made Python suitable for science; How could scipy grow to challenge commercial giants such as Matlab; Why the cultural gap between scientific Python and the broader Python community turned out to be a gold mine; How scikit-learn was born, what technical decisions enabled it to grow; And last but not least, how we are addressing a wider and wider public, lowering the bar and empowering people. The talk will discuss low-level technical aspects, such as how the Python world makes it easy to move large chunks of number across code. It will touch upon current exciting developments in scikit-learn and joblib. But it will also talk about softer topics, such as project dynamics or documentation, as software's success is determined by people.
