
Using Django, Docker, and Scikit-learn to Bootstrap Your Machine Learning Project

OK, so thank you so much for joining, especially after lunch. I hope you're all really excited. I tried not to make my title too long, but just to recap, this talk is titled "Using Django, Docker, and Scikit-learn to Bootstrap Your Machine Learning Project." Something I do want to point out is that all these slides are available at the link in the bottom left-hand corner, which I will also be sharing on Twitter. I will be sharing the code as well. I will not be doing any live coding today, but I do have a repository, and I would love for people to use things, break things, and tell me what I need to add. That would be really fantastic.

I want to start off with a story about something that happened to me recently. Many of us use some kind of communication tool like Slack at work, which is great. So, on August 1st, I got this fantastic ping from a co-worker of mine, who had questions: "Hey, I've got questions. The interface on this model is... what happened? What's going on?" And I was like, "Whoa, hold up, there are a few models, and there's been a lot of change. Which model version are you talking about? I don't understand." I think that speaks to the problem space we're currently in, which is the tooling around data science. This just happened to me on August 1st, so the struggle is real. I'm still in this problem space, thinking it through quite a lot, and I'd love to hear what you all do.

So, August 1st, Matthew pings me. OK, well, maybe I can go and piece together what this model is. I jump on GitHub and look at where the data scientists are tracking their work. I go into one of the data scientists' project repositories, and there are like ten Jupyter notebooks in there, with a lot of code and a lot of things I don't understand, and I was like, OK, I have no idea which of these notebooks produced the model that is running on the service my co-worker is using. So, hold on, let me go look somewhere else. We store our models in S3, so I go look to see if I can piece the story together from the pickled version of the model. Looking in S3, you'll notice June 12, July 28, and, OK, there's something called "latest." You'll also notice the size of the model is exactly the same, and there have been five or six weeks between the two models being dumped, and I'm not exactly sure which one it is.

So that is pretty much my struggle every day. I'm somewhere in this place where I'm not exactly sure which model is the one: you think you know, you guess this is the correct version, yes, this is the interface. Where are we with data science, and what are our best practices? I'm going to talk about a few of those things today.

But before I get started, I did want to do a quick introduction of who I am. Something about me is that I've actually come out of the world of being a political scientist. Two years ago I was like, "I can do SQL and pipelines and I like this kind of thing," so I did an immersion program at a dev bootcamp in Chicago, and I've been with the Sprout Social engineering team ever since. It's been really exciting. In my time there I've been on the platform team, working on two different parts of the platform, and for the last year and some change I've been on a data science team. I will talk a little bit about the anatomy of a data science team, but what I do want to point out is that our data science team is newer, and we are growing quickly, which is kind of where this talk comes from. Some other things I do: I help run PyLadies Chicago, and I serve on the Python Software Foundation's Board of Directors. I think there are a few of us here at DjangoCon, so if you have any questions or want to know more about the PSF, this is my call to you: come speak with us. We're super happy to get to know who you are.

In regards to today's talk, we're going to go through five items. We'll get some common language around what machine learning is. We're going to talk about the anatomy of a data science team; data science teams draw people from a broad variety of disciplines, so I want to make sure we talk a little bit about how my team functions. Then we will talk, at a high level, about the engineering of a machine learning problem. We will then move into thinking about machine learning engineering using some of the latest and greatest buzzwords, like Docker, Django, scikit-learn, and some other goodies. And then we'll leave with some open questions: what could be next for the project I'm working on, what are some ideas for those of you who work on data science teams, and what could we use in the ecosystem for data science infrastructure and tooling.
Great. So: what is machine learning?
I always get a little bit of a kick out of it when people think about what machine learning is, like, "Hi, I'm Johnny 5." If you don't know, Johnny 5 is from a really ridiculous movie from when I was little, where you basically see a robot with human-like intelligence that can do all the things. I think that's what some people picture when they think of machine learning, especially if they don't work in an organization that has an established practice already. For lack of a better word, I always find it useful to ground myself in the common jargon and think through: when we talk about machine learning, what is it we're getting into? So, rather than read this full slide of text, I just want to highlight the big things here. Machine learning is a field of computer science: pattern recognition, computational learning, artificial intelligence, blah blah blah. I'm sure we've heard a lot of that, so I don't care so much about it. What I really want to focus on is this: machine learning explores the construction and study of algorithms that can learn from and make predictions on data. The big things here: algorithms, we all know what those are; data, we work with it all the time; and making predictions on data. Cool, we now have some common ground, some language, to think about machine learning.
But let's think about it in maybe a little more of a traditional way, which some of us who have computer science degrees may have been exposed to. This definition, which I find very useful, comes from Tom Mitchell, a computer science professor at Carnegie Mellon, who frames machine learning with three pieces of emphasis: "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E." So when we're talking algorithms: our algorithms are doing some task, they're using data, and from that data we're trying to derive some insights so that we can go ahead and make predictions about new things that may be happening. Depending on the features you're building, there's always some task, some kind of thing we want to accomplish. And how do we know if we're doing things correctly? Well, we have some idea of a performance metric.
OK, so just because I like to tell everything with stories, I think it's useful to have a little toy problem in mind when we think about what machine learning might look like day to day, and what a simple project might be. Sprout Social, where I work, is largely a business-to-business social media management and analytics tool, so we work a lot with social media data, and text data is a really big thing for me. Text data is something we might all have heard about: we all have things like the spam filter on our inboxes marking whether something is spam or not. Well, let's take that kind of example a bit further and think about a different one. The idea here, something else we could frame as a machine learning problem, is predicting altruism with a naive Bayes classifier.

I'm guessing some of us, I hope many of us, like pizza, especially with the word "free" associated with it; that makes me really excited. So what would you all do to get free pizza? Don't scream it out loud, just brainstorm up here with me. Let's see: would you maybe run a marathon? Perhaps, because pizza tastes so much better after you run a marathon. Would you, I don't know, do interpretive dance in front of DjangoCon for 45 minutes? I don't know what your requirements are for getting free pizza.

But this is actually a really fun little toy example that someone actually put into the world, and they released data for it, so we can start working on measuring and predicting altruism with this idea that if someone gives you pizza, that's an altruistic act. Where does this example come from? On Reddit there's a subreddit called Random Acts of Pizza, and essentially what this subreddit does is invite people to come on and make a request. As you'll notice here, there is some structure to how a request happens, which is pretty nice, because in an era of unstructured data, having some structure to your data is great, especially when using machine learning. In line item 2 we have: "Write your request post and submit it; remember to start the title with [REQUEST]." So essentially this is a subreddit where people can go and make requests for free pizza, and then the community will upvote or downvote, and, based on the rules of how this works, eventually, when you have enough votes, someone might buy a pizza for you. So if you're hungry tonight, you can do this tonight.

Some examples of free-pizza-request goodness: I really got a kick out of the first line, "I actually have money for the pizza, but all I have is a $50 bill, and the delivery boys don't accept anything larger than a $20 bill." That was actually the text of someone's request. Another request someone made was, "I've got the Torah in one hand, and in the other a pizza pie." I guess they were just trying to get people excited, like, "I'm ready, almost there." As you can see, these requests can be all over the place; they can be kind of silly, and some of them are actually a little heartbreaking to read.

For this toy example, the data science competition website Kaggle had actually already cleaned this dataset and made it easily downloadable. It's probably very hard to see, but on the right-hand side is a snippet of some JSON blobs that represent the training data for this problem. You've got a lot of features in there, including the request text, which is where the examples I gave you come from; you've also got the number of votes, user information about who requested it, when it was requested, things like that. That's an example of your training data. Essentially, what this problem says is: given these 5,671 requests, of which 994 are labeled as true and the rest are labeled as false, write a machine learning model that can actually predict, when a new request comes in, whether that requester is going to be successful or not. As a machine learning problem, this is an example of a classifier. Maybe you want to hack your way to all the free food in the world, and this is your way of bringing machine learning to your application: you have some text data, you've got some historically labeled examples of things that were successful or not, and you can use that when constructing a model to help you make future predictions.

To reframe this in the language we saw earlier from Tom Mitchell: we have a task, which is classifying a piece of data; essentially the question is, "is a pizza request successful?", or, said differently, "is this act of altruism enough?" Our experience here is going to be the labeled training data, essentially like a CSV where you have the data and a boolean representing true or false for whether the request was successful. And the performance measurement is, "is the label correct?" That's pretty much our performance measurement, because we're working with labeled data: you hold out some of the training data, and since you have the labels already, you can go and check, "hey, did it actually successfully predict this thing or not?" So in an example like this, maybe you have some kind of classifier problem: you have your task, you have your experience, and you've got some notion of a performance measurement. What next?
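Mitchell's framing can be made concrete with a tiny sketch (the labels and predictions below are invented for illustration, not from the talk's slides): the task T is classifying requests, the experience E is the labeled examples, and the performance P is just the fraction of held-out labels we predict correctly.

```python
# P in Mitchell's framing: the fraction of labeled examples predicted
# correctly. All values here are made up for illustration.
def performance(predictions, labels):
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# E: did each held-out pizza request actually succeed?
labels      = [True, False, True, True]
# Our classifier's guesses for those same requests (task T).
predictions = [True, False, False, True]

print(performance(predictions, labels))  # → 0.75
```

If the model improves with more experience, this number should go up; that is the whole definition in one function.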
OK, so we've talked a little bit about what machine learning may look like for you as an engineer, and what kinds of decisions you have to think about with the data you may be working with. It could be text data; it could be other things about the pizza request. Maybe you're looking at whether something is incorrectly capitalized in the request; maybe there are other characteristics of the data you want to look at beyond just the words that are in it.

Within a data science team, something to think about is that you've got quite a few folks working on these problems; it's not just developers. I'm borrowing this from IBM's UX personas as applied to engineering: they've framed the differences between the app developer, the data scientist, and the data engineer. The data scientist is the one doing a lot of the feature engineering; they're the ones trained in the different statistical methods and algorithms. Your data engineers are probably building the pipeline and the infrastructure around it, making the thing happen. And the app developers are perhaps the ones actually taking the model, putting it into the application, and making it so a user can actually use the feature.

So, in that world, let's talk a little bit about my data science team. You may have seen this Venn diagram of data science before, with the idea that one person has all the skill sets: math and statistics, subject-area expertise, and computer science, and the intersection is the unicorn, which is allegedly the data scientist. I can tell you that this is a fallacy. In the context of my team, we've got a team that's been growing pretty quickly, like I said, for about a year and some change. We've got four data scientists: one who comes from a natural language processing PhD and has spent a long time in computer science working with NLP; folks who come from predictive analytics and economics; and a person who came out of a postdoc in chemistry. And then there is me, the lone software engineer on the team. I came from the platform engineering side of things, and I had done a little data analysis and analytics in my time in the professional world. We do also have some designated infrastructure support, but largely that is still shared across the company.

So why is that a problem? Great, we've got these people with the title of data scientist, and we have some infrastructure; why do I care? Well, I think this slide can give you an idea. It's from the keynote Jake VanderPlas gave this year in Portland, where he talks a lot about the variety of tools in the scientific Python stack. You've got scikit-learn, you've got a lot of different libraries you can use, you have tools like pandas, a lot of visualization and plotting tools, and, for the actual work of developing a model and executing it, you've got things like Jupyter notebooks. This is a big world that people can pick up and select tools from.

So that's partly why there are so many places people can fit in to do modeling work. The thing I like to think about, and why I get excited by data science in Python, is that it really is a place where we can mesh a lot of things together. In the scientific community there's a lot of work, for example in the astronomy field, using Python to do analytics and visualization. Again reframing those keynote points: Python acts as a glue; it plays well with other languages; we've got the batteries-included idea here; we don't have to go and unravel proprietary code, we can work with data right from the get-go; and it's simple and dynamic, which makes it well suited to scientists.

So that's all fine and dandy, but again, I'm the lone software engineer on my team, alongside people who come from a variety of different academic disciplines, and the tools they use can vary strongly depending on the types of machine learning problems we're working on. To think about what the model is for my team, or maybe for me as a person supporting a data science team, I'll adapt Professor Ralph Johnson's quote: instead of saying "before software can be reusable it must be usable," I'll say "before machine learning can be usable it must be reusable." We have a lot of different expertise, a lot of open source tools, and a lot of different kinds of problems people can work on, and I think it can all be a little overwhelming; that's the thing I want to highlight.

So, back to this idea of machine learning: one thing we have to think a bit about is what the problems are, and what it looks like to answer them. Back to the example of predicting whether someone gets a free pizza or not: we might have text data there, but depending on the machine learning problem you're working on, you might have a variety of other data sources you're integrating. This component of shaping the data, selecting the data, and getting it into the format that you need can be quite expensive; we call this feature engineering. The representation on the left-hand side is a broadly simplistic idea of what a pipeline for a supervised learning approach may look like. You can see we have data in our production databases, and we have logs; maybe we have some metrics capturing how people are using the application; maybe we've also got some proprietary data that we paid for. There can be quite a series of things that we're taking, integrating, and pulling together into a new format. This work of getting the data, merging it together, and then the feature extraction and preprocessing is going to be a lot of the work of our data science teams. Then, once we have the data in the right format and we've selected the features, we can go ahead and apply a learning algorithm. From there we're able to get a model, and using that model we can make predictions; in the example of our classifier, we can make predictions on future pieces of data and say, "hey, that thing is going to be successful," or, "no, it isn't." But what I'm describing here ties a lot of things together: there are going to be different areas of expertise, and depending on who is doing what work, we need to think a little bit more critically about that.

And so this gets into one really big question. The way application developers use the data we're collecting, and our proprietary data, versus how the data scientists themselves read the data, can be quite different, and we've actually got an entirely different kind of infrastructure that we need to start developing and building. With that, the question ultimately becomes: where is the handoff between data science and production? What does that look like? What's the social contract? Because the way I'm collecting and shipping data, and the kinds of data pipelines we're laying down, are going to matter to the application developers responsible for user-facing features. So let's break this down a little bit.
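The "get, merge, and shape the data" work described above can be sketched in a few lines of plain Python. Everything here is a hypothetical stand-in: two data sources keyed by request id (production rows and usage logs) merged into one feature dict per request.

```python
# Hypothetical stand-ins for two data sources keyed by request id.
production = {101: {"text": "request pizza please", "votes": 12}}
logs       = {101: {"requester_age_days": 240}}

def build_features(request_id):
    """Merge the sources, then extract the features we selected."""
    row = {**production[request_id], **logs[request_id]}
    return {
        "n_words": len(row["text"].split()),   # a simple text feature
        "votes": row["votes"],
        "requester_age_days": row["requester_age_days"],
    }

print(build_features(101))
# → {'n_words': 3, 'votes': 12, 'requester_age_days': 240}
```

In a real pipeline this step is usually the expensive one; the point of the sketch is only that "feature engineering" has a concrete shape: merge, select, transform.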
OK. So in the kind of flow I'm going to be talking about, we have, first, the step we already discussed: get and shape the data. Second, train the model on the data. Third, pickle the model, with something like joblib in Python. And fourth, use it to make predictions.

In this example, using Python's ecosystem, you might have some kind of script, quite simplified, like what we see on the left-hand side. In the Python scientific stack we have scikit-learn. Why scikit-learn and why not TensorFlow or the other learners? Scikit-learn is a good place to start. So you might have something like this, where we're saying: OK, we've got the pizza example, we have data we've been collecting, and I want to be able to fit a model and make predictions on it. With scikit-learn we actually have all those algorithms built in; naive Bayes is one, and a variant of it is the multinomial naive Bayes. The flow in the Python code looks something like this: we split the data into the training set and the test set, and we go ahead and fit that multinomial model, or whatever model you're using, on the training data. Essentially, what fitting on the training data does is start to represent whatever features you selected. If you're doing something like me, working with words, perhaps you're turning the text into numerical vectors: "here are the words that appear most often in requests that were successful in winning a pizza, here's the histogram count, here are the ones that weren't successful, and here's a vectorized format of those word counts." That's one example of a feature; there might be other features, and you can represent them in a numeric format, but essentially, when we're fitting against historical data, there's a process like that happening underneath. Once it's fitted on that historical data, you can say, "hey, I've now got a model that's ready to be used," and then we can dump the model in a pickled format.

So, again, that shaping of the data: what kind of prep work might I be doing? I'm not doing a talk that goes very deep into scikit-learn pipelines; if you are curious about that, there is a fantastic talk I recommend from PyData Chicago in August 2016, about 40 minutes, that goes through the different transformations you can use and things like that; I have it linked in my slides. That being said, once you have your data, you're going to have to do some kind of munging to get it the way you want, and you can use transformers from scikit-learn to do that. To think back to what might be an example of a transformation: perhaps you have a bag of words with a lot of what we call stop words, words that don't provide a lot of context. Maybe the frequency of these words throws off, or introduces bias into, your model, so that it says, "hey, I overestimated; this is likely going to be a successful request." So one transformer might turn your bag of words into one with the stop words removed. You might also do something like stemming, which treats words like "shop" and "shopping" as the same word. You can use transformers like that, you can write your own custom ones, and you can chain them with scikit-learn's pipeline operators.

So, back in this flow: we've got training examples, we've got the transforming of the data into the format it needs, and then we go ahead and apply that learning algorithm. What's pretty nifty about scikit-learn is that you've got a variety of things in there, so if you're kind of new to machine learning, there is a lot of ease of use to get you working pretty quickly. I would encourage you to go and explore a lot of it, and you also have some metrics built into scikit-learn, which we'll see a little of. But for the context of the machine learning I'm talking about, that I work with, this is the flow: supervised learning, with some previous data that's formatted in some way; we use scikit-learn, apply transformations, get the format we need, then fit the model. We take that model from scikit-learn and dump it using joblib in a pickled format, and that thing is ready to go.
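The supervised flow just described, vectorize the text, fit a multinomial naive Bayes, and dump it with joblib, might look roughly like this minimal sketch. The six toy requests and their labels are invented stand-ins for the Kaggle data, not the talk's actual code.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
import joblib

# Invented stand-ins for the Random Acts of Pizza training data.
requests = [
    "request please help a broke student get pizza tonight",
    "request lost my job this week any pizza would help",
    "request I will pay it forward pizza for my kids",
    "free karma give me stuff now",
    "give me pizza because I said so",
    "demanding free food right now no story",
]
got_pizza = [1, 1, 1, 0, 0, 0]  # labels: did the request succeed?

# Bag-of-words features with stop-word removal (one of the transformers
# mentioned above), feeding a multinomial naive Bayes classifier.
model = Pipeline([
    ("vectorize", CountVectorizer(stop_words="english")),
    ("classify", MultinomialNB()),
])
model.fit(requests, got_pizza)

# Dump the fitted model so it can be shipped and versioned.
joblib.dump(model, "pizza_model.pkl")

# Predict on a new, unseen request.
print(model.predict(["request please help with pizza for my kids"])[0])  # → 1
```

In a real project the split into training and test sets (e.g. with `train_test_split`) and a proper performance metric would come before any dumping; this sketch only shows the shape of the fit, transform, pickle flow.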
Cool. OK, so I was talking a little bit earlier about reusability. We've got the flow, and we've talked a little about what the code may look like if you're using Python to do machine learning, explicitly following the supervised naive Bayes example. So: does reproducibility matter to the engineer?

I really think social media data is just so fantastic for this. Say you have to develop a machine learning model that says, "hey, the sentiment of this tweet is positive," like a warm, good feeling. How would you slice and dice that, separating negative from positive? Here's the thing: emoji are part of text data. Take the emoji I think is called "face with tears of joy": people use it so differently. You have some folks who read it as someone sneezing or crying, and you have people who use it as a very positive, exciting thing. The example here from a tweet says "I'm laughing so hard," with tears of joy, tears of joy, tears of joy, lots more tears of joy, hashtag "mentally dating Justin Bieber," and the shirt says "single, taken, sleeping: Justin Bieber." So if I ask you, is this a positive or a negative tweet, I'm very curious what you would say. I would say not so positive. I'm not a huge Justin Bieber fan, so I would read those tears of joy as all about destruction: this is not a positive tweet to me.

Another really fun one I've come across is the purple heart emoji. You know, I'm super sad that Prince is no longer with us, but purple is the color of Prince, and if you're looking at tweets with the purple heart around the anniversary of Prince's passing, which I believe was in April, you start seeing a lot of purple hearts. Let's look at this tweet: Pantone has announced a new purple hue named in honor of Prince's famous love symbol. To me that's like, "oh, that's so cool, remembering Prince," but other people might still be utterly destroyed, watching Purple Rain and devastated by it all. So we have all these examples of inconsistency. Reproducibility does matter: if I have a machine learning model that's predicting the sentiment of something, ideally we would want it to be consistent, right? I would hope so. But there's a lot of context there, and this is where things get a little tricky. I'm not even going to start on hashtag "LOL sob," because that is just a silly one.

So the idea here is about data, and about data governance: who owns the data, who's doing what, how are we going to get consistent results? In the pipeline way of thinking, I want to be able to treat the different systems as black boxes: I've got the black box that shapes data, the black box that fits a model, and then a black box that I can call to make a prediction. As an engineer, what I care about is developing tools that will help me get this reproducibility, so that's where I'm going to start.
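Those three black boxes can be written down as one explicit contract: if each stage is a function with a fixed interface, implementations can be swapped without losing reproducibility. Everything below (the names, the majority-class stub model) is an illustrative assumption, not code from the talk.

```python
def shape(raw_requests):
    """Black box 1: get and shape the data (here: lowercase + tokenize)."""
    return [text.lower().split() for text in raw_requests]

def train(examples, labels):
    """Black box 2: fit a model. Stub: always predict the majority class."""
    majority = max(set(labels), key=labels.count)
    return lambda tokens: majority

def predict(model, raw_requests):
    """Black box 3: shape new data the same way, then call the model."""
    return [model(tokens) for tokens in shape(raw_requests)]

raw = ["REQUEST pizza please", "request pizza for my kids", "give me stuff"]
model = train(shape(raw), [True, True, False])
print(predict(model, ["request pizza tonight"]))  # → [True]
```

The key property is that `predict` reuses the same `shape` as training, so "which transformations ran, in which order" is answered by the code itself rather than by someone's memory.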
And that start is Docker. Docker is fantastic, and I also just really like Docker's imagery; it's very exciting. So: containers. There is a very awesome talk later today on Kubernetes that goes through a lot of this, which is just fantastic, but let's look into containers and talk a little bit about them.
so the doctor and get some probably most of you have used but it so it's good to get a refresher if you have nots on so you can think of doctor is basically a big executable tarball that has an explicit format so for me is an engineer and might be like so the code that that action does the thing of any kind of libraries I need to help me with the code to be the thing immediately system tools I need cool well here's a fun gotcha in data science that might also include it this
data because we don't want this weird inconsistency around the Purple Heart Purple Hearts the a little solve whatever is the latest and greatest thing that someone's introduced into the
world so that data is also probably going to be a big thing if you talk about supervised learning problems so put back a can also you know when we think of Dr. maybe that terrible can also learn how to get that training data down so that we introduce new versions of models can retrain on the same thing because this consistency is good on and the that again you know why use doctor I just have to steal from from healthy Hightower who eloquently said in the 1st row pipeline is that you don't use the system installed version of Python I mean I am sure many of us have struggled new having on gender girls quite a few times telling everyone do not use the the Python that's installed on your machine will alone will solve all they have took so obviously as we know and as we know setting products of can be difficult but then introduced all the complications of machine learning like the data that we're using the correct order of transformations that we have to do the other thing that sir to become more complicated we talk about machine learning problems so for the doctor and there's an interface what's really cool is allows us to make container with basically takes a snapshot of how the code works at a given moment and then we can just put it somewhere and then we could check it out later for while so we do this have the doctor filed and essentially what the doctor filed does is you have steps in your code and what's really nifty is that you have the same procedure your in your doctor file I time after time you won't need to re download everything is essentially caching layers so only you make changes in your data file will have to go and re download things which is great were now making a little easier was set up so when we write the darker file work and then the 2nd step is to build the doctor image which would look something like builds a doctor built the top this than in the model predicting altruism given a type of latest and then if I wanted to run that Dr. 
container, it might look like this from the command line: a docker run, detached, associating port 8888 to 8888, and mounting the data directory volume.
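Concretely, the build and run steps just described might look something like this from the command line (the image name, port, and local data path here are illustrative stand-ins, not taken from the slides):

```shell
# Build the image from the Dockerfile in the current directory,
# tagging it with the model name and "latest"
docker build -t predicting-altruism:latest .

# Run it detached, map port 8888 to 8888, and mount the local
# data directory into the container as a volume
docker run -d -p 8888:8888 -v "$(pwd)/data:/data" predicting-altruism:latest
```

These commands need a running Docker daemon, so treat them as a template to adapt rather than something to paste verbatim.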
So as an example of a Dockerfile, here's what this is doing, top to bottom. This is an example of what I might give a data scientist for them to go ahead and start working on their stuff, so that I can say: OK, now save things so that I can check things out when you give me the thumbs up that you've got a model you want to hand off. Essentially, from the top to the bottom, what this is doing is: use the Python 3 image, go ahead and add some users, then mount the data directory volume onto my Docker image, then install the requirements, and then I have this entry point which basically says start up a Jupyter notebook. That's what I provide; pretty straightforward.
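A Dockerfile along those lines might look like this (a sketch only; the user name, directory layout, and Jupyter flags are my own assumptions, not copied from the slide):

```dockerfile
# Start from the official Python 3 base image
FROM python:3.6

# Add a non-root user for the data scientist to work as
RUN useradd --create-home scientist
WORKDIR /home/scientist

# Declare the mount point for the training data volume
VOLUME /home/scientist/data

# Install pinned dependencies; this layer is cached until requirements.txt changes
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy in the exploratory notebooks and start Jupyter on the exposed port
COPY notebooks/ notebooks/
EXPOSE 8888
ENTRYPOINT ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser"]
```

Because the dependency-install step sits above the notebook copy, editing a notebook and rebuilding reuses the cached layers, which is the caching behavior described above.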
So with Dockerfiles we can easily start controlling when changes happen, because you can build a snapshot of what the code, the data, and all of that looked like at a point in time. I'm a big fan of using the mountable data directories; essentially what that allows us to do is take data in a directory on your local system and mount it into the image. Like I said, we want reproducible results, so if I have one data scientist who found a really cool dataset, and maybe they want to work on that but they want to use the same model, well, they can go ahead and build a new image, maybe bringing it up with "this is the name of the training data that I had." So in thinking about
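One way that "same image, different training data" idea could look (the environment variable name and file paths are my own illustration, not from the talk):

```shell
# Re-run the same tagged image, but mount a different local dataset
# directory over the data volume, and name the training file to use
docker run -d -p 8888:8888 \
  -v "$(pwd)/new-dataset:/data" \
  -e TRAINING_DATA=/data/training-2017.csv \
  predicting-altruism:latest
```

The image (and therefore the code snapshot) is unchanged; only the mounted data differs, which keeps the run reproducible and comparable.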
how I can get my data scientists to work: essentially what I would ask them to do is go ahead and work, but use this Dockerfile. We're going to have a mounted data directory which includes your Jupyter notebook, which is going to have your exploratory code. Again, back to that process: with scikit-learn, pick the naive Bayes or whatever classifier you're using, then go ahead and transform the data using whatever transformers the data scientist selects, pickle the model, and bake it into the Docker image. Wherever that model ultimately winds up, at least I know I can go check out the Docker image, and I've got the training code that goes with it, and the notebook, ideally with some naming conventions that explicitly tell me when it was last touched, who owned it, and things like that. So we're basically creating abstractions for them to work in, where they control the modeling side, and I can check out the thing that they're working on and continue on my merry way. Now, if I'm using a Docker container, what I can do is use, from Python, the docker module, which allows us to create an image. So if I were to stand up a Docker endpoint in a Django API, maybe the URL looks something like this: given the image name of the model, go build the image. That basically goes through the procedure of saying hey, read the Dockerfile, go ahead and build this image, stuff it somewhere, and return back a URL of where that Docker image lives. For context, there is a demo of this in my repository, and in the additional slides you can check that out. This is essentially where Django comes in: we can now stand up an endpoint and use Docker to
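A minimal sketch of the build step behind such an endpoint, using the `docker` Python SDK (`docker.from_env()` and `client.images.build(...)` are real SDK calls; the function names, tag convention, and URL shape are my own invention, not the talk's):

```python
from datetime import date


def image_tag(model_name: str, owner: str) -> str:
    """Build a mutually agreeable image tag: model name, owner, and date touched."""
    return f"models/{model_name}:{owner}-{date.today().isoformat()}"


def build_model_image(model_name: str, owner: str, build_path: str = ".") -> str:
    """Read the Dockerfile at build_path, build the image, and return its tag.

    Requires a running Docker daemon and the docker SDK (pip install docker).
    """
    import docker  # imported here so the tag helper works without Docker installed

    client = docker.from_env()
    tag = image_tag(model_name, owner)
    client.images.build(path=build_path, tag=tag)
    return tag
```

A Django view behind a URL like `/models/<model_name>/build/` (hypothetical) would then call `build_model_image(...)` and return the resulting tag in a JSON response, which is the "URL of where that image lives" the data scientist gets back.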
abstract away the manual steps, and allow the data scientist to say: OK, I'll hit an endpoint, it's called predicting-altruism, and I'm going to go ahead and ask it to build a new snapshot of the work that I'm working on. In the Jupyter notebook they might have something like this example here of the clean-text tokenizer, kind of what I was talking about before, breaking up words into a bag of words, and then maybe doing whatever other transformations you need. Essentially, when we build that image, we're then able to dump the pickled model into the Docker image and have it live somewhere.
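As a sketch of what that notebook code could look like, here is a bag-of-words pipeline with a naive Bayes classifier, pickled as the artifact that gets baked into the image (the tiny corpus and labels below are invented for illustration; the real model was trained on a much larger dataset):

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Tiny invented corpus standing in for the real training data
texts = [
    "please help me I am broke and hungry",
    "my family could really use a pizza tonight",
    "first world problems, craving pizza",
    "too lazy to cook, send pizza",
]
labels = [1, 1, 0, 0]  # 1 = altruism predicted, 0 = not

# Bag-of-words transform followed by naive Bayes, bundled in one Pipeline
# so the order of transformations is captured together with the model
model = Pipeline([
    ("bag_of_words", CountVectorizer()),
    ("classifier", MultinomialNB()),
])
model.fit(texts, labels)

# Pickle the fitted pipeline; this file is what lives inside the Docker image
with open("predicting_altruism.pkl", "wb") as f:
    pickle.dump(model, f)

# Later, e.g. from the served image, load it back and predict
with open("predicting_altruism.pkl", "rb") as f:
    loaded = pickle.load(f)
print(loaded.predict(["I am hungry, please send pizza"]))
```

Pickling the whole Pipeline, rather than just the classifier, is what avoids the "which transformations, in which order?" ambiguity described earlier.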
So essentially what we can do, with Django and the Docker API, is build in this Docker workflow where a data scientist is working on their Jupyter notebook, and when they're ready to go ahead and save it, they post to a Django endpoint that manages building and creating an image. That image gets tagged with a mutually agreeable model naming convention that we agree on, such that when Matthew asks me, "hey, I need to know about this model," I can say: OK, the last one that was updated is here on Docker, and here's the URL you can use to go check it out. With that, we have a structured way to share those processes between people.
So the big thing that I did not talk about, to keep in mind, is that it's not just that we want to orchestrate the building of these things; we also want to understand how models do over time. That's the big thing here, the elephant in the room: we want to store analytics as well. One thing my team is working on is using the Django admin to start bringing in some of the analytics that we get out of scikit-learn; on the right-hand side you'll see there's a chart of false positives versus false negatives, and we're seeing if we can start lifting some of those metrics out, so that when we train against that historical training set we now have consistency: OK, model version 1, here are the false-positive and false-negative results; model version 2, here are the results; such that when we start asking "what's the best model?" we can surface those in Django admin views. Some other things to think about, too: we don't have time to talk about Kubernetes in this talk, but Kubernetes is a great way to stand up an image and put services behind it, again taking that container technology to the next step and letting the application developers go ahead and spin things up; if you're curious about that, I'll make a plug for the Kubernetes talk. Also, Django is a great framework in which to do this: you can build upon some of the great, robust libraries that are out there, like the Django admin, and use some of the other visualization tools. There are other web frameworks you might want to think about as well; I have used others too, and I think it really depends on what your team needs, but I think Django is a really good place to start, with really robust, supported integrations with a lot of the tools we've talked about.
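As a sketch of the kind of per-version metric you could lift out and store for comparison (the model tags, labels, and predictions below are invented for illustration):

```python
def error_counts(y_true, y_pred):
    """Count false positives and false negatives for a binary classifier."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {"false_positives": fp, "false_negatives": fn}


# Evaluate every model version against the SAME historical test set,
# then persist these rows (e.g. via a Django model) for the admin to chart
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions_by_version = {
    "predicting-altruism:v1": [1, 1, 0, 1, 0, 0, 1, 0],
    "predicting-altruism:v2": [1, 0, 1, 1, 0, 1, 1, 0],
}

for version, y_pred in predictions_by_version.items():
    print(version, error_counts(y_true, y_pred))
```

Holding the test set fixed across versions is what makes the comparison meaningful; the Django admin then only has to render the stored rows.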
So if you're curious about what's next, there are a lot of directions to go. I really enjoyed Rob Story's talk on running Python on the JVM, because that's another thing: maybe we prototype a model in Python but we want to make use of the JVM. Is there something we can do without redoing this whole flow? We just might need to transform that pickled model into a different format, and there's some discussion out there on how to do that. If you want to learn more about scikit-learn pipelines, there's a link to that on the slides. Last but not least, I do have two repositories: one that lets you work with the mountable data volumes, with a ready, full-fledged predicting-altruism model that you can use and play with to your heart's content, and I also have it set up with a Django wrapper, which is pretty great. So, all that being
said: what do you see in this image? Clip art? I don't know; I'll say I hope we see lots of happy people. So thank you so much. And thank you so much, Lorena; we're excited to move to questions now.

Metadata

Formal metadata

Title Using Django, Docker, and Scikit-learn to Bootstrap Your Machine Learning Project
Series title DjangoCon US 2017
Part 21
Number of parts 48
Author Mesa, Lorena
Contributors Confreaks, LLC
License CC Attribution - ShareAlike 3.0 Unported:
You may use, adapt, copy, distribute, and transmit the work or content for any legal and non-commercial purpose, in unchanged or adapted form, provided you credit the author/rights holder in the manner they specify, and you share the work or content, including in adapted form, only under the terms of this license.
DOI 10.5446/33171
Publisher DjangoCon US
Publication year 2017
Language English

Content metadata

Subject area Computer Science
Abstract Reproducible results can be the bane of a data engineer or data scientist's existence. Perhaps a data scientist prototyped a model some months ago, tabled the project, only to return to it today. It's now that they notice the inaccurate documentation, or lack of documentation, in the feature engineering process. No one wins in that scenario. In this talk we'll walk through how you can use Django to spin up a Docker container to handle the feature engineering required for a machine learning project and spit out a pickled model. From the version-controlled Docker container we can version our models, store them as needed, and use scikit-learn to generate predictions moving forward. Django will allow us to easily bootstrap a machine learning project, removing the downtime required to set up a project, and permit us to move quickly to having a model ready for exploration and ultimately production. Machine learning done a bit easier? Yes please!
