Productionizing your ML code seamlessly
Formal Metadata
Title: Productionizing your ML code seamlessly
Number of Parts: 132
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/44929 (DOI)
Series: EuroPython 2018, talk 76 of 132
Transcript: English (auto-generated)
00:04
Hi, thank you for being here. As was just said, I'm Loris, and we're here to talk together about how to productionize your ML code seamlessly. As was quickly mentioned, I work for Yelp. And this is not working...
00:21
Nice. No... neat. Yes, I'm over here, yes, awesome. Yelp is all about connecting people with great local businesses; that's the core of our mission. You have probably used the site, especially here in the UK: if you type in a search,
00:42
you might end up looking at something like this. That's what you will find: restaurants, your local plumber, a moving company to help you move your code seamlessly into production, anything you need. Just to give you an idea of the scale at which Yelp works: we have a lot of reviews, a very high number of unique visitors
01:02
per month, quite a big team, lots of services, a lot of code, which means a lot of people working on data. So we have learned quite a few things about how to make our lives a bit easier when putting models into production. What's on the menu for today? Well, it's kind of like presenting
01:22
what it means to put a model into production (that's probably the least self-explanatory sentence ever made), and then I will give some tips and tricks to actually make your life easier. Before we go deep into the subject, I just want to take a moment so we all agree on what I'm talking about and where
01:44
all of this fits in. I think every ML project in Python started in a notebook; if it didn't, probably something weird was happening. And you started writing this notebook because someone gave you a data set and a question, and you needed to answer it.
02:01
In a way, that's the core of it. You don't do machine learning just to do machine learning: you're trying to predict a desirable behaviour, you're trying to recommend, you're trying to detect. You have an objective when you start doing machine learning, and your notebook does something. Maybe your notebook is simple, your features are already all nice and you just apply a few transformations;
02:23
you train your model, do some feature analysis, and maybe check that your model didn't train in too crazy a manner. Or you end up with pages and pages of SQL queries that perform feature extraction, and you have a complicated model, and everything else that serves to help.
02:40
It doesn't really matter, in a way, because what brought you to the stage where you think you can bring this to production is that, at the end, you had a result, something that made you think: yep, I think I can do it, there's a problem and I can crack it. But now, you cracked it once, and you want to crack it on a regular basis. You want to be able to
03:03
train regularly and make sure that, every day, your model is not too bad. Then your model produces something that you want to use, and you need to deliver this something, the predictions, to the final users. And finally: hey, I wanted to do something right at the beginning; now that everything is done,
03:25
is it still happening? So right now you are at the first step, and I will show this probably over-shown schematic from "Hidden Technical Debt in Machine Learning Systems", which Google presented at NIPS two or three years ago. Right now you're here, this small,
03:42
very black circle, which I put in red because it's really hard to see, and you need to interface yourself with all the rest of your live systems. So now, what does running an ML model in production involve? Well, it's kind of like putting any other piece of code in production; there is not that much difference.
04:06
You're going to interface yourself with the surrounding infrastructure, and the goal is to go from something that runs under your benevolent supervision, skipping the few cells which don't actually run correctly in your notebook, to something that runs every day,
04:21
tells you when it goes wrong, and keeps going, probably even after you have stopped looking at it. Great. So, as you might start to guess, I'm not going to talk too much about tooling; that's really not the point. This conference was full of great people showing you awesome tools to do everything. I'm going to focus on
04:44
what it actually means: how you think about decomposing the problem, turning "putting things into production" into a series of steps and questions that you should be asking yourself and answering to get there. Probably the answer to many of these questions is "use Airflow", but that's another topic.
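As a rough, hypothetical sketch of that "use Airflow" answer (assuming Apache Airflow; the DAG name, email address and the my_pipeline module with its functions are all made up for illustration), a daily retraining pipeline could look like this:

```python
# Hypothetical sketch of a daily retraining DAG with Apache Airflow.
# The my_pipeline module and its functions are illustrative placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator  # airflow.operators.python_operator on older Airflow

from my_pipeline import extract_features, train_model, evaluate_model, load_predictions

default_args = {
    "owner": "ml-team",
    "retries": 2,                          # first answer to "what happens when things go wrong"
    "retry_delay": timedelta(minutes=30),
    "email_on_failure": True,              # last line of defence, as discussed later
    "email": ["ml-alerts@example.com"],
}

with DAG(
    dag_id="daily_model_retrain",
    default_args=default_args,
    schedule_interval="@daily",            # how "regular" the regular basis is, is up to you
    start_date=datetime(2018, 7, 1),
    catchup=False,
) as dag:
    features = PythonOperator(task_id="extract_features", python_callable=extract_features)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    evaluate = PythonOperator(task_id="evaluate_model", python_callable=evaluate_model)
    load = PythonOperator(task_id="load_predictions", python_callable=load_predictions)

    features >> train >> evaluate >> load
```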
05:04
Great. So, for the sake of argument, and since I'm talking in generalities, let's just agree all together on a simplified view of what the pipeline is. First you have data sources; these can be many different things. You perform some sampling on them,
05:21
maybe, or not. You extract some features that you have defined (probably your notebook told you that this set of features was working really well and you should definitely use it), you train your model, you evaluate that everything went well, you have a model; rinse and repeat for production. And then you load the predictions you have obtained into your product. Fair enough.
05:42
Why is this useful? Well, your notebook probably performs exactly all of these operations already. But what this tells you is which pieces should go together, which pieces should be one function or one Python module and be tested together, because they make sense together, right? Tests.
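As a minimal sketch of that decomposition (the column names, model choice and AUC threshold below are invented examples, not Yelp's actual pipeline), each stage becomes its own testable function:

```python
# A sketch of splitting the notebook into testable pipeline pieces.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def extract_features(raw: pd.DataFrame) -> pd.DataFrame:
    """All feature logic lives in one function, so it can be unit tested."""
    features = pd.DataFrame(index=raw.index)
    features["log_review_count"] = np.log1p(raw["review_count"])
    features["is_recent"] = (raw["days_since_signup"] < 30).astype(int)
    return features


def train_model(features: pd.DataFrame, labels: pd.Series) -> LogisticRegression:
    """Training is a separate, replaceable step."""
    model = LogisticRegression(max_iter=1000)
    model.fit(features, labels)
    return model


def evaluate_model(model, features, labels) -> float:
    """Evaluation is its own step, so a bad model can fail the pipeline."""
    return roc_auc_score(labels, model.predict_proba(features)[:, 1])


def run_pipeline(raw: pd.DataFrame, labels: pd.Series, min_auc: float = 0.7):
    features = extract_features(raw)
    x_train, x_test, y_train, y_test = train_test_split(features, labels, random_state=0)
    model = train_model(x_train, y_train)
    auc = evaluate_model(model, x_test, y_test)
    if auc < min_auc:  # a failed training is not only a Python traceback
        raise ValueError(f"Model AUC {auc:.3f} below threshold {min_auc}")
    return model
```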
06:03
Cool, let's focus on specific parts of this pipeline. The first part I want to touch on is the data sources, even though I'm not going to talk too much about them. They can be S3 logs, Redshift, MySQL, Postgres, something I didn't think about when I was writing these slides. They are likely to be changing regularly:
06:23
you have new data coming in on a regular basis, and how your system ingests the data is just assumed for the purpose of this talk; I won't go into detail. You might also have noticed, especially if you have already worked with models before, that this looks a lot like one specific type of training where everything happens offline: you have all the data you need at all times,
06:46
you have offline training, offline predictions. Things with online prediction or online training would not be that different, it's mostly a constraint on the data sources, but I will stay in the simple case. Cool. So, first things first:
07:02
you need to update your model on a regular basis. And now the question is: what does "regular basis" mean? That's up to you, in a way; it's how often your data changes enough that your model needs to change too. The other point is what happens when things go wrong: how do you rerun your pipeline?
07:24
What happens if some data is missing? Do you take older data and fill it in? Maybe it's not worth it, and you could just reuse an old model, or any other strategy you can think of. The other one is scale, which is: how many
07:41
predictions, or what size is my training input? How long should this thing take? This is what allows you to think about how you should dimension the infrastructure that all of this is running on. We're talking about models, we're talking about failure: very often, when people write code and they think about failures, they think, oh, I got a traceback from Python.
08:00
But that's not the only way a model training can fail, and that's why you have an evaluation step, which I really want to dig a bit deeper into. The first question you could be asking yourself is: does the evaluation metric I'm using actually reflect the problem I'm trying to solve? I think we all use things like log loss, or
08:23
area under the ROC curve for other kinds of problems. The question, and I want to tie it back to what I was saying, is: does it reflect your problem? Maybe, maybe not; think about it. Some functions have very good mathematical properties, but they might not represent what you're actually trying to move.
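A small sketch of what such an evaluation step might look like; the thresholds and the precision-at-top-decile proxy are made-up illustrations of tying a mathematical metric back to the business question:

```python
# Sketch: record both the mathematical metrics and a business-flavoured proxy,
# and decide whether this training run "failed". Thresholds are arbitrary.
import numpy as np
from sklearn.metrics import log_loss, roc_auc_score


def evaluate(model, x_test, y_test, min_auc=0.70, max_log_loss=0.60):
    y = np.asarray(y_test)
    proba = model.predict_proba(x_test)[:, 1]

    top_decile = np.argsort(proba)[::-1][: max(1, len(y) // 10)]
    metrics = {
        "log_loss": log_loss(y, proba),
        "roc_auc": roc_auc_score(y, proba),
        # Closer to the actual question: of the users we would target,
        # how many really are positives?
        "precision_at_top_decile": float(y[top_decile].mean()),
    }

    ok = metrics["roc_auc"] >= min_auc and metrics["log_loss"] <= max_log_loss
    return ok, metrics
```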
08:41
Last part: when you evaluate your model, think about which features are used, because this is the point where you can see whether there is a feedback loop. Your model doesn't just run on its own now: it's generating predictions, which means it's probably affecting how your data is generated. And if, over time, you see things like "oh, your model just relies on one feature", maybe that one feature is actually your model reproducing what it was predicting before. So be aware.
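One way to make this concrete (a sketch only: the storage path and the 0.8 dominance threshold are arbitrary assumptions, and feature_importances_ assumes a tree-based model) is to persist the importances on every run and fail the training when one feature starts to dominate:

```python
# Sketch: keep a history of feature importances and flag a possible feedback loop.
import json
import time


def check_feature_importances(model, feature_names,
                              history_path="importances.jsonl",
                              dominance_threshold=0.8):
    # Works for models exposing feature_importances_ (e.g. tree ensembles).
    importances = dict(zip(feature_names, map(float, model.feature_importances_)))

    # Keep a record so you can compare against previous trainings.
    with open(history_path, "a") as fh:
        fh.write(json.dumps({"ts": time.time(), "importances": importances}) + "\n")

    top_feature, top_share = max(importances.items(), key=lambda kv: kv[1])
    if top_share > dominance_threshold:
        raise RuntimeError(
            f"Feature {top_feature!r} carries {top_share:.0%} of the importance; "
            "check for a feedback loop before shipping this model."
        )
    return importances
```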
09:05
This is a time when you could say: actually, this training has failed, because the model doesn't behave the way it should. Now, let's go to the prediction side of the schema. The questions are exactly the same: what happens when things fail, how often does this happen, and
09:25
how many predictions should I be generating every day? And the last thing, which is a bit separate, is how the predictions are used: how are they used in your product? And that's what I want to push into right now.
09:41
I've said nothing about the order in which all those things should be done; I just said, that's the whole problem, deal with it. Actually, this is probably what you might want to start with: how are the predictions used in your product? Because you have predictions, you had them once, and if you can already start using them just to test, it means you can see whether you're actually successful or not.
10:02
And that's probably the last thing: how are you measuring success? How do you say, "I did my job, it works, project over, thank you very much"? The first thing is that you need to track the business metric you're trying to move. You need to go back to your original problem and be sure that you can actually track that your model is doing something, and you need to test it, confront it with reality. You might not get it right from the get-go,
10:26
in which case you should test new versions against old versions, and probably against the status quo. And the last part: measuring success always seems very easy. You could say, yeah, I'm going to measure this and it's fine. Well, story time. I work for a growth team, and our main objective is to get people to create a special kind of account
10:44
which is tied to a business, so people can manage it. One of the good ways we found to do that is to show people a little pop-up. But if we show it to everyone, it doesn't really work. So we thought: hey, we're going to start predicting which people are actually likely to be business owners and would therefore create this special business account, and
11:06
we show them the pop-up; this decides whether we draw the pop-up or not, and then we move forward. And so that's what we did. We built a whole model, we trained it to predict whether someone was potentially an owner or not, we showed them the pop-up, and most of the time, like 94 percent of the time, they would create the account immediately afterwards.
11:25
It was great, we had great numbers, because we were measuring our success by: did they click the little button? They clicked on the pop-up we were showing them, and then they created the account. But actually, the total number of accounts created didn't go up at all.
11:45
So we looked again, and actually what our model was doing was predicting what would happen whether we did something about it or not. So we checked: we created a hold-out set of people that we could have shown the little
12:03
pop-up to, and we didn't, just to test that our model was behaving correctly. And they were creating almost as many accounts as the people we were showing it to, but a bit fewer. So we were still happy: it was still working, still worth investing in. Still, it's really easy to get this wrong, and it's really worth spending some time thinking it through.
12:25
I've already slid smoothly into a transition with stories. Tips and tricks. Sorry. So, to start with general advice: you might feel like this is something you have already seen when putting a service into production.
12:43
Yeah, for good reason: as I was saying, ML code is code. Use containers, Docker, Kubernetes, whatever you want; containers are great, virtual environments even more awesome. Try to spend some time persisting your work, whether with version control, with
13:01
logs, whatever you want: persist everything you do. You might want to be able to look back at it in two weeks, a month, maybe not a year; you might want to put a TTL on that. The last two points are maybe a bit less common. One is: use the production technology from the get-go. Story time again. We had data scientists who made a lot of effort to try to figure out how
13:24
searches and lots of pages were related. It was all done with Redshift. It was working in Redshift, except Redshift didn't have the whole data set, and then I had to rewrite a thousand lines of SQL queries into Spark, which took two to three months. That is a loss of time for everyone.
13:42
Spark is probably even easier to use than SQL. If the production technology is widely available, even if it's an extra cost, it actually saves development time most of the time. And the last part: if you're working in a company and you're doing ML models, there are probably already a lot of things happening. You probably already have software that runs software on a regular basis.
14:03
Just don't reinvent the wheel. Cool. Now let's dig into the separate parts. My schema was not that great, because it might have led you to believe that feature extraction for training and for prediction are two different things. They're not: you should not have two pieces of code for feature extraction; apart from the labels,
14:21
it's the same code. This should be unit tested; you can use things like Hypothesis to generate a bunch of random data to test all of your edge cases. And don't write SQL, that makes people's lives hard; write everything as code that can be tested, unit tested mostly.
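A sketch of what such a property-based test could look like with Hypothesis; the extract_features function and its columns mirror the earlier invented example rather than any real pipeline:

```python
# Sketch: property-based tests for a feature-extraction function with Hypothesis.
import numpy as np
import pandas as pd
from hypothesis import given
from hypothesis import strategies as st


def extract_features(raw: pd.DataFrame) -> pd.DataFrame:
    # Stand-in for your real feature code (same shape as the earlier sketch).
    features = pd.DataFrame(index=raw.index)
    features["log_review_count"] = np.log1p(raw["review_count"])
    features["is_recent"] = (raw["days_since_signup"] < 30).astype(int)
    return features


@given(
    st.lists(
        st.tuples(st.integers(min_value=0, max_value=10**6),
                  st.integers(min_value=0, max_value=10_000)),
        min_size=1, max_size=50,
    )
)
def test_extract_features_edge_cases(rows):
    raw = pd.DataFrame(rows, columns=["review_count", "days_since_signup"])

    features = extract_features(raw)

    # Properties that should hold for any input, not just the happy path.
    assert len(features) == len(raw)
    assert not features.isnull().values.any()
    assert np.isfinite(features.to_numpy()).all()
```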
14:41
Now, on to training, and I'm going back to evaluation again. Actually, I might have said this in a previous part, but just to put emphasis on it: perform feature importance analysis and keep it somewhere, so you can know when your model actually goes off the rails. If you wrote any piece of code just to check that your data set was right, or
15:02
anything you did in your research phase that gave you confidence that your model was good: keep it, implement it, and run it regularly. This is what's going to make sure that the assumptions you made about the data set stay true over time.
15:21
Also, a classic: have a small sanity-check test suite with just a small set of data, just to be sure everything is running while you're developing and you're not breaking production with a push. That would be bad.
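A possible shape for that sanity-check suite, assuming the hypothetical run_pipeline entry point from the earlier sketch; the synthetic data is obviously made up and the test only exercises the plumbing, not model quality:

```python
# Sketch: an end-to-end smoke test on a handful of synthetic rows.
import numpy as np
import pandas as pd

from my_pipeline import run_pipeline  # hypothetical module from the earlier sketch


def test_pipeline_smoke():
    rng = np.random.default_rng(0)
    raw = pd.DataFrame({
        "review_count": rng.integers(0, 100, size=50),
        "days_since_signup": rng.integers(0, 1000, size=50),
    })
    labels = pd.Series([0, 1] * 25)  # balanced toy labels

    # min_auc=0.0: this test only checks that every step runs.
    model = run_pipeline(raw, labels, min_auc=0.0)

    assert hasattr(model, "predict_proba")
```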
15:41
Now, on to the "all the things" advice. The first one is: log all the things. You want to know what's happening with your pipeline; it's not running under your supervision anymore. Get it to log everything: the sampling you did, some idea of how the classes were selected, that kind of thing. For feature extraction, log everything that happens: how many features were extracted, maybe some small statistics about the features. If you have a problem, you can just look at your logs; you don't have to recalculate everything. Model training: log everything that happens. Evaluation:
16:02
we just talked about it, log feature importances, that kind of thing. Log, log, log, log, log. And in your product, log when you're actually using your predictions and what usage is made of them. Every time you're actually doing something, you want logs, you want to know about it; this is how you measure things.
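As an illustration of that kind of step-level logging (the logger name, the fields and the wrapped extract_features call are all assumptions):

```python
# Sketch: log row counts, feature counts and small statistics for one step,
# so problems can be diagnosed from the logs without recomputing anything.
import logging

import pandas as pd

logger = logging.getLogger("ml_pipeline.feature_extraction")


def extract_features_with_logging(raw: pd.DataFrame) -> pd.DataFrame:
    logger.info("feature extraction started: %d input rows", len(raw))

    features = extract_features(raw)  # the (hypothetical) real extraction step

    logger.info(
        "feature extraction finished: %d rows, %d features, null fraction=%.4f",
        len(features),
        features.shape[1],
        float(features.isnull().mean().mean()),
    )
    for name, series in features.items():
        logger.info("feature %s: mean=%.4f std=%.4f", name, series.mean(), series.std())
    return features
```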
16:22
The second one is: version all the things. Yeah, that's how you keep track of change. Feature extraction: you might add or remove features, write different functions, maybe with different names. Also, you want to persist the features that were extracted before the model training, in case the model training fails; same thing if you have a file written with the day the features were extracted and which version of your feature extraction actually ran.
16:43
It's much easier to go back and understand what happened if your model has a version. It can be one of several things: just a git commit, or semantic versioning, or both, that's great. You want to change the algorithm, going from logistic regression to XGBoost? Bump the version. You're changing the hyperparameters? Bump the version.
17:01
There is nothing more frustrating than knowing that a model was generated and not being able to remember or find out how it was done. Plus, if you have a version and you have logging, it's really easy to know which prediction was generated by what, especially when they are used, which makes it easier to evaluate success and run experiments. This is basic data traceability.
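A minimal sketch of that traceability on the prediction side: every prediction row carries the model version and git commit that produced it. The column names and constants are invented examples:

```python
# Sketch: stamp each prediction with the model version and code revision.
import pandas as pd

MODEL_VERSION = "2.1.0"
GIT_SHA = "a1b2c3d"  # in practice, read it from the deployed artifact


def generate_predictions(model, features: pd.DataFrame) -> pd.DataFrame:
    out = pd.DataFrame(index=features.index)
    out["score"] = model.predict_proba(features)[:, 1]
    out["model_version"] = MODEL_VERSION
    out["git_sha"] = GIT_SHA
    out["generated_at"] = pd.Timestamp.utcnow()
    return out
```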
17:22
So now, with all of this, we can think of some general ideas on how to monitor the pipeline as a whole. Keeping track of the number of predictions generated is a good way, and of the number of predictions used, too. This tells you which part of your system could be broken if something is off.
17:42
Keeping track of timings, to be able to see which part of your code starts to become slower and why, is also very important. Especially if it has been running for a year with no problem and the time has slowly crept up; that's very different from "it was running really well and suddenly the amount of time it takes doubles". Alerts on errors in your pipeline code:
18:01
it can be as simple as "send me an email at this address" if you don't have an alerting system set up; otherwise set one up. It's really practical, and it's kind of the last line of defence if anything else fails.
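A sketch of that "send me an email" last line of defence, using only the standard library; the SMTP host and the addresses are placeholders:

```python
# Sketch: email a traceback when a pipeline step raises, then re-raise.
import functools
import smtplib
import traceback
from email.message import EmailMessage


def alert_on_failure(to_addr="ml-alerts@example.com", smtp_host="localhost"):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                msg = EmailMessage()
                msg["Subject"] = f"ML pipeline step failed: {func.__name__}"
                msg["From"] = "pipeline@example.com"
                msg["To"] = to_addr
                msg.set_content(traceback.format_exc())
                with smtplib.SMTP(smtp_host) as server:
                    server.send_message(msg)
                raise  # still fail loudly after alerting
        return wrapper
    return decorator


@alert_on_failure()
def run_daily_training():
    ...  # the actual (hypothetical) pipeline entry point
```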
18:22
Alert on the metric you were trying to move: you had a project at the beginning, and all this ML, all this production, all this effort doesn't live in a void; it serves a purpose. So just alert when the purpose is not accomplished. Maybe everything else looks fine, but if this doesn't, maybe there is a problem and it's worth looking into. Last but not least, write runbooks: how is this system supposed to work, how did you initially think about it; and
18:40
then, every time something you didn't think about actually happens, because reality is reality, write it down. It's there so you can remember how you solved the problems, and so anyone can handle failures easily, especially when it breaks at four in the morning. So now, if it was a bit boring and you fell asleep,
19:01
I'm really sorry, but if you just need to remember three things, it would be these. Design for change: it's the main difference when you build systems for production; the code could well outlive you in the company, so just change your mindset about it slightly. And, as I was saying, machine learning code is code. There are
19:23
thirty good years of best practices in software engineering: use them. It's not different; it seems different, but it's not, and all the good ideas are mostly there. The last part is verifying any assumption you make, because, again, things change and evolve, so you need to be sure that any assumptions you've made
19:44
are still verified by your systems. We're hiring, if any of this was interesting for you and you want to join us at Yelp: we have offices in Hamburg, London, San Francisco, and we have a stand. With this end of advertisement, I thank you all very much, and I will be taking any questions.
20:12
Thank you for this great talk. Are there questions? Let's begin here.
20:20
Thank you for the talk. So you said this was not a talk about tooling, but do you still have any advice? Because some parts of software engineering still apply to data science, but we've got some specific problems, like needing testing or input validation, and also versioning everything: you can't just put all your data in git, so that's also a problem.
20:45
My advice is: we use S3 quite extensively, because storage is actually really cheap, and just throwing everything at S3 and figuring it out later actually works. It sounds really bad, but it works.
21:02
But do you keep track that this model was trained on this data? Do you just keep IDs everywhere, or do you have any other way? I'm not sure I understand the question. When you get a model that was trained on this extract of data, how do you keep track of that: that this model produced this result from this data?
21:25
Right now we have semantic versioning. So every time we change something in a model which is in production, we bump this version. All the models, when they are written, are written with a file name that contains the version they were generated with, and sometimes a git commit,
21:41
that of the code version that was running when this was done. Most of the time, combining the two, you can actually know what happened, and just reset your repo to an earlier version if you need to reload it or remember what was done. Thank you very much.
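A sketch of the naming scheme described in this answer: the trained model is written under a file name embedding a semantic version and the git commit, with a small metadata sidecar for the data snapshot. The paths, the version string and the helper are illustrative assumptions:

```python
# Sketch: persist a model with version + git SHA in the file name,
# plus a metadata sidecar recording which data it was trained on.
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path

import joblib

MODEL_VERSION = "2.1.0"  # bump on algorithm or hyper-parameter changes


def get_git_sha() -> str:
    return subprocess.check_output(["git", "rev-parse", "--short", "HEAD"]).decode().strip()


def persist_model(model, training_data_id: str, out_dir: str = "models") -> Path:
    sha = get_git_sha()
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d")
    path = Path(out_dir) / f"owner_model-v{MODEL_VERSION}-{sha}-{stamp}.joblib"
    path.parent.mkdir(parents=True, exist_ok=True)

    joblib.dump(model, path)
    # Sidecar metadata: which data snapshot, which code version, when.
    path.with_suffix(".json").write_text(json.dumps({
        "model_version": MODEL_VERSION,
        "git_sha": sha,
        "training_data_id": training_data_id,
        "trained_at": stamp,
    }, indent=2))
    return path
```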
22:05
Thanks for the talk. Any comments about regression testing after you deploy models? This is something I have faced in teams I've worked for, because regression testing is part of software development, and machine learning deployments are kind of non-traditional: when models change, outputs change even though the inputs don't, and then the QA analysts go...
22:26
I don't really have a great answer, but I would say, if I can go back here: if you look regularly at the metrics you're trying to move and at what your model is supposed to do, you will know when there is a regression, because the numbers you're trying to grow are not going to move in the direction
22:45
they used to. So it's not regression in the same sense as with code; it's more that you look at regressions in the patterns of things. I would admit this is more problematic than with code, because there can be all sorts of reasons, like: hey, today
23:02
all the metrics are going down, what's happening, it's horrible... oh, today's a holiday, so actually everything's fine. So, yeah.
23:28
Well, so, what we're doing right now is that we never deploy a new model immediately; we do progressive releases. You have the current model version, which gets the bulk of the traffic; we release the new version for a small part of the traffic, and we check that everything is fine before actually switching over. So
23:48
that's how we solved it. Yeah, it works.
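A sketch of that progressive release: the assignment is deterministic per user, the current model keeps most of the traffic and the candidate gets a small, stable slice. The 5% split and the model names are arbitrary assumptions:

```python
# Sketch: deterministic traffic split between the current and candidate model.
import hashlib


def pick_model(user_id: str, canary_fraction: float = 0.05) -> str:
    digest = hashlib.sha256(f"model-rollout:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF   # stable value in [0, 1] per user
    return "candidate_model" if bucket < canary_fraction else "current_model"
```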
24:05
Thank you for the interesting talk. One comment and a question. On the measuring things: there's a great blog entry by Uber that talks about how they do it, actually machine-learning the metrics. And a question on the "log everything" kind of approach: that goes very much
24:22
against the way, for example, Jupyter notebooks make us work, right? How do you square that, and what's your advice on that? Yeah, I might have gone on a bit about it. I'm not really a fan. There are a bunch of things that allow you to productionize a notebook directly;
24:41
there are several problems with that, which come down to code quality, because the structure in a Jupyter notebook is very linear and code isn't. It's hard to write tests in there; I would be very interested in seeing actual, real, nice tests written in a Jupyter notebook. Maybe it exists; I've not seen it. So it's a completely different
25:04
beast, in a certain sense. You're going to check in your code for real, and it's not going to be an experimental thing anymore: it's going to be a git repo that runs and is deployed. Maybe there is a way to bridge that gap; at the same time, I'm not sure it's a good idea. Jupyter notebooks are really, really good at doing what they do, and checked-in code
25:24
that doesn't change too much, that isn't that easy to change, and where you can track all of the changes, is what you need for production. Does that answer the question? So we use Jupyter notebooks for everything that happens before we put it in production, and then you start copy-pasting the code from the
25:44
Jupyter notebook into an actual repository you start writing tests you start Going like going back to data scientists like so you did these two things here like the two intervals are not exactly the same you ask questions to try to improve and you start doing code reviews and
26:01
You make sure everything works in a regular fashion, so it's really two separate workflows. Thank you. The talk was brilliant, and I think it's great advice for people who are developing the models. Do you have any advice for
26:21
application software engineers who need to integrate a model into a product or a feature, or perhaps advice around collaboration between a data scientist coming up with a model and a software engineer implementing it? Yeah, so I have some advice, in a way, and it's
26:42
more organizational. What I think is important is that everyone is on the same page, that the data scientists are using roughly the same tools you are going to use in production, and not their own little world which works but which you're never able to translate. And the other part is, once you...
27:03
If you're a data scientist, you spend four months working on the model, and then you have an engineer (that would be me) who comes and takes it and does a lot of things with it and puts it in production. I think it's very important to keep that person in the loop, because they have a sense of ownership of this, and this is
27:21
what we're doing: keeping people in the loop, and being sure that things stay aligned. That was a very vague answer. Okay, two more questions, I think.
27:41
Thanks. You mentioned S3: do you also use, for example, S3 to trigger the modules, let's say with Lambda or something serverless? I don't; I know some people are playing with it, but I have no experience on the subject. Thanks. One last question, if anyone... okay, then.
28:06
So it may be a little bit off topic, because you said you wouldn't be talking about the tools themselves, but can you guide us on what to look for and just drop some names to check out?
28:24
Yes, I have a particular liking for Spark, to be honest, because it handles data sets of all sizes, it's very easy to run locally, and you can write unit tests with it and just pretend that your test instance is gigantic when actually it's just using one CPU, and everything works, ish.
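A sketch of that "pretend the cluster is one CPU" idea as a pytest fixture; the toy DataFrame and column names are invented:

```python
# Sketch: run Spark code in unit tests on a local[1] session.
import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark():
    session = (
        SparkSession.builder
        .master("local[1]")            # pretend the cluster is just one CPU
        .appName("feature-extraction-tests")
        .getOrCreate()
    )
    yield session
    session.stop()


def test_feature_extraction_counts(spark):
    df = spark.createDataFrame(
        [("biz_1", 10), ("biz_2", 0)],
        ["business_id", "review_count"],
    )
    # Stand-in for your real (hypothetical) Spark feature job.
    result = df.withColumn("has_reviews", (df.review_count > 0).cast("int"))

    assert result.count() == 2
    assert result.filter(result.has_reviews == 1).count() == 1
```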
28:42
So, yeah, I have had very good experiences with it; lots of algorithms and models, XGBoost for example, are compatible with Spark. I would always say to keep two guns: one which is a good old tool that is well tested (and Spark is kind of old and not trendy anymore, which means it's stable and actually works in production), and
29:04
then try out the new things on the side, to see if you can leverage them and what the problems with them are. Thank you. So, thank you very much again.