How Booking.com serves Deep Learning model predictions

Video in TIB AV-Portal: How Booking.com serves Deep Learning model predictions

Formal Metadata

Title: How Booking.com serves Deep Learning model predictions
Title of Series: EuroPython 2017
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt, copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor, and the work or content, also in adapted form, is shared only under the conditions of this license.
Release Date

Content Metadata

Subject Area
How Booking.com serves Deep Learning model predictions [EuroPython 2017 - Talk - 2017-07-13 - Anfiteatro 1] [Rimini, Italy] With so many machine learning frameworks and libraries available, writing a model isn't the bottleneck anymore, while putting your models in production is still a challenge. In this talk, you will learn how we deploy Python deep learning models in production at Booking.com. Topics will include: Deep Learning model training in Docker containers; Automated retraining of models; Deployment of models using Kubernetes; Serving model predictions in a containerized environment; Optimising serving predictions for latency and throughput.
So, I will be talking about how we serve deep learning model predictions at Booking.com. Before I start, I would like to give a brief introduction about myself — what I am and what I'm not — so that we have a better understanding of each other and meet the right expectations.

I work at Booking.com on developing the infrastructure for deploying deep learning models into production, and I am also a machine learning enthusiast, so both of these things match well for me. I'm also a big fan of open source, and I maintain and contribute to a couple of projects that most of you have probably used already, including the pandas library as well as a project by Mozilla, and I'm also a tech speaker.

Now, what I'm not, so that our expectations are aligned: I am not a machine learning expert. If you have specific questions about how things look from the data science point of view, I might not have the answers right away, but I will be able to point you to people and resources where you can find out more after my talk.

Let me start with the agenda. I'm going to begin by mentioning a couple of the applications of deep learning that we have at Booking.com. Then I will talk about the life cycle of a deep learning model from an infrastructure point of view — what the different stages of a deep learning model look like. And next, I will talk about the deep learning production pipeline that we have built on top of Docker containers and Kubernetes.
Starting with the applications of deep learning at Booking.com. Before I get to the first application, let me give you an idea of the scale we operate at: we have over 1.2 million room nights booked every 24 hours, and these reservations come from more than 1.3 million properties across 220 countries. Operating at this scale gives us access to a huge amount of data that we can utilize to improve the experience of our users.
The first application at Booking.com is image understanding. The question here is: what do we see in a particular image? For example, if we look at this photo, what do we see in it? Is this an easy question, or a difficult one? If you ask a person, it's easy: when we look at a picture, we immediately identify the objects in the image. But when this question has to be answered by artificial intelligence — by machine learning or deep learning — it is not an easy one at all.

For example, if we feed this image to some publicly available model, this is what we get: a list of different classes — ocean, oceanfront, nature, building, penthouse, apartment, and so on. But what is actually in the image really depends on the context you are coming from. From Booking.com's point of view, this is what we care about: is there a view from the room or not? Is there a bed or not? Is the photo taken inside the room or not? What are the amenities?

There are a couple of challenges associated with this type of problem. First of all, this is not just image classification, it is image tagging: there can be multiple labels and classes per image. Also, since our context is different from that of the publicly available models, we need to come up with our own manually labelled data so that we can train a model to tag these images. The next challenge is that there is a hierarchy of labels. For example, if there is a bed in the photo, then the photo is of the inside of a room — unless it is taken in such a way that there is no room visible, only a bed.
Once we know what is in an image, we can use this information to improve the experience of our users. For example, if we know that a user is looking for a swimming pool in the property they want to book, we can show them, higher up, the hotels for which we know there is a photo tagged with a swimming pool. Similarly, if we know, based on their previous history, that a customer is looking for a particular feature, we can show them properties which we know have some photos tagged with that feature. This way we keep improving the experience of our customers, helping them find the properties they want easily and quickly.
Another application we have is the hotel recommendation system. This is a classic recommendation problem: you know the booking history of existing users — what they booked and why — and now, for a new user, you want to recommend some hotels that this user is more likely to book. So the problem statement here is: we want to find the probability of a user booking a particular hotel. What features do we have? We have some user features, like the country and the language of the user. Then we have some contextual features, like the dates they are looking to travel, or what season they are looking at — winter, spring, and so on. And the next set of features we have is the item features: the features of the property we are considering, like the price of the property, the location of the property, and other information about the property.
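As a rough illustration of these three feature groups, they might be flattened into a single numeric input vector before being fed to a model. Everything below — the feature names, vocabularies, and encodings — is made up for the sketch, not Booking.com's actual schema:

```python
# Hypothetical sketch: combine user, context, and item features into one
# model input vector. Feature names and vocabularies are illustrative.
COUNTRIES = ["nl", "it", "de"]                  # toy one-hot vocabulary
LANGUAGES = ["en", "nl", "it"]
SEASONS = ["winter", "spring", "summer", "autumn"]

def one_hot(value, vocabulary):
    """Return a one-hot list; all zeros if the value is unknown."""
    return [1.0 if value == v else 0.0 for v in vocabulary]

def build_input(user, context, item):
    """Flatten the three feature groups into a single numeric vector."""
    return (
        one_hot(user["country"], COUNTRIES)
        + one_hot(user["language"], LANGUAGES)
        + one_hot(context["season"], SEASONS)
        + [float(context["nights"])]
        + [float(item["price"]), float(item["distance_to_center_km"])]
    )

vector = build_input(
    {"country": "it", "language": "en"},
    {"season": "spring", "nights": 3},
    {"price": 120.0, "distance_to_center_km": 1.5},
)
print(len(vector))  # 3 + 3 + 4 + 1 + 2 = 13 features
```

A model predicting the booking probability would then consume this vector; the one-hot encoding is one simple choice, embeddings (mentioned later in the talk) are another.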
Once we realized that there is a set of applications where we could achieve better results using deep learning, we started growing this effort. In the beginning it was a handful of data scientists who started the exploration of deep learning on different applications, and now we are using it in production successfully. So next, let's talk about the life cycle of a model: what it looks like for a particular model from the start — the idea — until it is actually used in an application, which can be any kind of application.
Here you can see the steps: train, and apply. What happens is that a data scientist first designs a model and experiments with different kinds of embeddings, different kinds of features, different numbers of hidden layers — any kind of variation — testing and experimenting with different model architectures. Once they are happy with it, once they see good results, they move on to training on production data. At Booking.com we use TensorFlow's Python API, which provides high-level, easy-to-use functions to define a model architecture easily.

So in our production pipeline, these are the two steps that we have: the training of a model on production data, and the deployment in containers, which can then be consumed by any application. You may wonder why the training of a model is part of the production pipeline — you could just use your laptop to train the models, right? Here is why that is not a good idea.
If you try to train your model on your laptop, this is what you may end up looking like. There are a couple of reasons for that. One reason is that your data may be too large to handle efficiently on a laptop. The other reason is that your laptop, in most cases, has limited resources: a limited number of cores, and probably not a very powerful GPU. So while a laptop is fine for testing and experimenting with a model, once you want to train it for real, it is a good idea to use some heavy servers, possibly with specialized hardware such as GPUs, or with a higher number of cores, so that you can speed up the training process — and with it, the process of getting the model ready for deployment.
This is what the training of a model looks like. We use servers — big servers with a lot of cores. Here is the training script for a particular model, and we run it with some arguments on a production server. But there are going to be multiple data scientists training models, and sometimes multiple models being trained at the same time on the same server, and we may not be able to provide an isolated environment to each of them if we do all of this on a single server. So what we do is wrap this training inside a container.

What is a container? A container is like a package of software which you can run on a host machine, and it includes all the dependencies that your application may need. We have this training script inside the container; we spin it up, it trains the model, and then it is destroyed. This also gives us easy dependency management: if a model was trained with TensorFlow 1.1, and now a new version comes up and a data scientist wants to use it for a new model, we can easily put that into a new container image and use it. So on the same machine we can have different versions of the dependencies, and that is why we use containers: to make sure that we have an independent environment for each of the trainings. And since the big servers we have come with GPUs, these containers can also utilize the GPUs.
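A minimal sketch of what such a parameterized training entry point could look like — the arguments, file formats, and the "training" itself are placeholders, not the actual Booking.com script (the real one would call TensorFlow):

```python
# Hypothetical training entry point: read data from a path, "train",
# and write a model checkpoint to an output path. The training step is
# a one-line stub so the skeleton stays self-contained and runnable.
import argparse
import json
import os
import tempfile

def train(data_path, checkpoint_path):
    with open(data_path) as f:
        rows = [float(line) for line in f if line.strip()]
    weight = sum(rows) / len(rows)  # stand-in for real model fitting
    with open(checkpoint_path, "w") as f:
        json.dump({"weight": weight}, f)  # the "model checkpoint"
    return weight

parser = argparse.ArgumentParser(description="Toy training job")
parser.add_argument("--data", required=True)
parser.add_argument("--checkpoint", required=True)

# Demo run against a temporary data file, the way the container would
# run against files fetched from shared storage.
with tempfile.TemporaryDirectory() as workdir:
    data = os.path.join(workdir, "train.txt")
    ckpt = os.path.join(workdir, "model.json")
    with open(data, "w") as f:
        f.write("1\n2\n3\n")
    args = parser.parse_args(["--data", data, "--checkpoint", ckpt])
    print(train(args.data, args.checkpoint))  # prints 2.0
```

The point of the shape — data path in, checkpoint path out, everything else self-contained — is that the same script runs identically inside any container, regardless of what else is installed on the host.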
This is how it works: we have Hadoop storage, where the production data that we want to use for training lives. We spin up a training container; it fetches the data from the Hadoop storage and trains. One thing we want to make sure of is that the model checkpoints — the model weights — are stored somewhere, so that we can utilize them later in production. So the container saves the model checkpoints back to the Hadoop storage, and then it is gone. That is what a container can be: it takes birth, does what you want it to do, and it dies. That's the entire life of a training container.

Once we have this training done — we have trained our model on production data and stored the checkpoints on Hadoop storage — we can utilize them for deployment. Deployment is putting the model in production on some server, so that the model can be used for predictions by the different applications that you may have: a web application, a cron job, anything. What we built is an app — basically a WSGI Python app — which takes the model weights from the Hadoop storage and loads the model in memory. To load a model, it needs two things: the model definition and the trained weights. We already have the model definition from when we wrote it for training, and we get the weights from the Hadoop storage; with these two things we load the model in memory so that it is ready to serve predictions. On top of this, the app also provides a nice, easy-to-use REST API to get the predictions: it basically comes down to sending a GET request with the parameters — the features — that you have, and getting the prediction back.
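A toy version of such a WSGI prediction app can be sketched with the standard library alone. The "model" here is a hypothetical linear model held in memory, standing in for a network whose weights would have been loaded from shared storage at startup; the feature names are made up:

```python
# Minimal WSGI prediction app sketch: parse features from the query
# string, run an in-memory toy model, return JSON.
import json
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

WEIGHTS = {"price": 0.5, "nights": 0.25}  # "loaded" model weights

def predict(features):
    """Toy linear model: weighted sum of the input features."""
    return sum(WEIGHTS[name] * value for name, value in features.items())

def app(environ, start_response):
    # Features arrive as query parameters, e.g. ?price=120&nights=3
    query = parse_qs(environ.get("QUERY_STRING", ""))
    features = {name: float(values[0]) for name, values in query.items()}
    body = json.dumps({"prediction": predict(features)}).encode()
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]

# Exercise the app without starting a real HTTP server.
environ = {}
setup_testing_defaults(environ)
environ["QUERY_STRING"] = "price=120&nights=3"
response = app(environ, lambda status, headers: None)
print(json.loads(response[0]))  # {'prediction': 60.75}
```

In production such an app would sit behind a WSGI server inside the container; the client's side really is just a GET request with the features as parameters.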
We have these serving apps running in a containerized environment, so each one is independent and carries all its dependencies with itself; there are no problems like "it runs on my machine but not on yours". The container contains everything it needs and can run anywhere. So this is what it looks like: we have containerized serving of our model, and we can have any kind of client which just sends the input features and gets back the predictions. But as I mentioned earlier, we operate at a huge scale, and we may have thousands or millions of requests per second, so one container is not going to handle that. What we do is this:
we spin up a lot of containers and put them behind a load balancer. The client does not know how many servers are actually serving; it just sends a request to the load balancer, and the load balancer takes care of all the scheduling and dispatching of the requests. Since we operate at a large scale, we have plenty of these containers, and as we keep growing the number of containers for an application, we want to be able to manage them: sometimes you want to increase the number of containers and sometimes decrease it, based on the traffic; containers may die because something went wrong inside them; or the model is updated and you want to roll them out again.

For this we use Kubernetes. Kubernetes is a container orchestration platform which helps us in scheduling, maintaining, and scaling applications running in containers. It gives us a really nice, flexible way to scale up or scale down any application at any time: we can create new instances — new containers — put them behind the load balancer, and those containers will also serve the application without the client noticing; it is just one command. Kubernetes also makes sure that if we declare that we want, for example, 50 containers for an application, and one of them dies for some reason, it will notice and create a new one, so we don't have to care when something goes wrong with an individual container. Basically, it tries to maintain the number of containers at the particular level that we have set. So now that we know how we deploy the models, we also need to be able to measure their performance.
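The "keep N replicas alive" behaviour can be pictured as a reconciliation loop: compare the desired replica count with the replicas that are actually alive, and start or stop containers to close the gap. This is only a conceptual sketch in plain Python — Kubernetes does this for real; the toy version just manipulates an in-memory set:

```python
# Conceptual replica reconciliation sketch: drive the set of running
# replicas toward the declared desired count.
import itertools

_ids = itertools.count(1)  # generator of unique replica names

def reconcile(running, desired_count):
    """Return a replica set adjusted to `desired_count` members."""
    alive = set(running)
    while len(alive) < desired_count:      # too few: start new replicas
        alive.add("replica-%d" % next(_ids))
    while len(alive) > desired_count:      # too many: stop extras
        alive.pop()
    return alive

replicas = reconcile(set(), desired_count=3)
print(len(replicas))  # 3
replicas.discard(next(iter(replicas)))     # simulate one replica dying
replicas = reconcile(replicas, desired_count=3)
print(len(replicas))  # back to 3
```

The real system runs this kind of loop continuously, which is why a dead container is replaced without anyone having to intervene.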
We have a lot of requests coming in, at a rate of thousands of requests per second, and it looks like this: say you have your model, and it takes some computation time to compute the prediction for a set of input features. But that is not the only thing your client is going to see: the client also pays some request overhead because of the network, depending on where your server is and where the client is coming from. So the prediction time is the sum of the request overhead and the computation time. And if you predict one instance per request, sending n requests one after another simply costs you n times that — this is the performance you get from the client's point of view. You can also see that for simple models, where the computation time is small — say a logistic regression with a small set of features — the computation time will be almost invisible compared to the request overhead.

Once we know that this is the kind of performance we can expect, there are two things: you may want to optimize for latency, or for throughput. Let's talk about latency first. Latency is the amount of time it takes to serve one request. You may have some applications — for example, one that serves web pages — which need a response as soon as possible, so you want to optimize for latency there. These are some of the ways you can do that. The first: predict ahead of time, if you can pre-compute. This is a simple one: if you know what you are going to be asked to predict, you can pre-compute the predictions, save them in a lookup table, and serve from the table. Then you will be really fast — you won't pay any computation time at serving time. But that is not always possible; in almost all of our applications we need to predict in real time.
What we can do then is reduce the request overhead. One of the ways to do that is to have the model embedded in the application, so there is no network hop between the application and the model: we keep the model in memory, in the same container as the application, so that it can predict and return the result without any request overhead at all. The next one is: predict one instance per request. This is useful when the computation time is huge compared to the request overhead — when you know the computation is the bottleneck for your request, you should just send the requests as soon as possible and get the results back, because the request overhead is not what hurts you there.

You can also use techniques like quantization. Quantization means converting your float values to a fixed-point type. How that helps is that you can now fit more values into the same processor word, and computing with the fixed-point values is faster than computing with the float values. There are some more specialized techniques, like freezing the network: when you have a model's computation graph, there are variables in it, and you can convert those variables into constants, which gives some boost to the speed of computing a prediction. Another one is optimizing for inference, which means removing all the unused nodes from the graph; that again gives a kind of boost to the computation. Next, you may want to optimize for throughput. Throughput means the amount of work being done per
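The quantization arithmetic can be sketched in plain Python — a symmetric 8-bit scheme with a single scale factor. Real frameworks do this with dedicated integer kernels; this sketch only shows what is traded (a small rounding error) for faster fixed-point math:

```python
# Symmetric 8-bit quantization sketch: map floats into the int range
# [-127, 127] with one scale factor, then dequantize to observe the
# bounded rounding error.

def quantize(values):
    scale = max(abs(v) for v in values) / 127.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.82, -0.51, 0.03, -1.27]
q, scale = quantize(weights)
print(q)  # small integers in [-127, 127], e.g. [82, -51, 3, -127]
restored = dequantize(q, scale)
error = max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, restored))
print(error < scale)  # True: error bounded by one quantization step
```

Eight-bit values pack four to a 32-bit word, which is where the speedup comes from; the price is the per-value error of at most half a quantization step.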
unit of time — maybe one second, one minute, depending on what your use case is. If you want to get a lot of work done per unit of time, then again the first thing is to pre-compute, if you can have a lookup table with all the combinations and serve from it. If you cannot, then when you know you want the maximum work done per unit of time, you want to reduce the request overhead as much as possible. So if you send a lot of predictions together in one request — thousands of predictions per request — you get a performance boost of roughly those thousand request overheads that you would otherwise have paid sending single requests one by one. You should also send the requests in parallel: use asynchronous requests instead of waiting for one request's response to come back before sending the next — send them all in parallel across the instances, and make sure you get the maximum amount of work done.
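Both throughput tricks — batching many inputs into one request so the per-request overhead is paid once, and sending the batched requests in parallel — can be sketched together. The "server" here is a local stub with a simulated fixed overhead per call, not a real prediction service:

```python
# Throughput sketch: (1) batch 1000 inputs into 4 requests, paying the
# request overhead 4 times instead of 1000, and (2) send those 4
# requests in parallel rather than sequentially.
import time
from concurrent.futures import ThreadPoolExecutor

REQUEST_OVERHEAD_S = 0.01  # simulated network overhead per request

def predict_batch(batch):
    """Stub server call: one overhead charge for the whole batch."""
    time.sleep(REQUEST_OVERHEAD_S)
    return [2 * x for x in batch]  # toy "model"

inputs = list(range(1000))
batches = [inputs[i:i + 250] for i in range(0, len(inputs), 250)]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = [y for batch_out in pool.map(predict_batch, batches)
               for y in batch_out]
elapsed = time.monotonic() - start

print(len(results))                         # 1000 predictions
# Far cheaper than 1000 sequential single-item requests would be:
print(elapsed < 1000 * REQUEST_OVERHEAD_S)  # True
```

`pool.map` preserves input order, so the flattened results line up with the inputs; in a real client the stub would be an HTTP call to the serving containers.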
So let's try to summarize what we talked about. First of all, we talked about the training of models in containers: you spin one up, it fetches the data — from our Hadoop storage, or wherever it is; it really depends on the application — and it trains in an isolated environment inside the container. Once the training is complete, it stores the model checkpoints back in the Hadoop storage and dies; that is the process of training a model in a container. Next is serving these models from containers using Kubernetes: we spin up as many containers as we need, depending on how many requests we have for that particular application, and we let Kubernetes do the rest — the load balancing, as well as the maintaining and managing of the containers, and providing an interface to diagnose any problems we may have. And finally, we optimize the serving for latency or throughput, depending on what the application is: if you have a cron job or something which does a lot of work in one burst, you can use the techniques to optimize for throughput; if you have a real-time application in which you just need to show the result right away to the user, you can optimize your serving for latency. We have all these options available, and we pick depending on the application.
We work on these things and a lot of other things — MapReduce, Spark, Kubernetes, and a bunch of others — and we are hiring, especially for infrastructure roles as well as data scientists. So if you want to look more into these things, you may check out this link, or you may get in touch with me: I'm on GitHub, and I go by the same handle on most of the social media websites. Thank you.
Q: Thank you. You mentioned a component that can scale up and spin down the number of replicas. What do you use to decide when to scale — what algorithm is behind the load balancing — and is it manual or automatic?

A: If I got the question right, you are asking what we use as a metric to decide whether to change the number of replicas. Kubernetes provides, out of the box, support for scaling on a few metrics, like CPU usage, memory, as well as the traffic that we get — the number of requests. But it really depends on the kind of application, and in some of the areas you want custom metrics. For example, queue sizes: a metric which gathers how busy the queues in front of the containers are. Once we have a lot of requests coming in, we want to make sure those queues are not getting full, and once they do, we want to spin up more containers so that the traffic can be distributed and the queues are not dropping requests. Kubernetes doesn't support that out of the box yet, but queue sizes are one of the metrics that we have been looking at.

Q: OK, thanks. And my second question is: how do you annotate your data for your models — the images, for example?

A: So the question is how we annotate our data, like the images. When we started building this model for image tagging, we outsourced the tagging: a large number of our images were tagged manually, by humans, and we used that data to train the model.

Q: And which company did it — how did you find them?

A: [The follow-up exchange was not clearly audible.]

Q: You mentioned deploying — this is one of the main problems with Python machine learning. When you use machine learning, you usually have to do some feature engineering: you get some input data and then you have to crunch some numbers from it. Where do you actually do that? Do you do it in the API and tell the client "you have to provide this data"? Do you do it in the container? Or do you do it on Hadoop when the data comes in, and you just send a pointer to the data?

A: Yes, that is something we do upfront. We have events being logged from all the activities that we have on the website, and once we have that data, we have workflows — Hadoop jobs — which take it and prepare it into the kind of data that we will be using in the models. So we have a separate workflow which takes care of the data munging and the preparation of the data for these models.

Q: I was wondering if you could talk a little bit about how you iterate on models. Let's say you have some new training data, you want to take it into account and retrain the model, and then check whether it performs well. How do you deal with this kind of thing?

A: So the question is about how we deploy new models and do the performance testing of new models. Once we have a new model and we want to replace the older model with it, updating the entire cluster is, with Kubernetes, one single command: we update our deployment with the new model and it gets rolled out. Then, to see what kind of results we get, we have proper monitoring which tells us what the distribution of the features and of the predictions of this version of the model looks like, and we can use that information to decide whether the model is good or not, and whether we want to keep it or move back to the previous version.

Q: You mentioned in your talk that one of the ways to improve throughput and latency was to cache — to have a hash table of previous predictions. How would you implement that? Per container, or centralized? What technology would you use for it?

A: So, the thing I mentioned about caching and keeping the predictions in lookup tables — it really depends on the use case. Honestly, we haven't had that kind of use case, where we already know what predictions we are going to be asked for. So we don't use lookup tables; we predict in real time and use the other kinds of techniques that I mentioned to optimize the latency. We don't currently have cases where we could apply the lookup tables.

Host: OK, thank you. Please give a warm round of applause to our speaker.