
Is it Food? An Introduction to Machine Learning

Video in TIB AV-Portal: Is it Food? An Introduction to Machine Learning

Formal Metadata

CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Machine learning is no longer just an academic study. Tools like TensorFlow have opened new doorways in the world of application development. Learn about the current tools available and how easy it is to integrate them into your Rails application. We'll start by looking at a real-world example currently being used in the wild and then delve into creating a sample application that utilizes machine learning.
[Inaudible opening.] So, how is everyone's RailsConf going? I'm going to start with who I am. I go by Matthew Mongeau, or halogenandtoast online; you can find me pretty much anywhere under that name. You may also know me as Goose, and you may be wondering where that comes from. It actually comes from my last name, Mongeau. People may not be able to spell or pronounce it, and because of that you might call me Mongoose, which eventually got shortened to Goose.

Now, everyone frequently reminds me that this comes from the movie Top Gun, but I haven't seen Top Gun. And there's a very dark truth that I found in Top Gun, and it led me to do a little bit of investigation: there are a number of movies that feature characters named Goose. The first one is Mad Max, and Goose in that movie dies in a horrible fire. In City of God, a character named Goose gets shot and dies. And then there's the movie Top Gun, where I assume he dies in some kind of automobile accident.

The alternative to being called Goose was that people once tried to shorten my name to Mongo. That was all fine and dandy, but when you're in developer chats it becomes kind of confusing, and people don't always say the best things about Mongo. You run into this problem where you're not sure if they're talking about you or the database, and you don't want to have an existential crisis with a database, especially a NoSQL database, because it makes it really hard to build relationships. So now you know everything you need to know about me.

And so it started with "Is it food?": a journey of minds, machines, and meals. When I give talks, I like to think about why I want to give that particular talk, and the reason I wanted to give this one was that I wanted to learn about machine learning. I watched a bunch of talks and other material on it, and I always felt like the information was either way too high level or way too low level for me, and I couldn't find a happy medium. One of the problems I often had was that I'd watch these talks and see things like this. They'd show charts like this and reference linear regression or gradient descent or other things like that, and when I tried to look up definitions for things like gradient descent, I'd get this:

"Gradient descent is a first-order iterative optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point."

That is how I feel when I read these types of things. So I wanted to set some goals so that my talk wouldn't be like this.
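To make that definition less intimidating, here is a minimal Python sketch (an illustration, not code from the talk) of gradient descent on a single-variable function, f(x) = (x - 3)^2, whose derivative is 2(x - 3). Each step moves x a little in the direction opposite the slope; the learning rate of 0.1 and the 100 steps are arbitrary choices for the example.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimize a function by repeatedly stepping against its gradient."""
    x = x0
    for _ in range(steps):
        # Move proportionally to the negative of the gradient at the current point.
        x = x - learning_rate * grad(x)
    return x


def grad_f(x):
    # Derivative of f(x) = (x - 3) ** 2, which has its minimum at x = 3.
    return 2 * (x - 3)


minimum = gradient_descent(grad_f, x0=0.0)
print(round(minimum, 4))  # 3.0
```

The step size matters: for this particular function, any learning rate above 1.0 makes each step overshoot by more than it corrects, and x diverges instead of settling at 3.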
And I have two main goals for this talk. I want to focus on practicality, and I want to talk about a real-world use case and frame everything I'm talking about around that idea. But first, some disclaimers: I'm going to try my best to balance practical and technical, and as a result I might end up oversimplifying something. This is really just meant to be a high-level overview, and my goal is to have a few of you walk out at the end thinking, "Maybe I'll try this."

So here's the experiment: I want to find out whether something is food, or more specifically, since people like to remind me that an image can't be food, whether it is a photograph of food. I hope this will be a fruitful experiment.

So: is this a photograph of food? And how do we know? Exactly: we've seen things that look like this, or exactly this kind of thing, and we know it's food. But this gets really into the meat of our problem. Some of you may never have seen this particular dish before, yakitori, but how do we know this is food? It smells good? OK. Part of it is that we can immediately recognize the ingredients, break it down, and process it, and it looks like something we've seen before. Pretty sweet.

But what about this? Is this food? This one's really questionable. But if I give you the contrast here and ask, is this food? You might say that this one isn't food, but this one might be, because we can tell that there's something inside of there. We actually have a lot of experiences where we've opened up this package, found delicious Kit Kats, and been very happy. And it's really awesome if you hand a child a package of empty Kit Kats and they're not used to this experience yet: they're very, very disappointed, because there's nothing inside.

Let's look at a couple more. Is this food? This is a picture of food... I'm not sure, but I think we can clearly identify this as food. But what about this? I really hope nobody said yes.

What I really want to talk about is that we're able to look at these pictures and identify the food because we know about pattern recognition, and in many ways this was core to our survival. Pattern recognition is one of the key factors that led us to develop language and to develop tools and agriculture. We're really exceptional at it, except that we hate slow and menial tasks. If I gave you ten thousand images and asked, "Are these all food?", you would get bored really quickly and not want to do it. That makes us want to venture out and automate the problem.

Now, this is an interesting kind of problem to automate, because as humans we can recall information in an extraordinarily fuzzy way, and from past experiences we can piece together the information we had before to form new information and understanding about the world around us. But this process is very hard to program, so machine learning can be used as an approximation of this kind of behavior.

I want to talk about three different types of machine learning. First, we have unsupervised learning, which refers to taking information and clustering or grouping it into individual parts, letting the computer decide what that means. So you might have a chart with all these points plotted out, and here you can see the centroids: this is the blue group, this is the green group, and this is the red group. It doesn't really matter where a dot lies; it just looks at which centroid is closest. Anything in the green area would be green, anything in the red area would be red, and anything in the blue area would be blue. This is the kind of thing unsupervised learning does: it just clusters everything together into individual buckets. That's kind of useful, but sometimes we want to label things.
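The "closest centroid" rule is the assignment step of k-means, a common clustering algorithm (the talk doesn't name a specific one, so k-means is an assumption here). In this minimal Python sketch the points and the three starting centroids are made up; the algorithm alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points.

```python
import math


def closest_centroid(point, centroids):
    """Index of the centroid nearest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recentering each centroid."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[closest_centroid(p, centroids)].append(p)
        # Move each centroid to the mean of its points (keep it if it has none).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids


points = [(1, 1), (1.5, 2), (1, 1.5),      # bottom-left
          (8, 8), (8.5, 9), (9, 8),        # top-right
          (1, 8), (1.5, 9), (2, 8.5)]      # top-left
centroids = kmeans(points, centroids=[(0, 0), (5, 5), (0, 10)])
labels = [closest_centroid(p, centroids) for p in points]
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The output is only group numbers: the algorithm says which points belong together, never what a group means, which is exactly the gap supervised learning fills. In practice the starting centroids are usually picked at random rather than by hand.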
That's where supervised learning comes in. In supervised learning, you provide all the information up front and say: these things are this, and these things are that. Then the system tries to learn how to identify new information. The last of these three categories is reinforcement learning. Reinforcement learning is kind of similar, except that at every step of the way you tell it whether it's right or wrong: it makes a guess, you say "yes, you're right," and it takes that information and applies it to the next iteration so that it can get better over time.

For the purpose of identifying images, we're going to be talking about supervised learning, and specifically about neural networks. So I want to give you a brief history of artificial neural networks. One of the first artificial neural networks was the perceptron, invented in 1957 by Frank Rosenblatt. This machine was actually designed for image recognition, back in 1957, which I find extraordinarily fascinating. But one of the problems with this style of image recognition was that it couldn't learn certain kinds of patterns: it could only learn linearly separable patterns.
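The linear-separability limit is easy to see in a minimal Python sketch of the classic perceptron learning rule (a software version for illustration; Rosenblatt's original machine was hardware and is not modeled here). It learns AND, which is linearly separable, but no single straight line separates the XOR cases, so it can never get all four right.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum crosses the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Perceptron learning rule: nudge weights only when a guess is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)  # -1, 0 or +1
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

scores = {}
for name, data in (("AND", AND), ("XOR", XOR)):
    weights, bias = train_perceptron(data)
    scores[name] = sum(predict(weights, bias, x) == t for x, t in data)
    print(f"{name}: {scores[name]}/4 correct")
```

AND reaches 4/4, while XOR stays below 4/4 no matter how many epochs you allow. Stacking layers of such units, as modern neural networks do, is what removes this particular limitation.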
I'll spare you the definition. The limitations of the perceptron, and the fact that it couldn't progress further, caused neural networks to stagnate for quite a while. So let's fast forward to 2015 and talk about TensorFlow.
And that's the brief history of artificial neural networks. So what is TensorFlow? TensorFlow was developed as a system for building and training neural networks, which are represented as something called a dataflow graph. It can look something like this, and each layer of this graph, represented by one of those oval shapes, takes in a tensor and returns a tensor after performing some operation on it.

So what is a tensor? When I looked this up, I got an image that looked like this, and when I saw it, I just assumed I would never be able to understand what a tensor was. The simple answer is that a tensor is just an n-dimensional array of information. You can choose whatever dimensions you want to represent it with: a set of information goes in, and another n-dimensional array comes out, with whatever dimensions are necessary for solving the problem. So in this situation, you take in some array and it goes through each individual step; sometimes it splits out into two or more steps and then converges onto a single step. Something comes in at the beginning, works its way through each of the steps, and eventually you get some results out the other side.

This particular dataflow graph represents Inception, specifically Inception version 3. So you might ask, what is Inception? Inception is a pre-built dataflow graph useful for categorizing images, and it was originally built on the ImageNet dataset. ImageNet is a pretty interesting website, because you can use it to get human-classified images for categories, and there are tons of them available to you. So if you're trying to do any kind of machine learning, it's really useful to be able to download sets of images and use them in your projects for training purposes. So that covers what Inception is.
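To make "an n-dimensional array goes in, an n-dimensional array comes out" concrete, here is a toy Python sketch that uses nested lists as tensors and a plain list of functions as a strictly linear "graph". It is an illustration only: real TensorFlow graphs branch and merge, as described above, and use optimized tensor types rather than nested lists. The layer names are made up.

```python
def shape(tensor):
    """Shape of a nested-list 'tensor', e.g. [[1, 2, 3], [4, 5, 6]] -> (2, 3)."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0] if tensor else None
    return tuple(dims)


# Each "layer" takes one tensor in and returns a new tensor out.
def double(t):      # (rows, cols) -> (rows, cols)
    return [[v * 2 for v in row] for row in t]


def row_sums(t):    # (rows, cols) -> (rows,)
    return [sum(row) for row in t]


def total(t):       # (rows,) -> () : a 0-dimensional tensor, i.e. a single number
    return sum(t)


graph = [double, row_sums, total]  # a tiny, strictly linear "dataflow graph"

x = [[1, 2, 3], [4, 5, 6]]
for layer in graph:
    print(f"{layer.__name__}: input shape {shape(x)}")
    x = layer(x)
print("result:", x)  # result: 42
```

Each layer only needs to agree with its neighbors about shapes; that contract is what lets a framework chain, split, and merge operations into a larger graph.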
But how does this all tie together? I want to return to my original question: is it food? Why am I trying to answer this question? Well, as you may surmise, I work for a company called Cookpad, and we happen to have lots and lots and lots of data about food; in particular, we have lots of images of food.

But when you have a website that allows users to submit content, you run into some problems: they might not always be submitting food to your website. And you care about that, because if you don't, very bad things can happen, and people can get upset really, really quickly. So there are a couple of things we wanted. We want to protect our users by ensuring that the images posted are actually food, for a number of reasons. We don't want to show them something that's inappropriate. We also found that users like to do things like put text on the images saying, "Oh, you can find the real recipe over on this other competing website," and we don't want those types of things to happen either.

So where I work, at Cookpad, we essentially do this: we have something that looks at an image and classifies whether or not it's food. I thought it would be really interesting to try to recreate what had already been built. Over there, our main application is a Rails website, but the machine learning stuff is all done in Python. So I decided I was going to build a Rails version. That was mistake number one. Actually, in all fairness, mistake number one was that when I went into this problem, I told myself: no Python, I'm not going to use Python. Why? Really because I'm giving this talk at RailsConf, and no one wants to hear the dirty word "Python." So, fine: there's a gem for handling this, called tensorflow.rb.
Now, I tried a number of times and was never able to get it running on my machine. My guess is that it's something to do with Clang and the compilers. No one enjoys trying to debug a compiler on their machine; you just want to download something and have it work. So that didn't happen. One of the suggestions was to use Docker, and mistake number two was, of course, Docker. After setting up the Docker image, there were a couple of programs written in C++ in the documentation for solving the problem I wanted to solve, and I tried compiling those, and those didn't work. So I kept retrying and retrying and retrying, and I got my favorite Docker problem, which is "your disk is almost full." After figuring out from the documentation how to delete all my images, I decided to start over.

So I want to document my road to success for solving this kind of problem, with one small note: your mileage may vary on installation and setup. This is my starting point. I decided to install Python, finally, set up a virtualenv, and go through the process of installing TensorFlow.

I want to make a quick note here. As a Rubyist, I want to embrace the Ruby community; I want things to work in the Ruby community, and here I am telling you to use Python. But I want to be clear: this isn't saying that we, as a Ruby community, shouldn't embrace machine learning and try to make it part of our community. But if you're getting started, you do not want to fight against your tools, and right now the best tool set I've found is Python. The Python community is really strong in machine learning and has put together the right tooling, so I highly suggest not fighting against it: go with the flow and use the tools that work. For me, once I got it set up, it just worked, and I was able to use TensorFlow.

The next step after installing TensorFlow was to figure out how to apply it to my particular problem. There's already a tool, Inception, which works on ImageNet images and classifies those, but I want to do something different: I want to do retraining. I want to change what it's able to identify to images that I actually care about. We can do this through something called transfer learning. Basically, if we look at this dataflow graph, we have one step at the end, and if we pull off that step, we can replace it with our own steps. The benefit is that we can use everything that was previously learned and apply it to our current set of data.

So if you want to work with Inception v3 and have it recognize your own image set, you basically just need to collect a bunch of data and put it into a folder. Call it whatever you want; I called mine "data," because I'm not really creative. Inside the folder, you decide what categories you want to have. Which categories you choose is really crucial. It's not enough to just have a bunch of pictures of food: if all you've ever seen in your entire existence is food, then all you know is food. You want to be able to identify things that are not food, so if you can think of cases particular to your situation that you want to identify against, you should also have images for those. In our situation, we don't really care about flowers, but those came with the example, so my company uses them for some reason. But we do care about people: we don't want people showing up in any of our images, because we want to protect users' privacy, so we want to remove any pictures of humans. Additionally, we want to catch text, because of the problem I mentioned before: we had lots of users who wanted to say "watch the video for this recipe over here," and we don't want that to happen. So we created these categories, and inside them we put lots and lots and lots of images.

For my training purposes, the folders I had contained between one thousand and two thousand images each, except for text, which had around 600. The nice thing is that the images can be really small: Inception v3 actually only operates on images of about 299 by 299 pixels, so if your image isn't that size, it gets resized automatically. This is another thing to care about with your test data: if your image isn't properly sized and it gets resized, and your subject isn't in the center, you might lose that information. So it can make sense to resize images ahead of time to make sure your subject is properly in frame.

So now that I've collected all my data, my next step is to retrain. I'm no expert in machine learning, so I didn't want to write the script myself. Luckily, there's already a script that does it, so I pulled it down from the TensorFlow repository and changed some directories so that it writes its output files into my local directory instead of a temp directory. Then I ran the retrain program, telling it that the image directory was my data directory, and I waited and waited and waited and waited.
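Transfer learning itself can be illustrated without TensorFlow. The toy Python sketch below is a hypothetical stand-in, not the retrain script: fake_image, frozen_base, and the two categories are invented for illustration. frozen_base plays the role of the pretrained network with its last step pulled off; its outputs are computed once per image and cached, and only a new softmax layer on top is trained, using gradient descent on cross-entropy.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the toy data is reproducible


def fake_image(brightness, pixels=16):
    """HYPOTHETICAL toy 'image': pixel intensities near one brightness level."""
    return [min(1.0, max(0.0, brightness + rng.uniform(-0.1, 0.1))) for _ in range(pixels)]


def frozen_base(image):
    """Stand-in for a pretrained network minus its final layer (never trained here)."""
    return [sum(image) / len(image), max(image) - min(image)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def final_layer(weights, biases, features):
    logits = [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, biases)]
    return softmax(logits)


def train_final_layer(bottlenecks, labels, num_classes, epochs=200, learning_rate=0.5):
    """Train only the new last layer with gradient descent on cross-entropy."""
    dim = len(bottlenecks[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for features, label in zip(bottlenecks, labels):
            probs = final_layer(weights, biases, features)
            for c in range(num_classes):
                # d(cross-entropy)/d(logit_c) = predicted probability - target (1 or 0)
                g = probs[c] - (1.0 if c == label else 0.0)
                weights[c] = [w - learning_rate * g * f for w, f in zip(weights[c], features)]
                biases[c] -= learning_rate * g
    return weights, biases


def classify(weights, biases, image):
    probs = final_layer(weights, biases, frozen_base(image))
    return max(range(len(probs)), key=probs.__getitem__)


# Two hypothetical categories: 0 = "other" (dark images), 1 = "food" (bright images).
images, labels = [], []
for _ in range(10):
    images += [fake_image(0.2), fake_image(0.8)]
    labels += [0, 1]

# Run each image through the frozen base once and cache the result.
bottlenecks = [frozen_base(img) for img in images]
weights, biases = train_final_layer(bottlenecks, labels, num_classes=2)

accuracy = sum(classify(weights, biases, img) == y for img, y in zip(images, labels)) / len(images)
print(f"training accuracy: {accuracy:.0%}")
```

Because the base never changes, each image's features only need to be computed once and can be reused on every training pass; the retrain script caches its "bottlenecks" for the same reason.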
While I waited, it output things like this: it said it was looking for images in all those directories, and it started building bottlenecks. What is a bottleneck? I did not have any clue, so I looked it up. A bottleneck is an informal term for the output of the layer just before the final layer. The reason we care is that, in order to train the network, the script has to keep passing images through and seeing how well it's doing, and you don't want each image to go all the way through your dataflow graph every single time. So it computes this once and caches it so that it can be reused. If you do this from scratch, you'll see it print out, file by file, that it has created bottlenecks. After it has cached those values, it combines them, as I'll describe in a moment.

The next thing you'll see is this: things like train accuracy, cross entropy, and validation accuracy. This was interesting to me, and I wanted to know why it was printing out these three things. What's really happening is that the retrain script does a very interesting thing: it splits your data into three individual parts, one for training, one for validation, and one for testing. Training data is what your model is trained on: the script learns "these look like this" and keeps making things better along the way. Throughout that process, it makes sure it never sees the testing data. Once training has finished, it runs the testing data through to see how many it gets correct, but it never uses that data as a method of training. Validation data isn't trained on either; it's used to avoid overfitting. We want to make sure that improvements in training accuracy actually show up on an unseen dataset; if they don't, the model is just fitting the particular training images rather than the general problem.

The last part, cross entropy, is your loss metric, and this is important: every time you want to train a neural network, you need some kind of loss metric that says "this is bad," and you want to minimize that value to make the system better. All of these values are focused on accuracy.
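Two ideas from this part can be sketched in a few lines of Python: a deterministic three-way split of the image files, and the cross-entropy loss for a single example. The name-hashing scheme, the 10% defaults, and the file path are assumptions for illustration; this is not claimed to be exactly how the retrain script does it.

```python
import hashlib
import math


def assign_split(filename, validation_pct=10, testing_pct=10):
    """Put a file in 'training', 'validation' or 'testing', deterministically.

    Hashing the name (an assumed scheme for illustration) means an image lands
    in the same set on every run, so testing images never leak into training.
    """
    bucket = int(hashlib.sha1(filename.encode("utf-8")).hexdigest(), 16) % 100
    if bucket < validation_pct:
        return "validation"
    if bucket < validation_pct + testing_pct:
        return "testing"
    return "training"


def cross_entropy(probabilities, true_index):
    """Loss for one example: -log of the probability given to the right class."""
    return -math.log(probabilities[true_index])


print(assign_split("data/food/sushi_pizza.jpg"))     # same set on every run
print(round(cross_entropy([0.9, 0.05, 0.05], 0), 3))  # 0.105: confident and right
print(round(cross_entropy([0.1, 0.8, 0.1], 0), 3))    # 2.303: confident and wrong
```

Cross-entropy punishes confident mistakes much more than mild ones, which is why driving it down tends to push the network toward putting high probability on the correct category.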
But the results of this run are two files: the graph and the labels. The graph is kind of complex: it's an encoded form of the dataflow graph I showed you earlier, with each step, and it's really interesting to open up and look at, because it's a weird encoding, a mixture of text and binary. But the labels file is actually really simple. It looks like this.

The reason for that file is that, because we're dealing with tensors, the output is mostly numbers: you get some category index, 0, 1, 2, 3, or 4, but you care about the actual label of that category. So it outputs this file with those labels, so that afterward you can look up the label and say "this was viewed as a person" or "this is a flower."

So let's get to the point where we can use it. Now that it's trained, I want to take that trained network and label my images. Again, I just found some code online to do that: I copied label_image.py into my current directory, changed the graph and labels paths to point at my local directory, and it looks something like this. This is actually fairly simple compared to the retrain script, which I won't go into; retrain is about a thousand lines, but this is approximately 50 or so. All it does is basically take that last layer, the final result part, and run the image through it, and that last layer has already been trained to do everything we need, so this process is actually pretty quick. We get the final result here, and you can see that it just looks through the labels file to find out which label to pull out.

Great: I've gotten to the point where I can potentially label images, so let's check that it's actually working. I have a few test images in a directory here: a sushi pizza (which I suggest you never eat), this box, and a hamburger pizza.
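The label lookup is just index-to-name bookkeeping. A minimal Python sketch, assuming a labels file with one category name per line in the same order as the network's outputs; the category names and scores below are made up for illustration and are not output from the talk's model.

```python
def load_labels(text):
    """A labels file has one category name per line; line i names output i."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def top_predictions(scores, labels, k=3):
    """Pair each score with its label and return the k highest, best first."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# HYPOTHETICAL labels file contents and network output for one image.
labels = load_labels("flowers\nfood\nother\nperson\ntext\n")
scores = [0.01, 0.90, 0.05, 0.03, 0.01]

for label, score in top_predictions(scores, labels):
    print(f"{label} (score = {score:.5f})")
```

With these made-up numbers it prints food first, then other and person; the network only ever produces the score list, and the labels file is what turns position 1 into the word "food".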
So let's find out which of those are food and which ones aren't. To do this, I run python label_image.py followed by my image. Let's start with the sushi pizza. You can see here that, hopefully, it figured out that it's food: it said about 0.9, which is really high, so this is probably food. Next is the box, and here we can see that it's in the category of other. Great: we don't want that to show up as food. You can use this on all of your images. Test 3 is the pizza, and of course it's not as sure about that one: it said 84%, not 90-something, but that's still relatively high. And that's a point to take into consideration: it's never going to be 100% accurate. It's never really going to say something is 100% food unless all you've ever trained it on is one category. There's always a small chance that it could be something else, and you never want to achieve perfect accuracy, because then you're probably overfitting and not solving the general problem you want to solve.

So with this, we can see that it's been trained correctly and can identify these images. Now I got to the point where I wanted to make this work with a web server, so I could use it in my Rails application. How do I do that? I had code in Python, and I want to use it inside Rails. As I mentioned before, I'm already doing this exact thing at the company where I work, so my suggestion is to treat the machine learning part as a separate service and call it from your Rails application. To build the service, I decided to use something called Flask, which is kind of like Sinatra. You pip install flask, and I converted label_image.py into a small server that I could use to get the response I wanted. It shouldn't take too much: this looks very similar to the label_image.py file from before, but here I've wrapped it in a server, and it only has one route, classify, which returns the predictions for what things are.

Now, it's important to return all the predictions, because you want to look at not just the top prediction but all of your percentages, to see if there's a high probability of it being something you're not looking for. Maybe it's worthwhile to flag something that it thinks is 30% a person or 30% text, if your system cares about that. We basically red-flag images that fall under those categories, and then they're reviewed by an actual person if they need to be. We found that we actually get about 90% accuracy on all images that are uploaded.
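The advice to look at every percentage, not just the top one, can be sketched as a small decision function in Python. The predictions dictionary stands in for whatever the classify route returns; its shape, the label names, the 0.30 cutoffs, and the accept/reject rule are all assumptions for illustration, not the speaker's production rules.

```python
# HYPOTHETICAL sensitive labels and cutoffs, echoing the "30% person or text" example.
REVIEW_THRESHOLDS = {"person": 0.30, "text": 0.30}


def flagged_labels(predictions, thresholds=REVIEW_THRESHOLDS):
    """Sensitive labels whose probability meets or exceeds their cutoff."""
    return sorted(label for label, cutoff in thresholds.items()
                  if predictions.get(label, 0.0) >= cutoff)


def decide(predictions):
    """Return ('review', labels), ('accept', []) or ('reject', [])."""
    flagged = flagged_labels(predictions)
    if flagged:
        return "review", flagged  # send to a human moderator
    top_label = max(predictions, key=predictions.get)
    return ("accept" if top_label == "food" else "reject"), []


print(decide({"food": 0.65, "person": 0.31, "text": 0.04}))  # ('review', ['person'])
print(decide({"food": 0.92, "person": 0.05, "text": 0.03}))  # ('accept', [])
print(decide({"other": 0.80, "food": 0.15, "text": 0.05}))   # ('reject', [])
```

Note that the first image would be accepted if you only looked at its top prediction ("food" at 65%); checking the sensitive labels separately is what routes it to a person instead.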
aaai so it's useless I built a small Rails application that is actually going to take the images and identify whether or not their food by calling out to the service of and I'm
serious she's a file and the sterilization pizza here I'll sizzling
harder on the top of the let's find other type of baby is we're not not food and yes good job the I readings rather than the number
of images but again I have the same predictions
that we had before so we if you wanna get started that's all it really took was downloading a few scripts offline collecting a bunch of images putting them together and running them in finding a way to make it work now when I 1st started this but again I said I was going to use Ruby not Python I wanna make sure that no 1 makes the same mistake as an entire day just messing our doctor and messing around the command line trying to find like our which flags can I use With crying OK it's not rocket crying uninterested thc system and make it work which GCC and going to the whole song and dance and after spending on that with the Python routed 5 minutes and so the lesson here is to make sure that you understand what you're trying to do and not fighting against it don't try to do it your own unique way if you're learning what you've learned it then you can go back and do it your own unique way and suffer if you'd like now In my process a suggested that the red some things and I suggest also reading these as additional material there's demystifying deep neural nets by Rosie Campbell I highly highly recommend this talk after my talk it's it's a really great it more in-depth into some of the more technical aspects of things and then if you're interested in the book I highly suggest Python machine learning by Sebastian and that's it if FIL is a probability of soccer every are we define as a the in it's other it might be a person 30 % it might be a person I don't actually know the answer to that question so what was I had to be the question the question was how many images a day is good bad classified and I matching not sure on the exact number so I can said I know that it's quite a few but that's so yes I can negate the question was when classifying text what is a kind of look like what kind of fonts answer reusing was an overlap and analyze the question 1st but I wanna make a point that uh as an international company we actually have our biggest market in Arabic-speaking 
countries and so we found that Arabic text was the majority of our problem because of that reason all of the most of the text in here is going to be error so we use a lot of examples of Arabic text with some modi and other things there were trying to identify you can see some of them on top of things that kind of look like recipes I some of it is English text not sure why it's about bookworms but if we have a lot in different examples in different fonts different backgrounds things like that I'm not necessary but they so the the question was about Google Cloud vision API have not use that yet so I'm I'm not on the machine learning team specifically so I messed with those tools I've only kind of gotten trickle-down education about machine learning and decided like I was interested in this I wanted to explore more so I'm I'm a rails developer not applied on developer and I kind of wanna stay as far away from Python as I can but I'm even more more interested in as an avenue of learning these things and much so the question was what would it take to build it uses the have answer so actually it's gonna take the community to take a look at and the flow of energy in trying to get at compiling on all systems the main developer for doesn't use OSX ceiling uses Linux and he's not necessarily writing it for that particular thing so people can get involved to have more knowledge about the kinds of flow itself and C + + coating and Ruby and making extensions for those types of things that would be extraordinary helpful in making this a reality in so the question was when classifying images and how long does it take for image how does it scale as you add more images into the system with the memory usage like and I don't have those particular statistics I found that categorizing images didn't take too long and even on our system a we have a lot more training data it still roughly short amount of time per image because as I mentioned directory only putting the image through the 
last layer of the dataflow graph the whole graph has been trained up to the point where that last layer gets to re use all that information and that process is fairly quick but I would still suggest doing asynchronously rather than synchronously like I did in my example OK so the question was as you're collecting images how do you know what's the right amount of images to collect and if you're accuracy isn't correct how you try and correct for and so the answer is there's no way of knowing what the correct number of images are because again it's it's an imperfect science right so you want like a lot of images but there's no like the cut off point where that number becomes the correct number and a more images you have are great but is not only does the more images you have of in my case food it's more images of things that aren't food and when you notice that things are working quite the way you want if you can figure out OK we wanted to identify it as this but it didn't getting more examples that are closer to that getting more training data that can represent step single form that you're looking for can help improve that OK so the question is what size they said do I think is the appropriate size for doing this kind of image classification production you know it it really comes down to a couple things I think the image size this that I have for justice testing worked well enough to use in production might my accuracy is prodding and below the lower it's 80 % but I can start it moving and then kind of fines is it working or not right if I'm if I'm always trying to push for that's certain percentage I might never had that exact 1 I think our goal was to get to about 98 per cent that that and if we could do that with a 98 per cent we refine with 2 per cent that wouldn't that arise correctly so it's kind of trying to figure out how important is it for this classification to happen and for which the city cases do you actually care about it being important right we 
don't want humans showing up on the site, so we want to be really good at identifying that this is a human; we don't want that picture there, for various reasons. We don't want it to happen, so we care about accuracy in that situation. So you need to decide which specific situations you care about. If a picture of a duck or a picture of a dog ended up on the website, I care a little bit less about that, especially if it's a cooked duck. And even if it's not a cooked duck, I probably don't care as much as if a person, or text, showed up. It's these specific situations that are actually difficult for us to mitigate.

So we haven't had to retrain that frequently, and part of it is that when you get a system that is stable and you don't see any problems, there really isn't a reason to keep trying to retrain. If you can get the numbers you're looking for, it's fine. Oh, I should repeat the question: how often do we retrain? It doesn't happen often, far less than you'd think. Unless there's some catastrophic problem, where it's like, "Oh no, we missed this situation." Another situation I could see coming up is users finding a new way to abuse our system, and we want to say, "All right, we need to prevent this. How can we prevent that?" Then we might train on an additional set of data to get better at classifying this new set of problems.

Have I ever participated in an image classification competition? No, I haven't, but that's cool. Yeah, I mean, image classification has a lot of potential uses moving forward, and especially as we get better and better at doing it, there are more situations where we can find uses. My favorite story as I was reading about image classification was about this farmer in Japan (I live in Japan right now). His family was a farming family, and they sold cucumbers. I don't know if you know about food in Japan, but there are very particular things that they identify, and they're like, this is a regular cucumber, this is a premium cucumber, I'm going to charge you 30 dollars for this
premium cucumber. And all that kind of classification was done by hand. So an engineer, who was really a programmer but could also basically build the hardware, used this and trained it to identify which category each cucumber belongs in, this one, this one, or this one, and it automatically sorts the cucumbers. That kind of made the process a lot better. There's an article online you can read about that particular project. But the point is that image classification can be used in a number of industries for any number of reasons, and I think as we move forward as developers, it's going to become more and more commonplace to want to be able to do things like this. So I think that's about my time.