
Building a Powerful Pet Detector in Notebooks


Formal Metadata

Title
Building a Powerful Pet Detector in Notebooks
Subtitle
Yes, there will be dog pictures
Number of Parts
118
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Abstract
Ever wondered what breed that dog or cat is? Let's build a pet detector service to recognize them in pictures! In this talk, we will walk through the training, optimizing, and deploying of a deep learning model using Azure Notebooks. We will use transfer learning to retrain a MobileNet model using TensorFlow to recognize dog and cat breeds using the Oxford IIIT Pet Dataset. Next, we'll optimize the model and tune our hyperparameters to improve the model accuracy. Finally, we will deploy the model as a web service in Azure. Come learn how you can quickly create accurate image recognition models with a few simple techniques!
Transcript: English (auto-generated)
So today, as you mentioned, we're going to be talking about how to build a pet detector in a notebooks environment. We're going to start off by doing a high-level overview of the deep learning mechanisms we'll be using and the machine learning workflow, and then we'll dive into the demo notebook, where we'll spend most of the time, and talk about the actual code.
So just to give you a bit of context, I'm Katherine Kampf. I work for Microsoft. I'm a program manager on a product called Azure Notebooks, which is our free Azure hosted Jupyter Notebook service. So we'll be seeing it shortly. And here's some of my contact information. I'll be
showing it later, so don't worry too much about taking photos or anything just yet. The most important thing to know about me is that I really love dogs, and even though I know a bunch of dog breeds, it's still sometimes difficult to tell them apart. If we look at Alaskan Malamutes and Siberian Huskies, they can look super similar.
And especially when you're then trying to train a machine to understand this, it can get difficult and you need a ton of data to get an algorithm that can successfully distinguish between these different breeds. So the way we're going to approach this today is using a technique called deep learning.
So this is often how deep learning is viewed: you have an input, so that's the dog photograph, and this sort of black box where we don't really know exactly what's happening, but then we get these outputs of either a dog or cat or other. In this demo, we're actually going to go a bit further and say, if it's a dog or a cat, which breed do we think it is? And so this differs a bit from traditional machine learning, where you'd often be doing manual feature extraction.
So this requires a bit more hands-on work to say, you know, try to discern which features of the data are most important, whether it's size or color, et cetera, and it also requires domain expertise.
So this is easier to think about in the pet classification topic space, but if you try to apply the same thinking to a really specialized field, it can get a bit more difficult to do this manual feature extraction. And then you'll also need to run through different classification algorithms to try to distinguish which will work best and get you the highest accuracy.
And this differs again from deep learning where we're going to be doing a layered approach trying to understand different features pixel by pixel of our images, and it's going to be the machine doing the heavy lifting of that work rather than manual data science work tuning the features.
So like I said, deep learning will require a ton of data, but once you have that data, it becomes a lot easier for the machine to understand the data set and generate predictions. Specifically within deep learning, today we're going to be using what's called a convolutional neural network. This is a really popular network to use for image classification, and that's because it works by preserving the RGB channels in the first layer: it takes each pixel's red, green, and blue values and preserves them there. Then, at a very high level, we'll use a technique to filter the images themselves and try to extract which information is most important in the convolution layer. Next we'll move into pooling, where we'll try to aggregate that data and reduce the amount of information, because deep learning is super computationally expensive, so if you can have a bit less information going into the fully connected layer, it can save you a lot in time and compute power. And then in the end, we'll get our prediction for whether the animal is a dog, a cat, or something else if the network's a bit confused.
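As a rough illustration of that convolution, pooling, and fully connected pipeline, here's a minimal Keras sketch; the layer sizes and the three-way dog/cat/other output are illustrative assumptions, not the network used in the talk (that one is MobileNet):

```python
# Minimal sketch of the convolution -> pooling -> fully connected
# pattern described above. Layer sizes are illustrative assumptions.
import tensorflow as tf

model = tf.keras.Sequential([
    # The first layer sees the raw RGB pixel values (224x224x3).
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu",
                           input_shape=(224, 224, 3)),
    # Pooling aggregates and shrinks the feature maps, cutting compute.
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    # Fully connected layers produce the final class prediction.
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),  # dog / cat / other
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```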
So if we take a step back and look at the general machine learning workflow we're going to be walking through today, you're often going to want to start off with data exploration, because the data itself is where all of the power for your deep learning network comes from, and it's the most important part of the process.
So this will involve finding a data set, possibly transforming it into a particular format you need, cleaning any data, and running some visualizations to understand the basic attributes of the data. Once you have a sense of that, you can move into training, which is where we'll actually be developing our algorithm. There are three main concepts we like to think about here: the training script, compute, and tuning. To start off with our training script, this is where we'll try out different algorithmic approaches. And of course, your local box can be pretty powerful, but compute-wise, depending on the size of your data,
you might need to scale out to a larger VM in the cloud or on-prem or a cluster computing environment. And once you have that algorithm, you might be fully satisfied with the accuracy you're seeing, or you might want to do some tuning to try to refine that algorithm to see a bit higher accuracies.
And once you're happy with your model, then you'll move into the inferencing stage, and this is where you're actually using your model in an application. So this involves three different components, where you'll have productization, so this can often mean refactoring your code. So you might have started in a Jupyter notebook environment, but you need to output a Python module,
so you'll have to do some work there to refactor it, and we'll talk a bit about that later. And then deploying your model to a web service, so you can use it in your applications or other folks across your company or organization can as well. And once your model is deployed, then you can write a test application to send a photo of a dog and get returned back with your breed prediction.
So that's the high-level overview of what we're going to be walking through today. We'll spend most of our time in these first two stages just for the sake of time, but I'll point you to a GitHub repository at the end of this that walks through the entire lifecycle.
So we're going to start off talking about the data we're using. Today we're using the Oxford Pet Dataset, a pretty common dataset which contains 37 categories of different pet breeds, cats and dogs, with around 200 images per breed.
And here's a link to it, I definitely encourage you to, especially if you're getting started with machine learning or image recognition, this is a really great and well-labeled and well-documented dataset to use. And then once we get into wanting to explore and understand our data, there's a bunch of different great tools you can use depending on what types of data you're working with or the scale of your data.
And today we're going to be using notebooks, so if you're not familiar with Jupyter notebooks, they essentially let you combine markdown text, images, visualizations, etc., alongside executable code. So it's super useful in data exploration and data science in general to tell a story around how you got to
a specific graph or how you got to a specific model, so you have the context either to present to others or to look back on your own work, understand each step you went through, and get strong visualizations of your data. Specifically, today we're going to be using what I mentioned earlier, Azure Notebooks, mainly for the free setup and scale-out. My local box is fairly powerful, but for something like this I'd rather use a really big, beefy GPU machine in Azure. Azure Notebooks lets you connect from its free compute to a remote VM in Azure, which is super useful for scale-out scenarios, and it means I can make the project public for all of you to go and play with on your own as well.
So I'm going to switch over to the demo to show you exactly what this looks like. So this is the GitHub repository I'm going to link. Oh, this isn't showing up, let me get the IT folks. Alright, let's see if we can fix this. Ah, OK, we're good. Alright, so this is the GitHub repository I'll link you to, just to give you an overview.
If you scroll down here, you'll see this little launch badge. If you click it, it will automatically clone the repository into your Azure Notebooks account as a new project, so it's super easy to get started. Once it's cloned in, which I've already done, you'll get an overview of all the project files here. And this is the compute picker I mentioned: there's free compute offered, and there's my own VM that I'm connected to, an NC6 GPU machine in Azure. If we move into our notebook, I'll just start off and run all of these.
So let's first try to understand the data structure we're working with. We know we're using the Oxford pets data set, but I'm just going to do a quick ls to see the folder structure, and it looks like we have 37 folders, and presumably those contain the 200 images per breed that we're going to be analyzing today.
And just to get a quick visual sense of these, I created a little plotting function, so we can see exactly the breeds we're going to be working with. Holler in the back if this font's not big enough; I tried to make it big. I think it's very useful to always get this visual sense of your data.
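The plotting function itself isn't shown in the transcript, but a rough equivalent might look like this, assuming one folder per breed containing .jpg images:

```python
# Rough equivalent of the plotting helper mentioned above (the talk's
# own function isn't shown). Assumes one folder per breed under
# images/, each containing .jpg files.
import os
import matplotlib.pyplot as plt
from PIL import Image

def plot_sample_breeds(root="images", cols=6):
    breeds = sorted(os.listdir(root))
    rows = -(-len(breeds) // cols)  # ceiling division
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2 * rows))
    for ax, breed in zip(axes.flat, breeds):
        # Show the first image in each breed folder with its label.
        first = sorted(os.listdir(os.path.join(root, breed)))[0]
        ax.imshow(Image.open(os.path.join(root, breed, first)))
        ax.set_title(breed, fontsize=8)
    for ax in axes.flat:
        ax.axis("off")
    plt.tight_layout()
    plt.show()

plot_sample_breeds()
```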
For instance, here we can see all 37 breeds. Once we have our model deployed, it'd be easy to submit a breed that's not covered by this data set and then be confused by the result: if we submitted a golden retriever, which isn't reflected in this data, we'd see relatively low accuracy simply because it's not in the original data set.
So as I mentioned, we're working with 200 images per breed, which may seem like a lot, but for something like deep learning, it's really not enough. That's relatively small data, and we'd likely end up with an overfitted model that wouldn't generalize to the data we'd see in the wild.
Instead of just training on that, we're going to be using a technique called transfer learning. With transfer learning, we're going to take a pre-trained model. This is the MobileNet model that we're using today, which has been trained on thousands of general images. Then what we'll do is retrain that last layer specifically to our 37 pet breeds.
It'll use all the power of that massive pre-trained network, but specialize it down to our data set. When I run this training job, we can see it takes a bit and has a ton of output.
We can see it took around 26 seconds and we saw an accuracy of almost 80%. This is from doing that initial transfer learning, not tuning any of our hyperparameters, which we'll get into later, just using a flat learning rate. We're still able to get to 80% in 26 seconds, which is pretty impressive.
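What that retraining step boils down to, as a hedged sketch using TensorFlow Hub — the hub URL, learning rate, and data pipeline here are assumptions; the talk uses TensorFlow's retrain tooling rather than this exact code:

```python
# Hedged sketch of the transfer-learning step: take a pre-trained
# MobileNet feature extractor and train only a new final layer for
# the 37 breeds.
import tensorflow as tf
import tensorflow_hub as hub

feature_extractor = hub.KerasLayer(
    "https://tfhub.dev/google/imagenet/mobilenet_v1_100_224/feature_vector/4",
    input_shape=(224, 224, 3),
    trainable=False,  # freeze the pre-trained MobileNet weights
)
model = tf.keras.Sequential([
    feature_extractor,
    tf.keras.layers.Dense(37, activation="softmax"),  # one unit per breed
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss="categorical_crossentropy", metrics=["accuracy"])

# train_ds would come from the Oxford pets folders, e.g. via
# tf.keras.preprocessing.image_dataset_from_directory(...).
# model.fit(train_ds, epochs=5)
```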
If you look back, this data set was first released in 2012, so seven years ago. Even with a lot more compute power, a lot more time, data scientists were still only able to get to around 59% accuracy. It's pretty impressive how far we've been able to come in just a short amount of time.
79-80% is pretty great, but I want to see if we can improve this at all. Now we're going to be working with what are called hyperparameters. These are attributes of your network you can set beforehand. Specifically, we'll be looking at what's called the learning rate. The learning rate is essentially how much you'll let the weight vary on a node from iteration to iteration,
so how quickly you're letting the network learn. Oftentimes in data science, you find yourself trying out a bunch of these values and just for-looping through them randomly, because it's often difficult to determine which value might be best for your specific network.
Instead of doing that by hand and taking hours to do it, we're going to use something called Azure Machine Learning Service, which lets you distribute this work across a cluster. I have a four-node cluster in my Azure subscription. Basically, I'm going to send the training script to each of the worker nodes.
If we see here, it'll try out a bunch of uniform random values for the learning rate. It'll tell me which gets the best accuracy, so then I can treat that as my best model without having to do as much work. We're going to do a couple of things to make this more efficient.
I need to update my calls. We're going to use this early termination policy. If you see this 0.15, what that's saying is: if a run's accuracy is more than 15% away from the current best accuracy we've seen, we'll cut that run short and free up that compute resource to be used with a new value.
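For reference, here's roughly what that sweep looks like with the Azure ML SDK's HyperDrive. The `estimator` (wrapping the training script) and `experiment` are assumed to come from earlier setup, and the metric name, learning-rate range, and run counts are assumptions; the 0.15 slack mirrors the policy just described:

```python
# Hedged sketch of the hyperparameter sweep described above.
from azureml.train.hyperdrive import (
    BanditPolicy, HyperDriveConfig, PrimaryMetricGoal,
    RandomParameterSampling, uniform,
)

# Try uniform random learning rates across the cluster's worker nodes.
sampling = RandomParameterSampling({"--learning_rate": uniform(0.0001, 0.1)})

# Cut short any run whose accuracy drifts more than 15% from the best
# seen so far, freeing that node for a new learning-rate value.
policy = BanditPolicy(slack_factor=0.15, evaluation_interval=1)

hyperdrive_config = HyperDriveConfig(
    estimator=estimator,             # training script wrapped as an estimator
    hyperparameter_sampling=sampling,
    policy=policy,
    primary_metric_name="validation_acc",
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=20,
    max_concurrent_runs=4,           # one per node in the four-node cluster
)
hyperdrive_run = experiment.submit(hyperdrive_config)
```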
As this runs, we can see a bunch of output. It's not loaded yet, but sometimes it takes a bit depending on Wi-Fi. We can see all these jobs running and a bit of information about our cluster.
We have four nodes running, and it'll run through each different job, tell us how long it took, what its run ID is, etc. I'll come back up in a bit so we can see the visualizations it'll start giving you. Even though it's pretty efficient to distribute it across a cluster, it'll still take around 25 minutes.
This is a 30-minute talk. I don't really have time for that. Here you can see some of the validation accuracy starting to come in, as well as different learning rates.
I'll just skip ahead to a run we've already done in the past. This was a run I did yesterday, and I feed it the specific run ID. Now I can see a bit of the information, see another graph of the validation accuracy as it went through training, and see that the final accuracy was around 93%.
With just an additional 25 minutes of training, we were able to increase our accuracy by 13%, which is pretty exciting, and 93% is a really great accuracy, especially for something like an image recognition task. Now that I have that treated as my best run, I'm going to register it with the Azure Machine Learning Service.
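A minimal sketch of that registration step; the SDK methods are real Azure ML calls, while the model name and output path are assumptions:

```python
# Pick the best run from the sweep and register its model with Azure ML.
best_run = hyperdrive_run.get_best_run_by_primary_metric()
model = best_run.register_model(
    model_name="pet-detector",
    model_path="outputs/model",  # wherever the training script saved it
)
```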
This will basically let me use this model from anywhere or deploy it easily. If I wanted to access this model from VS Code or in a future notebook, it's registered and available to me, as well as to anyone working inside my workspace. I know we went through that a bit quickly, so I'm going to flip back to the slides to review some of the topics.
Again, we just went through training, and we were looking specifically at trying to do deep learning with small data. By nature, deep learning is going to require huge amounts of training data
because it's doing that feature extraction and trying to figure out the best network structure all on its own. You need as much data as possible to learn from for that. 200 images isn't going to be enough, which is why we decided to use transfer learning, and specifically transfer learning with the MobileNet, where we'll be taking the existing MobileNet model and retraining the last layers specifically to our 37 pets.
Since that was pretty good accuracy, around 80%, we decided to see if we could do any better by doing some hyperparameter tuning using Azure Machine Learning Service.
Just to call out a couple more exciting things: if you're just getting started with machine learning, AML can be super useful. It has experiences where you can do drag-and-drop or automated machine learning, where it tries out a bunch of different classification algorithms for you, or you can use it for hyperparameter tuning like we just saw. It also handles automated compute scale-up and scale-down.
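As a rough sketch, provisioning such an auto-scaling compute target with the Azure ML SDK might look like this; the SDK calls are real, while the cluster name and VM size are assumptions:

```python
# Provision an auto-scaling Azure ML compute target.
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()  # loads workspace details from config.json
config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_NC6",  # a GPU VM, like the one used in the demo
    min_nodes=0,             # scale to zero when idle: no cost between jobs
    max_nodes=4,             # the four-node cluster used for the sweep
)
cluster = ComputeTarget.create(ws, "pet-cluster", config)
cluster.wait_for_completion(show_output=True)
```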
I have a four-node cluster in my subscription right now, but I've set the min nodes to zero, so whenever I'm not using it and not running my jobs, it'll scale down and won't cost me money, which is super great. Once we have this model, like I mentioned, you might want to do some refactoring. I'm going to move into VS Code for that.
This is the same demo notebook that we just had, the .ipynb file. But when I open it in VS Code, I see the JSON dump of what a raw notebook file looks like, and I also see this option to import it. So I'm going to go ahead and click that, and VS Code will turn this into a .py file with a bunch of cells.
So here I can see the markdown has been turned into comments. My Python code is still here, and I can visually see that I have these little cells with this run cell option that'll bring up our Python interactive window and run a cell, as you would see in Jupyter, side by side.
This essentially works as a Python console, so you can type code in here, et cetera. So when you're into refactoring, this can be super useful. If we just highlight a snippet of code, we'll have all the refactoring capabilities you're used to with an editor. So you can see we can change all occurrences, extract method, et cetera,
all from what started as a Jupyter notebook, so you can now refactor that into whatever form fits your workload best. And I have an example in the GitHub repository of what a refactored Python module might look like. And once we've refactored it to what we're happy with
and deployed our model, we'll want to go ahead and test it. So this is a testing script I've written; we'll do the same action and run this cell to see what it looks like. We have some code to grab a random pet to try it out, or, as in this example, I'm specifically trying it with this little chihuahua I found on the internet.
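A hedged sketch of what a test script like this might look like, assuming the model is deployed behind an HTTP scoring endpoint; the URI and JSON shape are placeholder assumptions, not the talk's actual service:

```python
# Send a picture to the deployed web service and print the prediction.
import base64
import json

import requests

with open("chihuahua.jpg", "rb") as f:
    payload = json.dumps({"image": base64.b64encode(f.read()).decode()})

resp = requests.post(
    "http://<scoring-endpoint>/score",  # hypothetical deployed endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
)
print(resp.json())  # e.g. {"breed": "chihuahua", "probability": 0.97}
```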
We can see here it loads the chihuahua image, as we would expect. And something new that we just introduced is actually the ability to debug cell by cell. So we just ran this first cell successfully, and now I'm going to hit this debug cell, and we can see I have this breakpoint here.
So once I hit debug cell, it'll open the different tools I'm used to in my debugger. I can step through, et cetera, or just continue on. And once I continue on, I can see that it gets it right, chihuahua with a pretty accurate probability.
So this is super useful to be able to do this debugging bit by bit, and we just introduced it this week, so if you want to learn more about it, come by our booth and we can talk more. But this is going to be super powerful when you're working in a data science space and trying to debug cell by cell.
And so now that we know our pet detector is working pretty well and we have a great probability, we're good to go. So just to rehash a bit about working with Python in Visual Studio Code, we just saw that we have debugging and refactoring capabilities.
There's also IntelliSense, so auto-completion, as well as the ability to import-export Jupyter Notebooks. So as you saw, we were importing that Jupyter Notebook. It transformed it into a .py file with different cells, and you can continue to work and refactor that into a Python module, or you can re-export it back as a Jupyter Notebook,
if, say, you want to present the information, et cetera. There's a variable explorer, data viewer, a bunch of full-fledged data science tools, and if there's anything you don't see that you would love to see, please come talk to us. We are heavily investing in making this a great experience. So now that we've covered most of that workflow, what's next?
So here's a link to the GitHub repository, where you can build your own pet detector, as well as links to try out Azure Notebooks and the data science tooling in Visual Studio Code. And then I have some resources on the next slide as well, but I'll let people take photos of this as they wish.
And I think we have a few minutes, we'll have a few minutes left for questions, and then I'll hang around outside in the hallway as well.
Or I'll just be outside and at the booth. Thank you for the talk. I have a question.
Usually in this kind of example, we talk about classification into a small number of categories, like breeds of dogs. What should we do if we want to train something to classify, let's say, several thousand categories, or maybe 100,000? Let's say I just want to create some classifier that tells me what is in the picture. How can I accomplish this task? Shall I try a lot of classifiers for each separate category, or are there some approaches to do it just out of the box? Yeah, so it depends on how big each category is. If you have hundreds of thousands of categories and five images per category, then you'll have to try to use some pre-existing models and do transfer learning. But if you have 100,000 categories and 100,000 images within those, then you can employ a lot of different techniques, whether you just want to do traditional deep learning on your full dataset or try it out bit by bit,
depending on the topics you're working with in the images. You could try different classification algorithms on each set, as you mentioned. Okay, and another question. Is there some software-as-a-service in Azure that provides a classifier as a service? Just so I don't have to write it on my own, but can just call some API and use it? Yeah, so we have something called Azure Cognitive Services, which is basically a suite of exactly that: APIs for speech-to-text, search, image recognition, et cetera.
So if you search just Azure Cognitive Services, that's exactly what it does. Hopefully that helps. In the demo, you are using an image folder locally.
Is that correctly understood? In the GitHub repository, you have images there, and they are then uploaded to train your model. The images are not coming from somewhere else. Yeah, so these images I actually put into Azure Storage. They're in the GitHub repository as well, if you want to download them locally.
So it refers to the Azure Storage, so you need to upload it to the Azure Storage and then train it from there. Yeah, so I loaded them onto the specific VM I was working with, but there's a limited amount of free storage in Azure Notebooks as well, so you could upload a subset of the data and work with it there, or upload the full data set to Azure.
And you also have a service for uploading the images right on Azure, if I remember correctly? Yeah, there are a couple of services. You can either do it from the Azure portal or use Azure Storage Explorer, which will let you upload data to a variety of stores, depending on what your end store in Azure is,
if it's Blob or Azure Data Lake, et cetera. Is there the possibility of marking up where in the image the object is? So this data set is nice, because it actually gives you the full image, and then it also has a box highlighted around the face of the pet, which makes it a good sample data set for cases like this. But do you mean in general or specifically for this data set? Specific, yeah. Does that help? Yeah, no, that's good. Yeah, it'll box the face for you to make it a bit easier, so if you're looking at a full dog or just a dog's face... I just didn't see it in the GitHub repository, but it's somewhere. Yeah, the link I provided to the Oxford Pets data set earlier in the slides, if you read there, it'll have the boxed images. Thank you. Yeah. Oh, yeah.
This one or that one? Okay. And I'll tweet the link to the slides as well, if you'd like them. So if there are no more questions... Is there a question left? If there are no more questions, let's thank the speaker again.