
How to train an image classifier using PyTorch


Formal Metadata

Title: How to train an image classifier using PyTorch
Subtitle: Building an image classifier that can recognise cities
Number of Parts: 118
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared, also in adapted form, only under the conditions of this license.

Content Metadata

Abstract: Neural networks are everywhere nowadays. But while it seems everyone is using them, training your first neural network can be quite a hurdle to overcome. In this talk I will take you by the hand, and following an example image classifier I trained, I will take you through the steps of making an image classifier in PyTorch. I will show you code snippets and explain the more intricate parts. Also, I will tell you about my experience, and about what mistakes to prevent. After this, all you need to start training your first classifier is a data set! Of course I will provide a link to the full codebase at the end. The talk will focus on the practical aspect of training a neural network, and will only touch the theoretical side very briefly. Some basic prior knowledge of neural networks is beneficial, but not required, to follow this talk.
Transcript (English, auto-generated)
Thank you. So, we're definitely not going to do any particle physics today. Today I'm going to help you take the first few steps in training your first image classifier using PyTorch.

Just before we start: can I get a hand from everyone who has used PyTorch before? Only a few; that's good, because we're really going to start at the beginning. First I'm going to tell you what an image classifier actually is; that's not so difficult. Then we'll go over what a neural network is, how we actually build one in PyTorch, and finally what we can do with them. And in the spirit of this morning's keynote, I'm going to show you a bit of how I played around with an image classifier and what cool things you can do with it.
Okay, so first let's have a look at what a classifier is. Suppose we have a labelled training set of points: we have two dimensions, x1 and x2, and a set of data points that come in two classes, either red or green. In this case we can see a pattern of sorts: the green ones are on the left side of the screen and the red ones are on the right side. So we could say, for example: let's draw a line somewhere around here, and then everything left of this line is green and everything right of it is red. Now we have a model, a very simple model. It's just a line that can tell us whether a point belongs to one class or the other.

Then, when we have an unlabelled data set, points like these, in this case just white points, what we can do is use the same line we had before and use it to colour all the points green or red.
So far, that's a classifier, and I'm pretty sure most of you have seen something like this before. Now, sometimes you want to do something a little more complicated than using just a simple line, and in that case we're going to use neural networks. Most of you have probably heard about them. They look something like this: you see a series of circles, which represent neurons, and they come in layers. We have the input layer on the left, a hidden layer (there might be multiple) in the middle, and an output layer on the right, and they are connected by these lines.

So how do these neurons actually work? A neuron looks something like this: you have a set of incoming signals on the left, and all of these signals are multiplied by a weight. These weights are the things we use to train the neural network; they are the things we must learn when we actually build one. So we take those inputs, multiply them by the weights, sum them up, and in the end apply some kind of activation function, whose main property is that it's nonlinear, which makes the neural network more capable of learning stuff. Typically, and we'll see this more today, we use the rectified linear unit, ReLU for short, which is y = max(0, x). So basically it's zero if x is negative, and otherwise it's x.
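As a minimal sketch of that in PyTorch (the weight and bias values here are made up for illustration):

```python
import torch

# ReLU: y = max(0, x), applied element-wise
x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(torch.relu(x))  # tensor([0.0000, 0.0000, 0.0000, 1.5000])

# A single neuron: weighted sum of the inputs plus a bias, then the activation
w = torch.tensor([0.3, -0.8, 0.5, 0.1])  # weights (the values we would learn)
b = torch.tensor(0.2)                    # bias
y = torch.relu(w @ x + b)
```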
Okay, that's fairly simple. So how do we actually use a neural network like this; what do we do with it? Well, in the case we had before, the 2D points with two dimensions x1 and x2 that we wanted to classify as red or green, we could use the neural network like this. We have two input nodes, one is x1 and the other is x2, and we propagate these signals through the network with the weights and the activation functions. Then we have two output nodes: one of them is the probability that the point belongs to the green class, and the other that it belongs to the red class.

Now, of course, if you first build a neural network like this, it's not going to do what you want, because you have to set those weights to the correct values. So what do you do? You take all those points that you labelled before, pass them through the network, see whether it makes an error, and then tune the weights in such a way that it will actually perform better. You have to do a lot of tuning before you get a neural network that does what you want.
But we're talking here about image classification, not about 2D points. What we want is, instead of just two inputs, to feed in an image. In this case the left one here is a dog and the right one is a cat. That's something we can easily see, but we want a neural network to recognise it for us. And for that you cannot use a simple neural network like the one I showed before; we're going to need something a little more complicated, something called a deep convolutional network. That's quite a mouthful, but in the end it's not so extremely complicated.

Let's start with the "deep" part. A deep neural network is just a neural network with more than one hidden layer, in this case preferably ten or so, could be 20 or 30, but just more than a few. That's all there is to deep neural networks.
Then the "convolutional" part: we need convolutions in order to interpret those images. On the left side here we see that the input to a network like this is an image. What we do with these convolutions is take a box, aggregate all the pixels from that area of the image, and apply some function over them. We could try to see whether there's a big difference from the right to the left, or from the top to the bottom, or whether all the pixels have approximately the same value; it's just some function that we apply using the weights of a neuron. Then we shift this box around across the image, so that we get a matrix of the results of this function across the whole image. In this case, you see we don't use just one convolution: we end up with four feature maps, so we used four convolutions. Of course, you could use many more.

Typically, after a convolution, you subsample: for each two-by-two block of results we take the maximum, which is called max pooling, or we could take the average, or something like that. We do that to reduce the size of our neural network, because otherwise it might become too big, and if there are too many weights to tune it becomes too difficult to train. Typically after that we do more convolutions, so you get more feature maps, in this case ten more, then more subsampling, and in the end we add a fully connected layer, which is just one like the one I showed you in the simple neural network before. Then in the end we have an output, in this case two outputs: one of them could represent the probability of the image being a robot, and the other of it being a cat, something like that.
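To make those shapes concrete, here is a toy sketch (the channel counts are made up for illustration):

```python
import torch
from torch import nn

# A toy 3-channel 32x32 "image" (batch of one)
img = torch.randn(1, 3, 32, 32)

conv = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3, padding=1)
pool = nn.MaxPool2d(kernel_size=2)  # 2x2 max pooling halves width and height

fmaps = conv(img)      # four feature maps: shape (1, 4, 32, 32)
reduced = pool(fmaps)  # subsampled:        shape (1, 4, 16, 16)
print(fmaps.shape, reduced.shape)
```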
Okay, a typical example of a convolutional neural network is VGG16. VGG16 is a big neural network: it consists of 16 layers and has, in total, 144 million weights. I hope you recognise some of the ingredients here. We start with an image of 224 by 224 pixels, in three channels, one for each colour: red, green and blue. Starting at the left, we do two convolutions, then some max pooling to subsample, two more convolutions, more subsampling, then more convolutions, subsampling, more convolutions, and in the end we arrive at the blue areas: the fully connected layers. So there's a standard neural network at the end, but a big one, with something like four thousand nodes, and in the end one layer with a thousand output nodes.

Why a thousand? In this case, it's because this network was made to be trained on ImageNet. ImageNet is a collection of 14 million images, annotated into a thousand classes, among them, for example, cat and dog. This network was trained, using a lot of computers, to get something like 90% accuracy on those 1000 classes.
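If you do want exactly those ImageNet classes, the pre-trained network is one import away in torchvision (a sketch; `pretrained=True` was the torchvision API at the time of this talk):

```python
from torchvision import models

# Downloads roughly half a gigabyte of weights on first use
vgg = models.vgg16(pretrained=True)
print(vgg.classifier[-1])  # Linear(in_features=4096, out_features=1000)
```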
So, cool, this thing already exists, but aren't we going to train our own image classifier? Well, of course we are, and to do that we use transfer learning. Remember we had this network, and it has already been trained: it has 144 million weights that all have reasonable values, such that it can accurately classify all those kinds of images. We're going to make use of that by taking off the last part, the 1000 classes: removing that whole last layer and putting our own layer at the end. Not necessarily a simple layer like that, but just a new classifier that we put on top. So we remove the end and add our own layer. This has the advantage that all the weights in the previous layers already have reasonable values: the network already knows how to recognise sharp edges, round edges, strange patterns, all the things you typically see in a photo. It already knows how to deal with those, and in the end it comes up with a set of features, and we build a classifier on top of that.

Isn't that cheating? Of course it's cheating, but hey, you never get anywhere in life without a little bit of cheating. What we want to do is make use of what people have done before and build our own classifier for things they didn't train it on. Because, well, if you want to classify images into exactly those thousand ImageNet classes, then of course you can just use the pre-trained version of VGG. If you don't, this is the way to go: you just remove the last layer, and then you're all set.
So now you know exactly how to train your own image classifier, right? Well, let's have a look at the code. But before that, you might ask me: why did I choose PyTorch and not Keras? You may have heard of Keras as well; Keras is also a library that lets you build neural networks. You could say Keras is easier, or that PyTorch is more flexible; you could say Keras is faster, which might sound very important. But the main thing I think is important is that PyTorch lets you play with the internals. Basically, that means you get to tweak the neural networks, not just import them and use them, so you learn more from PyTorch. That was the main reason I chose PyTorch.

Okay, now let's have a look at it. Here it gets a little bit technical, so bear with me.
First, if we want to use PyTorch and a neural network, we have to define the neural network, and you do that by creating a class. In this case we could create a class called Net, which inherits from torch's nn.Module. When we initialise this class, we first initialise the superclass, the Module; not so interesting. Then we define the four layers of our class. First there's the convolutional layer, a 2D convolutional layer with some parameters we'll go over in a minute. Then a pool layer, which is 2D max pooling: this is subsampling where, in this case with a kernel size of 2, so a two-by-two matrix, we take the maximum value, in order to reduce the size of our neural network a bit. And then we have two fully connected layers, the normal layers, that come after each other at the end; the second fully connected layer ends with ten nodes, so we have ten output nodes.

Secondly, we have to define the forward method. The forward method accepts a single argument called x: that is the input, and the input comes in batches. In this case these are 32 by 32 pixel images in three channels. We apply the convolutional layer to x, and that converts it to 18 channels, so we get 18 different types of convolutions, again at 32 by 32 pixels. Then we apply the ReLU function, just to make it nonlinear; that helps our neural network learn more complicated stuff. Then we apply the pooling, which reduces the picture size from 32 by 32 to 16 by 16 pixels. Then, since we are done with the 2D stuff, we reshape the whole thing into a single very long vector of size more than 4,000. Then we are ready to apply the first fully connected layer, again the nonlinear function, and lastly the last fully connected layer, after which our output has size 10.
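The slide code itself isn't in the transcript, so this is a reconstruction from that description; the hidden width of 64 is my assumption, while the flattened size of 18 * 16 * 16 = 4,608 matches the "more than 4,000" mentioned:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # 3 input channels (RGB) -> 18 feature maps; padding keeps 32x32
        self.conv1 = nn.Conv2d(3, 18, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # 32x32 -> 16x16
        self.fc1 = nn.Linear(18 * 16 * 16, 64)             # 4,608 inputs
        self.fc2 = nn.Linear(64, 10)                       # ten output nodes

    def forward(self, x):
        # x: a batch of 32x32 RGB images, shape (batch, 3, 32, 32)
        x = F.relu(self.conv1(x))     # -> (batch, 18, 32, 32)
        x = self.pool(x)              # -> (batch, 18, 16, 16)
        x = x.view(-1, 18 * 16 * 16)  # flatten to one long vector
        x = F.relu(self.fc1(x))
        return self.fc2(x)            # -> (batch, 10)
```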
Okay, but hold on: we weren't going to train our own neural network from scratch, right? We were going to do transfer learning. Yes, that's right. So the first thing we have to do is import the pre-trained network. In this case I've chosen SqueezeNet, and I did that because VGG is actually quite a bit bigger than SqueezeNet and takes longer to run, so I went for the easy option. You can simply import SqueezeNet and instantiate it, saying pretrained is true, and it will download the weights for you, which is a big set of weights and takes a while. But then you get a pre-trained network, all ready to use.
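In torchvision that looks like this (I've picked `squeezenet1_1` here; the talk doesn't say which of the two SqueezeNet variants was used, and `pretrained=True` was the API at the time):

```python
from torchvision import models

# Downloads the pre-trained weights on first use (takes a while, as noted)
model = models.squeezenet1_1(pretrained=True)
```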
But we weren't going to use it as-is: the pre-trained network works with a thousand classes, so we're going to modify it. Let's have a look at the internals before we modify it. If we simply print the network, it shows us all the layers, and we find that it consists of two parts. The first part is called "features", and it has a lot of layers; I couldn't fit them all on the slide, I think it's 20 layers or so: lots of convolutions and pooling and ReLU functions, all in a sequence after each other. Then at the end there's the "classifier" part, which consists of four pieces, of which you already recognise three: there's a 2D convolution, there's a ReLU, and average pooling at the end. The first piece is dropout, and dropout is a technique to help your neural network learn a little quicker: while you're training, it drops the inputs or outputs of half of the neurons, in this case with a probability of 50%. That makes it impossible for the network to rely on a single neuron or a small subset of neurons, so it must make more connections to learn the same information, which basically makes it more robust.

So what happens in the classifier is: we apply this dropout during training, then there's this 2D convolution from 512 to a thousand, and this is again where you see the 1000 output classes at the end, and then there's the ReLU and the average pooling.
So in the end we again have a thousand outputs, one for each of the classes it was built to classify. Now, if we're going to change this and make it our own classifier for our own classes, all we need to do is define the number of classes we have, for example four, download the model, and set it up. First, it has a parameter called num_classes, so we can update that to four; internally it isn't even used, but let's do it to be complete. Then we can take the classifier part, remember that the 2D convolution layer was the one with index 1, and simply replace it with a new 2D convolution layer that goes from 512, just like the original, but now to our number of classes instead of a thousand. That's all you need, and now you have a new neural network that you can train to classify your own classes.
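Reconstructed from that description, continuing from the model loaded above:

```python
import torch.nn as nn
from torchvision import models

num_classes = 4
model = models.squeezenet1_1(pretrained=True)
model.num_classes = num_classes  # not actually used internally; for completeness

# classifier[1] is the final 1x1 convolution: 512 channels in, 1000 out.
# Swap it for one with our own number of output classes.
model.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=1)
```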
Okay, now let's have a look at how you train a model like this. It looks like this. We start by setting our model to training mode; that's important, and I'll get to why in a little bit. Then we need to define our criterion: how do we score whether the model is good or bad? In this case we use cross-entropy loss. And we need to define an optimizer; the optimizer in this case is stochastic gradient descent. We say: okay, these are the model parameters, and then there are some arguments that we'll have a look at at a later stage.

Then we loop through what comes out of a loader object (again, we'll look at the loader object later): these are the inputs, so the images, and the labels, the classes you've labelled them as. For each of those sets of images and labels, because we do this in batches, you always process multiple images at the same time, we first reset the optimizer, because we don't want to use any information from the last batch. Then we simply pass the images through the neural network and get some outputs. We calculate how good the outputs are: do the outputs correspond with the labels we gave it? Then we propagate this loss backward through the network, so we calculate, for each neuron, how well it did at scoring your training images, and once we know that, we can optimize the weights.

Every time we loop through all our training images, we call that one epoch, and you're going to do this quite a few times if you want a classifier that works reasonably well.
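Reconstructed from that description, the loop would look roughly like this (the learning-rate and momentum values are placeholders; `train_loader` is defined further down):

```python
import torch.nn as nn
import torch.optim as optim

model.train()                      # training mode: dropout enabled
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

for epoch in range(20):
    for inputs, labels in train_loader:
        optimizer.zero_grad()      # forget gradients from the last batch
        outputs = model(inputs)    # forward pass
        loss = criterion(outputs, labels)
        loss.backward()            # propagate the loss backward
        optimizer.step()           # update the weights
```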
Okay, once you've done that, say you've trained 20 epochs, then of course you want to know: how well does my model actually work? For that, we first set the model to evaluation mode. So what is the difference between training and evaluation mode? Most importantly, evaluation mode disables dropout. While training, it might work well to let your model use only half of the information in some of the stages, but when you're evaluating, when you're actually trying to classify an image, you want to make sure you use all the information you have, so you disable dropout. That's the most important reason why we must always call these eval and train methods.

Then we can say "with no_grad", which prevents PyTorch from doing internal calculations that you don't need here, and again we loop through the loader. We pass the inputs through the model to get the outputs; these are vectors with a score for each class, and we can take the maximum of these, which is the class the model would assign to the image. So we get the predictions from that, and we can sum the loss in order to get some idea of how well our model is performing.
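A matching evaluation sketch, reusing `criterion` from above (`test_loader` is defined further down):

```python
import torch

model.eval()                       # evaluation mode: dropout disabled
total_loss, correct, total = 0.0, 0, 0
with torch.no_grad():              # skip gradient bookkeeping we don't need
    for inputs, labels in test_loader:
        outputs = model(inputs)
        total_loss += criterion(outputs, labels).item()
        _, predicted = torch.max(outputs, 1)  # highest-scoring class per image
        correct += (predicted == labels).sum().item()
        total += labels.size(0)
print(f"accuracy: {correct / total:.3f}")
```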
I also promised you a closer look at the loader: where does our data actually come from? What you need to do first is specify where your images are on disk, and you do that by defining image folders. You want separate train and test sets, so you define two image folders: one with the path to the train images and one with the path to the test images. But for both of those you first also need to define a transform: the operations that will be applied to the images when they are loaded. We define two different transforms, one for the training images and one for the test images.

Let's first have a look at the test transform. We compose a transform, so it consists of multiple steps: first we resize the image to size 256, then we crop out the 224 by 224 pixels at the center, and then we convert it to a tensor so that PyTorch can work with it. That's fairly simple. For the training images we do something a little different: we take a randomly resized crop of the same size from the image, so we don't always look at the same part of the image; it could be a little more zoomed in, or a little more zoomed out, or a little more to the left or to the right. This means that every time we train an epoch, our model actually gets to see a different set of images. The source images are the same, but the actual image it looks at is just a little shifted or zoomed, so it learns not from the individual pixels but from the information that's actually in the image. That's really important.

Once we've defined those train and test sets, we can define the train and test loaders, which are simply data loaders where we provide the data set we want to use and set the batch size: the number of images we process at the same time. The number of workers is the number of processes that load and preprocess the images, and we say we want to shuffle them, which means that every time we train an epoch, or evaluate, we go through the images in a random order. For training this is really important; for testing it isn't.
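Put together, a sketch of those definitions; the folder paths, batch size and worker count here are placeholders:

```python
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

test_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),     # the 224x224 center of the image
    transforms.ToTensor(),
])
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),  # a different crop every epoch
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("data/train", transform=train_transform)
test_set = datasets.ImageFolder("data/test", transform=test_transform)

train_loader = DataLoader(train_set, batch_size=32, num_workers=4, shuffle=True)
test_loader = DataLoader(test_set, batch_size=32, num_workers=4, shuffle=False)
```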
Okay, so we're almost there, but I skipped something that's fairly important. Remember that when we defined our optimizer, stochastic gradient descent, I said there are these arguments at the end. The most important one is the first one: lr, the learning rate. This is the rate at which we change the weights while training, so we need to figure out what a good value actually is.

Suppose we have only a single weight, since I can only make a plot in a single dimension, and we want to optimize it: we want to find the place right there at the bottom of this graph. Suppose we start all the way at the right of the graph and want to find the bottom by taking little steps. Of course, we want to make sure we don't take steps that are too large: with steps that are too large you could step all the way across the valley to the opposite side, and if you're unlucky you might even go so far that you step out of the valley entirely and reduce the performance of your model. On the other hand, if your learning rate is too small, then first of all it takes a very long time to get there, and in this case you'll find this local optimum here and you won't find the global optimum. So balancing this learning rate is really important. How do we actually find the best learning rate for our problem? Well, the best thing you can do is just try them out.
Basically, we define a function where we set the optimizer's learning rate to a certain value, and then, for a range of values, a log space from some minimum learning rate to some maximum learning rate with a number of steps, we set the optimizer to each learning rate in turn, train for a number of batches, and then evaluate for a number of batches.

What you'll find is that, over the course of this sweep, your model first improves very, very slowly, since you start with a very low learning rate; after a while the improvement gets quicker and quicker, until your learning rate is so big that it jumps all the way away from the local or global minimum you're in, and the performance degrades enormously. So when you plot this, it looks something like the following: first you have some value of the loss, and as you increase the learning rate, the loss goes down, until at some point it shoots back up, all the way until your model doesn't do anything anymore.
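A sketch of that sweep; `train_batches` and `evaluate_batches` are hypothetical helpers wrapping the training and evaluation loops shown earlier:

```python
import numpy as np

def try_learning_rates(lr_min=1e-6, lr_max=1.0, steps=20, n_batches=50):
    """Sweep learning rates on a log scale; train and evaluate a few
    batches at each one, recording the loss."""
    losses = []
    for lr in np.logspace(np.log10(lr_min), np.log10(lr_max), steps):
        for group in optimizer.param_groups:
            group["lr"] = lr                        # set the new learning rate
        train_batches(n_batches)                    # hypothetical helper
        losses.append(evaluate_batches(n_batches))  # hypothetical helper
    return losses
```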
What we found here is that something like 10^-3 is typically the optimal learning rate, so that's what we set it to. But of course, the optimal value also depends on the state of your model: if your model doesn't do anything yet, then probably a very high learning rate is good. If it's almost there and you just want to squeeze out that last percent of accuracy, then probably a very low learning rate is the right way to go. For that we have learning rate schedulers. We can use, for example, ReduceLROnPlateau, which is a scheduler that, whenever the performance of your model during training has reached a sort of plateau, is stable, reduces the learning rate and then tries again. So after every epoch we call scheduler.step with the loss that we found, and based on that it might reduce the learning rate.

That looks something like this: while you're training, your accuracy goes up at the beginning, and after a while the scheduler figures out, okay, maybe we're stable now, let's reduce the learning rate. You'll see it take these steps over time, until at some point you decide the accuracy is good enough.
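In code, roughly (`num_epochs` and `train_one_epoch` are hypothetical placeholders for the loop shown before):

```python
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Cut the learning rate when the loss stops improving ("plateaus")
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=5)

for epoch in range(num_epochs):
    loss = train_one_epoch()  # hypothetical wrapper around the training loop
    scheduler.step(loss)      # may reduce the learning rate
```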
Okay, we're all set; let's have a look at some data I've actually played with. Of course, if you want to train a model you need data, so what I did is take one of these Raspberry Pis, set it to work for a while, and gather a data set of photos taken in the world's largest cities. I covered 72 cities: half a million images by 10,000 photographers, all in all some 30 gigabytes of data. And I made sure that all of these are licensed for reuse, such that I can show them to you right now.
Of course, the first thing you do when you gather a data set is have a look at the images themselves. I live in Amsterdam, so I first had a look at a subset of the images that were taken in Amsterdam. This is a nice one; typically we don't have weather like this so often, but well, it's nice, right? Something else very typical that you'll find in Amsterdam: bikes. This is a very typical scene from Amsterdam; for those of you who have been there, I'm sure you'll recognise it. Okay, this looks good, right? Let's have a look at another one. This is a really nice view. But hold on, this wasn't taken in Amsterdam: we don't have any cliffs like this within 200 kilometers of Amsterdam, probably even further. So what's going on? Let's have a look at the metadata. Well, there's this tag that says Amsterdam, so probably someone thought it was taken in Amsterdam, but it wasn't. On the other hand, it also has a tag that says Dublin. Interesting. There are actually quite a few tags on this image; I couldn't fit more than this, and this is less than 5% of the tags on this image. And I certainly don't see a teddy bear museum in this image, or all kinds of these other things. So it turns out people don't always tag their images as they should.
So I had a look at where all the images that were supposedly taken in Amsterdam were actually taken in the world. Well, around here. And the bench we were just looking at was taken right there, at the edge, on some nice island in Korea. Definitely not Amsterdam. So what can you do? We only want the images from Amsterdam that were taken right there, at the red dot in the middle; that's where Amsterdam actually is. What you can do is take all the images and take the median latitude and longitude. Now the mathematicians will cringe, because of course these are circular values and you can't just take a median of a latitude or a longitude; well, in the end you can just do it, and it works. Then you remove all the images that were taken more than five kilometers away, and you repeat this for all cities. Then we have a clean data set, right? Okay, let's do it.
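A sketch of that cleaning step, assuming a pandas DataFrame with one row per photo; the column names and the rough kilometres-per-degree conversion are my assumptions:

```python
import pandas as pd

# df: one row per photo, with 'city', 'lat', 'lon' columns (hypothetical schema)
def filter_city(df, city, max_km=5.0):
    photos = df[df.city == city]
    center_lat, center_lon = photos.lat.median(), photos.lon.median()
    # Rough distance in km (about 111 km per degree); fine at city scale,
    # and yes, medians of circular values aren't strictly valid -- it works
    dist = ((photos.lat - center_lat) ** 2 +
            (photos.lon - center_lon) ** 2) ** 0.5 * 111
    return photos[dist <= max_km]
```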
Then, after that, I had a look at all the other tags and thought of something cool we could do. These were the most common tags in this data set. Of course, if you're looking for photos taken in cities, the most common tag is "city". But other than "city", I think this one, "skyline", is the most interesting. So let's try to make an image classifier that recognises the skylines of cities. What I did is take the 10 most common cities in my data set; those were these, with these image counts. I split them into a train and a test set like this, and then we train a model.
Hold on; first we wait. And the waiting is actually quite annoying, because training a model like this takes a while. In this case I got a fast GPU; I used my boss's credit card, he doesn't know yet, he'll be a little bit surprised. I spent something like 20 hours on training time, and then I had a model. So then you feed in an image: who knows where this image was taken? This is London, and it got it correct. Okay, that's nice. So this one, where is this? This is Sydney, and it learned that. All right, that's cool. This one? Anyone? This is Toronto; I heard it right there, okay, cool. This one's tough: where is this? This is LA, and the model actually got it right, so I was pretty impressed. This is clearly Chicago, right? And then here we have Philadelphia: got it right again. This is Tokyo: cool. Even this one; it's not really that complicated, but it doesn't have too many buildings. I wouldn't have known it myself, but the model got it right: it's Houston. Here we have Shanghai. And this is clearly Chicago, right? Wait, what? What just happened?

Well, it turns out there was one photographer who labelled all his photos with the tag Chicago, and all he did was take photos of sandals on pavement. And my model, well, it got it "right": it learned that a sandal on pavement must be in Chicago. And of course, when you then feed this test image to the model, it says: all right, this must be Chicago.
Okay, so we need to fix this; let's come up with a plan. First, instead of splitting the images randomly into train and test sets, we can split them by photographer. That way, at least all those sandals will end up either in the train set or in the test set, as the sketch below shows.
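The talk doesn't name a tool for this; one way to sketch it is scikit-learn's GroupShuffleSplit, with hypothetical `paths`, `labels` and `photographers` lists:

```python
from sklearn.model_selection import GroupShuffleSplit

# paths, labels, photographers: parallel lists, one entry per photo
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(gss.split(paths, labels, groups=photographers))
# Every photographer's images now land entirely in train OR entirely in test
```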
And we wait; it takes a while. This is really annoying once in a while, because it's late at night, you want to do some hacking on your project, and you think of a solution; you fix it, you start training, and then, well, you must wait until tomorrow to see the results.

Anyway, the results of this were terrible, because of course, if you put all the sandals in the train set, you'll get a very high train accuracy, but the test set accuracy will be terrible; in fact, the model will just be overtrained on those. In the end, we simply have too many mistagged photos.
So I needed to come up with another plan, and in this case what I did is build another model, and it was good. I took a model that classifies only two classes: either the photo has a skyline, or it is not a photo of a skyline. I trained it on all the data I have, so half a million images, and I gave them labels: it is a skyline when it has the skyline tag, and it is not a skyline when it doesn't. Then I could make predictions for all the data, and only use the data labelled with a positive skyline prediction for my original model.
Again we have to wait; it takes a while, and it gets really annoying after a while. But in the end the results of this were pretty nice. Out of the images with the skyline tag, I had about 6,000 that were labelled by the model as actually having a skyline, and about 1,000 labelled as not having a skyline, so I could just get rid of those. On the other hand, I got about a thousand images that did have a skyline according to my model but didn't have the skyline tag. So I still ended up with about the same number of images. I recreated the train/test split and had to wait again; as I said, this gets really annoying, and my boss will not be happy with me. In the end I got yet more results, and as you can see, the accuracy ended up around 70% after 200 training epochs or so; that's, I think, 24 hours of training. I think that's fairly reasonable.

Okay, that's cool; let's have a look at some of the actual results. This one it got right: Chicago. Of course, that's one I would also recognise.
So that's cool, and it means the model actually learns to recognise some of these cities. Also Los Angeles: it got it right. In this case the model said New York City, while in reality the label was Philadelphia; but to be honest, looking at this picture, I probably would have gotten it wrong as well, so sometimes it's not that bad. Here we have an example where the model says London, probably because of the bad weather, but it was actually taken in Toronto. Sometimes, though, you cannot explain the errors the model makes: in this case, although the skyline is a bit difficult to see, you can see some high buildings in the background, but you can also clearly see that this street is definitely not an American street; it's somewhere in Asia. This was actually taken in Shanghai, and the model got it wrong.
Yeah, so that was it, but just some final remarks before I finish. Training your own image classifier really isn't that difficult: all you need to do is cheat a little and do transfer learning; otherwise you won't be waiting for 24 hours, but for months on end. Doing PyTorch is fun: Keras might be easier and faster, but PyTorch is a lot of fun. And in the end, having clean data is way more important than having a good model. Thank you!

After this, if you want to have a look at my code, you can find it at this GitLab link; you'll find all the code that I used to create this image classifier. And keep an eye on our blog, blog.godatadriven.com, where I will post a sort of transcript of this talk. Thanks!
Are there any questions? We have two mics, so please line up.
Hi. You mentioned two very different kinds of hardware, the Raspberry Pi and the GPU. Can you say a little more about whether this is really practical on a Raspberry Pi alone, and if it's not, how does one actually take the next step to use a GPU as well?
Yes, of course, thank you. I did not train any of these models on the Raspberry Pi; I merely used it to collect my data. Since that's just a bunch of web scraping, you don't need any big hardware for it. If you're going to train a model: I tried training on my MacBook, and that takes way too long, so I did have to get a machine with a GPU. On the other hand, if you have a very small training data set, you might give it a try on your laptop. It's still fun; you might get as far as 10 or 20 epochs and get a reasonable accuracy. If you want to do a little better, getting a GPU, or getting a cloud machine with a GPU, is always the way to go.
One miscategorisation: where you had the Shanghai image with the very small section of skyline, why include that in the test set?

So, I didn't actually make the choice myself to include it in the test set. I included all images that were classified by the previous model as being a skyline, and apparently that model learned the properties of what a skyline is and recognised something in the background here that looks a bit like one.

Well, I guess my question is: you mentioned that clean data is better than having a good model, and to me this doesn't look like clean data. I mean, I'd expect a super genius classifier to figure it out, but I would have excluded this if I'd just seen your talk and not this example. So I'm curious whether I'm just wrong and there's some value in images like this, or...

Well, in this case, I guess it's just laziness: I didn't go through my entire test set before using it, because going through thousands of images and manually labelling them as good or bad is not my idea of fun. I think a proper data scientist must be lazy. Any more questions? Well then, again, thanks!