AI image search with Go & Tensorflow

Formal Metadata

Title
AI image search with Go & Tensorflow
Subtitle
Integrate the advances of AI in your Go apps
Title of Series
Number of Parts
561
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
AI is on everybody's lips nowadays. Every company wants to bring it to its customers. Google, Facebook, Microsoft and more are racing against each other with research papers and open-source frameworks. From a developer's point of view, it has never been so easy to access such powerful AI engines. The goal of this talk is to guide you through the integration of pre-trained models into a Go app. From A to Z, we will build an AI image search with Go and TensorFlow.
Transcript: English (auto-generated)
Cool, so it's 2:30, so a round of applause for Gildas and his talk about AI.
Hi, thank you all for coming. I'm very excited to talk to you today. I would like to thank Maartje and Francesc for getting such a big room. It's amazing, so please give them a big round of applause too. So I'm Gildas, I talked here last year and I'm coming
back this year. So I work at Leboncoin in Paris and today I'm going to talk about AI search with Go and TensorFlow. So, spoiler, AI is really not about intelligence
at all, it's more about magic tricks, doing things that you wouldn't expect a computer to do. But nonetheless, it can do a lot of different things. It can make your phone calls for you, it can beat multiple pro gamers at StarCraft at the same time,
it can invent fake celebrities, or it can swap faces in a very realistic manner. So today I will show you how you can use this kind of state-of-the-art model in your Go application.
All right, so the plan for today: first of all, we'll briefly review some of the basics of AI, deep learning and machine learning, and we'll see how TensorFlow and Go work together. Then we'll see a first concrete example with image classification, and then we'll see how face recognition can work too.
And then we'll see how we can wrap this up to make an image search. And then this will be the conclusion. So, AI and TensorFlow. So it's a very good time for us developers regarding AI, because all the big players right now have a huge focus on AI. They are all competing to get as much traction
as possible for their AI products, and the result is a lot of different frameworks that we can use and do very cool things with. So Google released TensorFlow, which can be used with Keras, Facebook has PyTorch, Microsoft has the Cognitive Toolkit,
and Amazon developed MXNet, which is used by other companies too. And you can also very easily find some models online: the same Google, Facebook and Microsoft are giving away a lot of models that are ready for you to use. So let's see what the basics of AI are.
So it all starts with one of these little buddies. So this guy is a cell: it gets a float as an input and gives back a different float as an output. The function usually looks something like that, but we don't really need
to get into details. So that's the shape of the sigmoid function most of the time. But what is important is that these guys can combine with other ones and start to do interesting things. So after a while, you can have some nice things happening. For example, from a non-obvious picture,
you can guess the breed of the dog. It may fail on some other examples. It can turn Harrison Ford into Nicolas Cage, or it can protect you from not-safe-for-work content.
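The cell described above can be sketched in a few lines of Go. This is only an illustration: the talk shows a single float going in and out, while the sketch assumes the common weighted-sum form with several inputs, and the weights, bias and input values are made up.

```go
package main

import (
	"fmt"
	"math"
)

// sigmoid squashes any real number into the open range (0, 1).
func sigmoid(x float64) float64 {
	return 1 / (1 + math.Exp(-x))
}

// neuron computes sigmoid(w·x + b) for one cell.
// It assumes weights has at least as many entries as inputs.
func neuron(inputs, weights []float64, bias float64) float64 {
	sum := bias
	for i, x := range inputs {
		sum += weights[i] * x
	}
	return sigmoid(sum)
}

func main() {
	// Illustrative values only; a trained model would provide the weights and bias.
	out := neuron([]float64{0.5, -1.0}, []float64{0.8, 0.2}, 0.1)
	fmt.Printf("%.4f\n", out)
}
```

A network is many of these cells wired together, with the output of one layer used as the input of the next.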
So I just want to clarify a few terms that will certainly pop up later. So the architecture is the shape of a network, and that's a very important factor in a network.
And the next one, the one I'll use the most, is the model. So a model is basically an architecture with all the weights and biases defined. So here we see the function we've seen just before. And a pre-trained model is the kind of model that you can get from Facebook or Google
that already does its job well. And then a saved model is a format to export this model and to share it with other people. So now, TensorFlow is a framework for creating, training, predicting,
exporting, and importing neural networks. It has a C++ core. Most of the time, for the training part, it's used through its Python binding to its C API. So using Python, you will create the network, you will train it,
and then you can export it to a saved model. And the part that interests us is on the other side: once a model is trained, you can import it into Go, and you can run the prediction using the Go API. So in this talk, we'll really just focus on the Go part.
So now let's see what the code looks like. So here is all the code you'll need. This is all the TensorFlow-specific code. It's split into two different parts. Firstly, you load the model and you prepare the input
that you want to feed into that network. And then the second part is actually running the session, giving the feeds and getting the fetches at the other end. So we can split this into three parts. The first one is getting the model and loading it into TensorFlow.
The second part is building an input and then feeding it into the input of the network. And then the last part is to fetch the result and to interpret it. So let's see a concrete example with image classification.
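The three parts described above (load the model, feed an input, fetch the result) can be sketched as follows. To keep the example runnable without TensorFlow installed, the model sits behind a hypothetical `Model` interface with a fake implementation. With the real Go bindings, loading would go through `tf.LoadSavedModel` and the prediction through the session's `Run` method. The names `input` and `output` are placeholders, not the layer names of any real model.

```go
package main

import (
	"errors"
	"fmt"
)

// Model is a hypothetical stand-in for a loaded saved model.
type Model interface {
	// Run receives the feeds (input name -> values) and returns the
	// requested fetches (output name -> values).
	Run(feeds map[string][]float32, fetches []string) (map[string][]float32, error)
}

// fakeModel pretends to be a two-class network: it scores the mean of its
// input against one minus that mean. It exists only to make the sketch run.
type fakeModel struct{}

func (fakeModel) Run(feeds map[string][]float32, fetches []string) (map[string][]float32, error) {
	in := feeds["input"]
	if len(in) == 0 {
		return nil, errors.New("no values fed to \"input\"")
	}
	var sum float32
	for _, v := range in {
		sum += v
	}
	mean := sum / float32(len(in))
	out := make(map[string][]float32)
	for _, name := range fetches {
		if name != "output" {
			return nil, fmt.Errorf("unknown fetch %q", name)
		}
		out[name] = []float32{mean, 1 - mean}
	}
	return out, nil
}

// loadModel is part 1: get a model and load it. This sketch ignores the
// path and returns the fake model.
func loadModel(path string) (Model, error) {
	if path == "" {
		return nil, errors.New("empty model path")
	}
	return fakeModel{}, nil
}

// predict is parts 2 and 3: feed the input, then fetch the output.
func predict(m Model, input []float32) ([]float32, error) {
	results, err := m.Run(
		map[string][]float32{"input": input},
		[]string{"output"},
	)
	if err != nil {
		return nil, err
	}
	return results["output"], nil
}

func main() {
	m, err := loadModel("path/to/saved_model")
	if err != nil {
		panic(err)
	}
	scores, err := predict(m, []float32{0.25, 0.75})
	if err != nil {
		panic(err)
	}
	fmt.Println(scores)
}
```

Keeping the model behind a small interface like this also makes the surrounding Go code testable without the TensorFlow C library.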
So image classification is basically taking an image and extracting some labels. So there was a beautiful cat, a Siamese cat, there,
with some scores about the fact that it's a Siamese cat. All right. So one of the most common databases for this is ImageNet.
Well, let me check. Well, some of them are loaded, but not all of them. Well, just two seconds, I'll move on to my mobile thing then.
All right. Is it better? No, not really.
Yeah, yeah. OK. Well, anyway, so basically you have an image of a cat as an input and then you get some labels. So we'll need to add three more steps before actually running it in Go.
The first one is to find the model. The next ones are to run it in Python and then to save the model. And then the two other ones we know about.
Yeah, it's better. All right. So to find a model, there is a website I like a lot.
It's called ModelDepot. There are not too many models there, but they are very well documented. Other good resources are the ones from Google, Facebook, and Microsoft. Most of these models are on GitHub too, so you'll find a lot of interesting things.
And please don't be afraid to look at some research papers too. They often come with some pre-trained models. So I chose that one, which is quite a simple one and quite a light one. So now I have a model. I can download it from there. It's actually MIT licensed, so it's nice.
So we'll run it in Python. First of all, we need a few imports. So this is based on Keras, so it's mostly Keras imports. And now here is the Python code. So it's quite simple. A bit like in the Go code, the first part is loading the model, then we format the input, and then we run the prediction.
All right, so now we can run it in Python, and we actually get the correct labels. So the next step is to export it as a saved model. So we'll add a couple more imports.
And basically the only thing you need to do is just to surround your code with a few more lines to connect your session to TensorFlow and then to export it into a saved model. So now we have a folder with everything we need to use our saved model.
So now one step that is usually simple is to find the input and output layer names. So here we're using Keras, so we can just print them straight out of the model.
It's one of the functions of the model. If you're not using Keras and if it's not documented, maybe you need to print all the operations, look at all the names and find which one looks the most promising. Or the last solution is to debug the Python code.
So in our case, we're using the first option, and our input layer is input_1, and the output is predictions/Softmax. Now we need to format the input image. So here we can see the shape of the input layer.
So it's a 224 by 224 image with three channels, RGB. So there's just one more function we need to apply to these channels.
It's just a simple function. Actually, it's just a mapping to a smaller float range. So this is what we have, and we do it for every pixel. So this is how the Go code looks for that. It's only using the standard library, the standard image library,
and then creating a TensorFlow tensor. All right, so now we have our tensor that we can feed into our network. Now the last part is to interpret the results. So the shape of the result is a slice of a thousand floats,
and this actually corresponds to the thousand classes there are in ImageNet. So it's basically all the kinds of objects that we have, and they are linked to a score.
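The two Go-side steps described above, formatting the input and interpreting the output, can be sketched with the standard library alone. Several things here are assumptions: the target range of [-1, 1] (the talk only says "a smaller float range"), the helper names, and the tiny 2x2 demo image standing in for a 224 by 224 photo. Resizing is left out because the standard library has no resize function, and the result would still need a batch dimension before being turned into a tensor with `tf.NewTensor`.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
	"sort"
)

// scale maps a 16-bit channel value (0..65535), as returned by Color.RGBA,
// to the range [-1, 1]. The target range is an assumption.
func scale(v uint32) float32 {
	return float32(v>>8)/127.5 - 1
}

// toFloats converts an image to a [height][width][3] slice of floats (RGB).
func toFloats(img image.Image) [][][]float32 {
	b := img.Bounds()
	out := make([][][]float32, b.Dy())
	for y := 0; y < b.Dy(); y++ {
		out[y] = make([][]float32, b.Dx())
		for x := 0; x < b.Dx(); x++ {
			r, g, bl, _ := img.At(b.Min.X+x, b.Min.Y+y).RGBA()
			out[y][x] = []float32{scale(r), scale(g), scale(bl)}
		}
	}
	return out
}

// Label pairs a class name with its score.
type Label struct {
	Name  string
	Score float32
}

// topK returns the k best-scoring labels, best first.
// It assumes names has at least one entry per score.
func topK(scores []float32, names []string, k int) []Label {
	labels := make([]Label, 0, len(scores))
	for i, s := range scores {
		labels = append(labels, Label{Name: names[i], Score: s})
	}
	sort.Slice(labels, func(i, j int) bool { return labels[i].Score > labels[j].Score })
	if k > len(labels) {
		k = len(labels)
	}
	return labels[:k]
}

// demoImage builds a 2x2 image with one white pixel at (0, 0); the other
// pixels keep their zero value.
func demoImage() image.Image {
	img := image.NewRGBA(image.Rect(0, 0, 2, 2))
	img.Set(0, 0, color.RGBA{R: 255, G: 255, B: 255, A: 255})
	return img
}

func main() {
	pixels := toFloats(demoImage())
	fmt.Println(pixels[0][0], pixels[1][1])

	// Made-up scores for three made-up classes.
	best := topK([]float32{0.1, 0.7, 0.2}, []string{"dog", "Siamese cat", "car"}, 2)
	fmt.Println(best)
}
```

With a real model, `scores` would be the slice of a thousand floats and `names` the matching list of ImageNet class names.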
So usually what you'll do is keep the 10 best results, maybe. So perfect, now we can get the output, and we can find out it's a Siamese cat, just the same way as we did with the Python code. So to wrap it up, now you can give any image from any source you want to your Go program,
and you can get the 10 best labels. So, face recognition. I won't go into the code for face recognition, but I just want to give you the basics of how it works. So the first step is to detect the faces.
So it takes a picture of any size as an input, and it will give you back some boxes and scores about the detections. So in that example, we can extract five different faces. The next part is landmark extraction.
So the input is a square image, 112 pixels wide, and the outputs are 68 landmark points. They map to particular landmarks on the face,
and you can use them to straighten the face, which improves the performance greatly. And then the last part is the descriptor extraction. So again, it takes an image as an input, and the output is a slice of 128 floats,
which actually represents a coordinate in a 128-dimensional space. And what is good about that is that you can apply a Euclidean distance to it.
So the Euclidean distance is the most common distance, and so this defines some distance between faces. So the smaller the distance between the faces, the more likely it is that it's the same person.
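As a minimal sketch of that distance, assuming descriptors are plain `[]float32` slices of equal length (the function name is illustrative, and the demo uses 2 values instead of 128 to stay readable):

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// distance returns the Euclidean distance between two descriptors:
// the square root of the sum of squared differences, coordinate by coordinate.
func distance(a, b []float32) (float64, error) {
	if len(a) != len(b) {
		return 0, errors.New("descriptors must have the same length")
	}
	var sum float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		sum += d * d
	}
	return math.Sqrt(sum), nil
}

func main() {
	d, err := distance([]float32{0, 0}, []float32{3, 4})
	if err != nil {
		panic(err)
	}
	fmt.Println(d)
}
```

The cost is one pass over 128 floats per comparison, which is why this representation is cheap enough to compare against many stored faces.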
And it's especially nice for us because it's a very lightweight representation. It's very fast, so it's good for search. The models I've used are from face-api.js, which is using TensorFlow.js.
It's a very nice one because it's only using TensorFlow as a dependency, and I really wanted to keep it TensorFlow only, not using OpenCV or Dlib. So I had to do a bit of extra work on this one because the code was in JS, so I had to translate all the code from JS to Python,
which was quite simple because it's almost the same functions. And then I was able to save the model and to load it into my Go application. So search is left as an exercise to the reader.
But it's actually quite simple. Now that we can label our photos, it's quite easy to just return the matching photos when the user is querying with a keyword. And about the faces: once you have a distance between the faces in your database,
when a new face is found, you can just calculate the distance to all the other faces. And when you find a face that is close enough, then you can tell it's the same person.
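Both kinds of search described above can be sketched in a few lines. This is one possible shape, not the code from the speaker's repo: the `Index` and `Face` types, the function names and the 0.6 threshold are all assumptions, and the right threshold depends on the model that produced the descriptors. The demo uses 2-value descriptors instead of 128 to stay readable.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// Index maps a lower-cased label to the photos carrying that label.
type Index map[string][]string

// Add registers a photo under each of its labels.
func (idx Index) Add(photo string, labels ...string) {
	for _, l := range labels {
		key := strings.ToLower(l)
		idx[key] = append(idx[key], photo)
	}
}

// Search returns the photos labelled with the keyword, ignoring case.
func (idx Index) Search(keyword string) []string {
	return idx[strings.ToLower(keyword)]
}

// Face is a known face: a person's name and the descriptor of one photo.
type Face struct {
	Person     string
	Descriptor []float32
}

// distance is the Euclidean distance; it assumes equal-length descriptors.
func distance(a, b []float32) float64 {
	var sum float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		sum += d * d
	}
	return math.Sqrt(sum)
}

// match compares query with every known face and returns the closest one,
// provided it is within threshold.
func match(known []Face, query []float32, threshold float64) (Face, bool) {
	best, bestDist := -1, math.Inf(1)
	for i, f := range known {
		if d := distance(f.Descriptor, query); d < bestDist {
			best, bestDist = i, d
		}
	}
	if best < 0 || bestDist > threshold {
		return Face{}, false
	}
	return known[best], true
}

func main() {
	idx := Index{}
	idx.Add("holiday.jpg", "Siamese cat", "sofa")
	fmt.Println(idx.Search("siamese cat"))

	known := []Face{
		{Person: "alice", Descriptor: []float32{0, 0}},
		{Person: "bob", Descriptor: []float32{1, 1}},
	}
	face, ok := match(known, []float32{0.1, 0}, 0.6)
	fmt.Println(face.Person, ok)
}
```

The linear scan in `match` is the "calculate the distance to all the other faces" step; it stays practical for a personal photo collection, while a very large database would need an index structure.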
All right, so the conclusion. First of all, I want to point you to my repo, where you can find all the models I've talked about. Yeah, you can find it quite easily, I think.
What I want you to do is just to try it. I made a nice Docker image that you can run, just to try it out and see the performance of these algorithms. Or you can use it as a library. It's hopefully simple and ready to use,
but please feel free to contact me if it's not working or if you would like something different. And also, I would like to encourage you to try new models. So as we've seen, the TensorFlow and Keras models are very easy to integrate into your applications.
And yeah, you should try. It's nice. Other models, other frameworks will most likely require some conversion, which is still experimental, and it will result in some significant extra work.
And so, yeah, remember the five steps to use a new model. Finding the model is just searching on Google, so that's the kind of thing that you're already doing. Running it in Python is usually not a problem because most of these models are well documented.
Saving the model is just a few lines, so that's simple. Most of the time, the trouble you may have is with formatting the input. If the input is an image, it's almost always the same kind of formatting, so it shouldn't be too much of a problem.
But if you start to play with sound or this kind of thing, maybe you'll spend some time, because there is not much Go code doing that. And interpreting the result can be very simple sometimes and a bit harder at other times. But the last resort is always to just run the Python code step by step
and to see how it's done in Python. So, it's not impossible to do. You should definitely try it. All right, thank you. That was all I had to show you today.
Do you have some questions? Yeah, we have some time for Q&A, so if you have questions, raise your hand. And if you want to leave, please do so silently. Thank you. Any questions?
Okay, so just a round of applause. Thank you very much.