AI image search with Go & TensorFlow
Formal Metadata
Number of Parts | 561
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/44135 (DOI)
FOSDEM 2019 (193 / 561)
Transcript: English (auto-generated)
00:14
Cool, so it's 2:30, so a round of applause for Gildas and his talk about AI.
00:27
Hi, thank you all for coming. I'm very excited to talk to you today. I would like to thank Maartje and Francesc for getting such a big room. It's amazing, so please give them a big round of applause too. So I'm Gildas, I talked here last year and I'm coming
00:50
back this year. So I work at leboncoin in Paris and today I'm going to talk about AI search with Go and TensorFlow. So, spoiler, AI is really not about intelligence
01:03
at all, it's more about magic tricks, doing things that you wouldn't expect a computer to do. But nonetheless, it can do a lot of different things. It can make your phone calls for you, it can beat multiple pro gamers at StarCraft at the same time,
01:22
it can invent some new, fake celebrities, or it can swap faces in a very realistic manner. So today I will show you how you could use this kind of state-of-the-art model in your Go application.
01:40
All right, so the plan for today: first of all we'll review some of the basics of AI, deep learning and machine learning, and we'll see how TensorFlow and Go work together. Then we'll see a first concrete example with image classification, and then we'll see how face recognition can work too.
02:01
And then we'll see how we can wrap this up to make an image search. And then this will be the conclusion. So, AI and TensorFlow. So it's a very good time for us developers regarding AI, because all the big players right now have a huge focus on AI. They are all competing to get as much traction
02:24
as possible for their AI products, and the result is a lot of different frameworks that we can use and do very cool things with. So Google released TensorFlow, which can be used with Keras, Facebook has PyTorch, Microsoft has the Cognitive Toolkit,
02:42
and Amazon developed MXNet, which is used by other companies too. And you can also very easily find some models online: likewise, Google, Facebook and Microsoft are giving away a lot of models that are ready for you to use. So let's see what the basics of AI are.
03:03
So it all starts with one of these little buddies. So this guy is a cell, and he's getting some floats as input and releasing a different float as output. So the function usually looks something like that, but we don't really need
03:22
to get into details. So that's the shape of the sigmoid function most of the time. But what is important is that these guys can combine with some other ones and start to do interesting things. So after a while, you can have some nice things happening. For example, from a non-obvious picture,
03:45
you can guess the breed of the dog. Maybe it can fail on some other examples. It can turn Harrison Ford into Nicolas Cage, or it can protect you from not-safe-for-work representations.
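To make the "cell" described above a little more concrete, here is a minimal sketch in Go using only the standard library. It assumes the common formulation in which a cell multiplies each input by a weight, adds a bias, and passes the sum through the sigmoid function; the names `sigmoid` and `neuron` and the numeric values are illustrative and do not come from the talk.

```go
package main

import (
	"fmt"
	"math"
)

// sigmoid squashes any real number into the open interval (0, 1).
func sigmoid(x float64) float64 {
	return 1 / (1 + math.Exp(-x))
}

// neuron computes sigmoid(sum(inputs[i]*weights[i]) + bias).
// It assumes len(weights) >= len(inputs).
func neuron(inputs, weights []float64, bias float64) float64 {
	sum := bias
	for i, in := range inputs {
		sum += in * weights[i]
	}
	return sigmoid(sum)
}

func main() {
	// Illustrative values only: in a trained model the weights and the
	// bias are learned, not chosen by hand.
	out := neuron([]float64{0.5, -1.0}, []float64{0.8, 0.2}, 0.1)
	fmt.Printf("cell output: %.4f\n", out)
}
```

A network is then just many of these cells wired together, with the outputs of one layer becoming the inputs of the next.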
04:06
So I just want to clarify a few terms that will certainly pop up later. So the architecture is the shape of a network, so that's a very important factor in a network.
04:21
And the next one, the one I'll use the most, is the model. So a model is basically an architecture with all the weights and biases defined. So here we see the function we've seen just before. And a pre-trained model is the kind of model that you can get from Facebook or Google
04:43
that already performs a task well. And then a saved model is a format to export this model and to share it with other people. So now, TensorFlow is a framework for creating, training, predicting,
05:02
exporting, and importing neural networks. So it has a C++ core. Most of the time, for the training part, you use its Python binding to its C API. So using Python, you will create the network, you will train it,
05:25
and then you can export it to a saved model. And the part that interests us is on the other side: once a model is trained, you can import it into Go, and you can run the prediction using the Go API. So in this talk, we'll really just focus on the Go part.
05:46
So now let's see what the code looks like. So here is all the code you'll need. This is all the specific TensorFlow code. So it's split in two different parts. So firstly, you load the model and you prepare the input
06:03
that you want to give to that network. And then the second part is actually running the session, giving the feeds and getting the fetches at the other end. So we can split this into three parts. The first one is getting the model and loading it into TensorFlow.
06:22
The second part is building an input and then feeding it into the input of the network. And then the last part is to fetch the result and to interpret it. So let's see a concrete example with image classification.
06:44
So image classification is basically taking an image and extracting some labels. So there was a beautiful cat, a Siamese cat, there,
07:02
with some scores about the fact that it's a Siamese cat. All right. So one of the most common databases for this is ImageNet.
07:27
Well, let me check. Well, some of them are loaded, but not all of them. Well, just two seconds, I'll move on to my mobile thing then.
07:43
All right. Is it better? No, not really.
08:05
Yeah, yeah. OK. Well, anyway, so basically you have an image of a cat as an input and then you get some labels. So we'll need to add three more steps before actually running it in Go.
08:28
The first one is to find the model. The next ones are to run it in Python and then to save the model. And then the two other ones we know about.
08:57
Yeah, it's better. All right. So to find a model, there is a website I like a lot.
09:03
It's called ModelDepot. There are not too many models there, but they are very well documented. Other good resources for this are the ones from Google, Facebook, and Microsoft. Most of these models are on GitHub too, so you'll find a lot of interesting things.
09:24
And please don't be afraid to look at some research too. It often comes with some pre-trained models. So I chose that one, which is quite a simple one and quite a light one. So now I have a model. I can download it from there. It's actually MIT licensed, so it's nice.
09:44
So we'll run it in Python. First of all, we need a few imports. So this is based on Keras, so it's mostly Keras imports. And now here is the Python code. So it's quite simple. A bit like in the Go code, the first part is loading the model, then we format the input, and then we run the prediction.
10:04
All right, so now we can run it in Python, and we actually get the correct labels. So the next step is to export it as a saved model. So we'll add a couple more imports.
10:21
And basically the only thing you need to do is just to surround your code with a few more lines to connect your session to TensorFlow and then to export it into a saved model. So now we have a folder with everything we need to use our saved model.
10:43
So now one step that is usually simple is to find the input and output layer names. So here we're using Keras, so we can just print them straight out of the model.
11:00
It's one of the functions of the model. If you're not using Keras and if it's not documented, maybe you need to print all the operations, look at all the names, and find which one looks the most promising. Or the last solution is to debug the Python code.
11:22
So in our case, we're using the first option, and our input layer is input_1, and the output is predictions/Softmax. Now we need to format the input image. So here we can see the shape of the input layer.
11:42
So it's a 224 by 224 image with three channels, RGB. So there's just one more function we need to apply to these channels.
12:00
It's just a simple function. Actually, it's just a mapping to a smaller float range. So this is what we have, and we do it for every pixel. So this is how the Go code looks for that. So it's only using the standard library, the standard image library,
12:25
and then creating a TensorFlow tensor. All right, so now we have our tensor that we can feed into our network. Now the last part is to interpret the results. So the shape of the result is a slice of a thousand floats,
12:43
and this actually corresponds to the thousand classes there are in ImageNet. So it's basically all the kinds of objects that we have, and they are linked to a score.
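As a rough illustration of interpreting such an output, the sketch below pairs each score with its label and keeps the best few. It is not the speaker's code: the `prediction` type, the `topK` function and the toy labels and scores are assumptions made for the example, and a real ImageNet model would return a thousand scores rather than four.

```go
package main

import (
	"fmt"
	"sort"
)

// prediction pairs a class label with the score the network gave it.
type prediction struct {
	Label string
	Score float32
}

// topK returns the k highest-scoring predictions, best first.
// scores[i] is assumed to be the score of labels[i].
func topK(scores []float32, labels []string, k int) []prediction {
	n := len(scores)
	if len(labels) < n {
		n = len(labels)
	}
	preds := make([]prediction, n)
	for i := 0; i < n; i++ {
		preds[i] = prediction{Label: labels[i], Score: scores[i]}
	}
	// Sort by descending score; ties keep their original order.
	sort.SliceStable(preds, func(a, b int) bool {
		return preds[a].Score > preds[b].Score
	})
	if k > n {
		k = n
	}
	if k < 0 {
		k = 0
	}
	return preds[:k]
}

func main() {
	// Toy data standing in for the network output and the class names.
	labels := []string{"tabby", "Siamese cat", "Egyptian cat", "lynx"}
	scores := []float32{0.05, 0.85, 0.07, 0.03}
	for _, p := range topK(scores, labels, 3) {
		fmt.Printf("%-14s %.2f\n", p.Label, p.Score)
	}
}
```

Sorting the whole slice is more work than strictly needed for a top 10, but for a thousand classes the cost is negligible and the code stays short.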
13:00
So usually what you'll do is keep the 10 best results, maybe. So perfect, now we can get the output, and we can find out it's a Siamese cat, just the same way as we did with the Python code. So to wrap it up, now you can give any image from any source you want from your Go program,
13:24
and we can get the 10 best labels. So face recognition. I won't go into the code for face recognition, but I just want to give you the basics of how it's working. So the first step is to detect the faces.
13:43
So it takes a picture of any size as an input, and it will give you back some boxes and scores about the detections. So in that example, we can extract five different faces. The next part is landmark extraction.
14:01
So the input is a square image, 112 pixels wide, and the outputs are 68 landmark points. They map to some particular landmarks on the face,
14:20
and you can use them to straighten the face, which greatly improves the performance. And then the last part is the descriptor extraction. So again, it takes an image as an input, and the output is a slice of 128 floats,
14:49
which actually represents a coordinate in a 128-dimensional space. And what is good about that is that you can apply a Euclidean distance to it.
15:04
So the Euclidean distance is the most common distance, and so this defines some distance between faces. So the smaller the distance between the faces, the more likely it is that it's the same person.
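The distance between two descriptors can be sketched in a few lines of standard-library Go. The function name `euclidean` is hypothetical, and the two descriptors in `main` are made-up values rather than real model output.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// euclidean returns the straight-line distance between two descriptors,
// i.e. the square root of the sum of squared coordinate differences.
func euclidean(a, b []float32) (float64, error) {
	if len(a) != len(b) {
		return 0, errors.New("descriptors must have the same length")
	}
	var sum float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		sum += d * d
	}
	return math.Sqrt(sum), nil
}

func main() {
	// Two made-up 128-float descriptors that differ in two coordinates.
	a := make([]float32, 128)
	b := make([]float32, 128)
	b[0], b[1] = 0.3, 0.4

	d, err := euclidean(a, b)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("distance: %.3f\n", d)
}
```

Since the computation is a single pass over 128 numbers, comparing a new face against many stored descriptors stays cheap.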
15:23
And it's especially nice for us because it's a very lightweight representation. It's very fast, so it's good for search. The models I've used are from face-api.js, which is using TensorFlow.js.
15:40
It's a very nice one because it's only using TensorFlow as a dependency, and I really wanted to keep it TensorFlow only, not using OpenCV or Dlib. So I had to do a bit of extra work on this one because the code was in JS, so I had to translate all the code from JS to Python,
16:01
which was quite simple because it's almost the same functions. And then I was able to save the model and to load it into my Go application. So search is left as an exercise for the reader.
16:21
But it's actually quite simple: now that we can label our photos, it's quite easy to just return the matching photos when the user is querying with a keyword. And about the faces: once you have a distance between the faces in your database,
16:44
when a new face is found, you can just calculate the distance with all the other faces. And when you find a face that is close enough, then you can tell it's the same person.
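The matching step described here could look roughly like the following linear scan. This is a sketch under assumptions, not the code from the speaker's repository: the `knownFace` type, the `closest` function, the tiny 3-float descriptors and the 0.6 threshold are all illustrative, and a suitable threshold depends on the descriptor model in use.

```go
package main

import (
	"fmt"
	"math"
)

// knownFace is a hypothetical database entry: a name and its descriptor.
type knownFace struct {
	Name       string
	Descriptor []float32
}

// distance is the Euclidean distance; both slices must have equal length.
func distance(a, b []float32) float64 {
	var sum float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		sum += d * d
	}
	return math.Sqrt(sum)
}

// closest scans every known face and returns the name of the nearest one,
// or false when nothing is within the threshold.
func closest(db []knownFace, query []float32, threshold float64) (string, bool) {
	bestName, bestDist := "", math.Inf(1)
	for _, f := range db {
		if len(f.Descriptor) != len(query) {
			continue // skip entries that cannot be compared
		}
		if d := distance(f.Descriptor, query); d < bestDist {
			bestName, bestDist = f.Name, d
		}
	}
	if bestDist > threshold {
		return "", false
	}
	return bestName, true
}

func main() {
	// Toy 3-float descriptors; real ones would hold 128 floats.
	db := []knownFace{
		{Name: "alice", Descriptor: []float32{0.1, 0.2, 0.3}},
		{Name: "bob", Descriptor: []float32{0.9, 0.8, 0.7}},
	}
	// 0.6 is an assumed threshold chosen for this example.
	if name, ok := closest(db, []float32{0.12, 0.21, 0.28}, 0.6); ok {
		fmt.Println("same person as", name)
	} else {
		fmt.Println("unknown face")
	}
}
```

A full scan is fine for a personal photo collection; a much larger database would probably call for an index built for nearest-neighbour search.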
17:01
All right, so the conclusion. First of all, I want to point you to my repo, where you can find all the models I've talked about. Yeah, you can find it quite easily, I think.
17:21
What I want you to do is just to try it. I made a nice Docker image that can run, just to try out to see the performance of this algorithm. Or you can use it as a library. It's hopefully simple and ready to use,
17:41
but please feel free to contact me if it's not working or if you would like something different. And also, I would like to encourage you to try new models. So as we've seen, the TensorFlow and Keras models are very easy to integrate into your applications.
18:04
And yeah, you should try. It's nice. Models from other frameworks will most likely require some conversion, which is still experimental, and that will result in some significant extra work.
18:23
And so, yeah, remember the five steps to use a new model. Finding the model is just searching on Google, so that's the kind of thing that you're already doing. Running it in Python is usually not a problem because most of these models are well documented.
18:43
Saving the model is just a few lines, that's simple. Maybe most of the time you'll have some trouble formatting the input. If the input is an image, it's almost always the same kind of formatting, so it shouldn't be too much of a problem.
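As an illustration of that image formatting, the sketch below turns any Go `image.Image` into a nested float slice of shape 1 x size x size x 3 using only the standard library. Several details are assumptions rather than facts from the talk: the target range of [-1, 1], the nearest-neighbour resampling, and the helper names `scale`, `toInput` and `solidImage`. With the TensorFlow Go bindings, a nested slice like this could then be handed to `tf.NewTensor`; that call is left out here so the example runs without TensorFlow installed.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

// scale maps an 8-bit channel value in [0, 255] to [-1, 1].
// The target range is an assumption; check what your model expects.
func scale(v uint8) float32 {
	return float32(v)/127.5 - 1
}

// toInput resamples img to size x size with nearest-neighbour sampling
// and returns a batch of one image shaped [1][size][size][3] (RGB).
func toInput(img image.Image, size int) [][][][]float32 {
	b := img.Bounds()
	rows := make([][][]float32, size)
	for y := 0; y < size; y++ {
		rows[y] = make([][]float32, size)
		srcY := b.Min.Y + y*b.Dy()/size
		for x := 0; x < size; x++ {
			srcX := b.Min.X + x*b.Dx()/size
			c := color.NRGBAModel.Convert(img.At(srcX, srcY)).(color.NRGBA)
			rows[y][x] = []float32{scale(c.R), scale(c.G), scale(c.B)}
		}
	}
	return [][][][]float32{rows}
}

// solidImage builds a single-colour test image (a stand-in for a photo).
func solidImage(w, h int, r, g, b uint8) image.Image {
	img := image.NewNRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.SetNRGBA(x, y, color.NRGBA{R: r, G: g, B: b, A: 255})
		}
	}
	return img
}

func main() {
	input := toInput(solidImage(640, 480, 255, 128, 0), 224)
	fmt.Println("shape:", len(input), len(input[0]), len(input[0][0]), len(input[0][0][0]))
	fmt.Println("first pixel:", input[0][0][0])
}
```

Nearest-neighbour sampling keeps the example short; a real pipeline would more likely use a proper resizing library, which changes the pixel values slightly but not the overall shape of the code.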
19:03
But if you start to play with sound or this kind of thing, maybe you'll spend some time, because there is not much Go code doing that. And interpreting the result can be very simple sometimes and a bit harder at other times. But the last resort is always to just run the Python code step by step
19:26
and to see how it's done in Python. So, it's not impossible to do. You should definitely try it. All right, thank you. That was all I had to show you today.
19:45
Do you have some questions? Yeah, we have some time for Q&A, so if you have questions, raise your hand. And if you want to leave, please do so silently. Thank you. Any questions?
20:01
Okay, so just a round of applause. Thank you very much.