Building smarter solutions with no expertise in machine learning
Formal Metadata

Title: Building smarter solutions with no expertise in machine learning
Title of Series: EuroPython 2020
Number of Parts: 130
Author: Picard, Laurent
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/50093 (DOI)
Language: English

EuroPython 2020, talk 116 of 130
Transcript: English (auto-generated)
00:06
Okay, Laurent Picard is from Google, and yeah, let's take it away. So can you start your screen sharing? Yes, all done.
00:21
So hello everyone, thanks for having me today. I'm going to hide my own window, okay. Quick introduction: my name is Laurent Picard — as you can tell, I'm French. I'm actually based in Paris, and my background —
00:41
I am an e-book pioneer: I worked in the e-book industry for 17 years, starting 20 years ago, and for three years now, I've been focusing on cloud technologies with Google Cloud, okay. Unfortunately, I cannot see you and ask you questions.
01:02
I'd very much like to start with this quote from our colleague, because it really shows the feeling I have whenever there's something new done with machine learning. And still, after a couple of years, it feels like magic, honestly. But if you scratch a little bit,
01:22
this is just technology, and we can all understand what's behind it, or have a pretty good idea. And my goal today is to maybe scratch a little bit behind some stuff you haven't seen. I have my own definition of machine learning — it's a weird one — but for me, machine learning is solving problems where you have data, right?
01:45
You have data and you want to understand what's in your data. You want to extract information out of your data. So this is my personal definition, but it's an incorrect one. The real definition is that machine learning is a part of AI
02:01
and within machine learning, you have deep learning. So most of the stuff I'm going to show you today is actually deep learning, but for the sake of simplicity, I will be mentioning machine learning most of the time. So how does deep learning work? So first, the experts started to work on the field 40 years ago.
02:25
Last year, they actually got the Turing Award for that — it's like the Nobel Prize of computer science. And they thought at the time: okay, let's try to mimic the way we think our brain works, with neural networks.
02:41
For that, they needed many examples. And the magic here is that they managed to solve problems. We don't know exactly why — we don't have a systematic answer for these problems — but machine learning is now solving problems where we couldn't solve them before.
03:01
Why does it work today? So first of all, we are inheriting from centuries of science and in particular, algorithms, a lot coming from mathematics and physics. For a couple of decades, we now have everything we need for big data.
03:23
We are able to store data. We are able to process a lot of data now thanks to computers. And lastly, for, let's say, a few years or one decade now, technology and especially cloud technologies give us the computing power to do everything.
03:41
Of course, personal computers, laptops now have an amazing computing power, but cloud technologies now allow you to go to the next step and do stuff in hours or days where it would take weeks before. To give you an idea, so I'm going to talk generally about machine learning possibilities,
04:03
but to give you an idea about how important that is at Google: those are the numbers of projects, from a couple of years back, which have a machine learning model in their projects. And you've seen some of the results.
04:21
So for instance, in Gmail, when you start to type a sentence, you have a suggestion to end the sentence. In Android — in the latest versions of Android — there is a local customized machine learning model learning from your habits and optimizing the battery life.
04:43
And in Google Photos — maybe you've tried that — if you say, okay, this is my kid on one picture, it will find a match of your kid on all other pictures, even ones from 10 years back. So it's very amazing technology.
05:02
There are three ways today that you can benefit from machine learning. Of course, if you are an expert, then you know a lot about it. You are dealing with neural networks, and I hope you will learn a few things of interest for you in this talk. But if you're spending most of your time developing solutions,
05:23
then maybe you don't have the expertise to deal with machine learning, but it doesn't matter. You can maybe use existing machine learning models. They are available through APIs. They are ready-to-use models. And in between now, since a couple of years, there are AutoML techniques.
05:43
So it's filling a big gap. You still don't need expertise, but you can automatically build customized models for your own needs. And the purpose of this talk today is to give you a quick overview of everything you can do with these two types of technologies.
06:03
So first, the machine learning APIs. So if you remember my own definition of machine learning, it's solving problems from data, and data here can be text, pictures, videos, or speech. Then you need models, and from that, you can extract information,
06:21
and sometimes, the result you want is your input transformed, transcribed into something else. Now, let me start with the vision model. So I really love this kind of model because in the 90s, I was a student, so we were not talking about machine learning at the time,
06:42
but I was trying with other students to solve the problem of understanding what's in a picture, understanding the content of a picture to automatically detect stuff. And at the time, we were just trying to detect edges, and it just failed miserably because we could do it on a few pictures,
07:02
and then as soon as we would bring something new, it would fail, it would not work anymore. Machine learning is the solution now. Provide a picture to a machine learning model — a vision model. First of all, it's able to give you labels to describe the picture,
07:20
what's in the picture in general. So here, this is a picture of Hobbiton. Hobbiton is the place in New Zealand where the Lord of the Rings movies were shot, and this picture — so this is on the right, the JSON stream that I get from the API — and it tells me that at 95% of confidence,
07:43
it's about nature and so on, so that's correct. More precisely, so if I take the same picture, but this time, I zoomed in a little bit, I flipped it and cropped it, then a vision model is also able to match this picture with an existing one, a public one,
08:02
on the web, and here, it's able to tell me that most likely, this picture is about this place, and I even get the GPS location for it. More precisely, here must be a picture of the cast in a restaurant still in New Zealand. It can try to detect entities, so it's called object detection.
08:23
So detect entities, but precisely, with a bounding box in pictures. And here, the results I get are that there are many persons still, so this one is a person, right, but there are pants here and even tops. So it can be very precise.
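As a rough illustration of this object detection, here is a minimal sketch using the google-cloud-vision Python package (a recent version is assumed; the file name is a placeholder):

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Load a local photo (placeholder path)
with open("photo.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# Ask specifically for localized objects (bounding boxes)
response = client.object_localization(image=image)

for obj in response.localized_object_annotations:
    print(f"{obj.name} ({obj.score:.0%})")
    # Box corners are normalized to [0, 1] relative to image size
    for vertex in obj.bounding_poly.normalized_vertices:
        print(f"  ({vertex.x:.2f}, {vertex.y:.2f})")
```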
08:42
Even more precise, it can detect faces — in general, faces. So here, it's a 3D rendition, and what I get is the crop box for the face, a large one or a close one, but I also get the location of the different features, like the eyes, the nose, the mouth, and so on.
09:01
I get the position of the head in three dimensions, and also, a vision model can be taught to detect emotions. So here, there are a few generic emotions, and what it detects is that likely, this face is angry — and this is Gollum. Gollum is always angry, right?
09:23
Let's move on. So now, still on vision, optical character recognition, so OCR. This is a problem that is now fully solved thanks to machine learning. If I take this screenshot, the vision model is able to tell me that there are three main blocks,
09:42
and then inside them, there are sentences or lines or rows, if you prefer, and then words and then symbols. Here, it doesn't make any mistakes. It's really perfect. So it's a solved problem. Even if I apply some perspective effect,
10:01
so if you take a picture on a table or on a wall and so on, it still works really great. So let's say it's a solved problem, but the next step now for OCR is actually handwriting detection, and it starts to work really great already. So it's the same principle. So here, this is a handwriting from Tolkien,
10:23
and so it's not perfect — it is not as good as for typewriting — but here, it's detecting a lot of the "Rings". So ideally, it would detect the first one here and the second one here, but it works pretty well, and it's just making one big mistake here, on "Shadows".
10:44
It's detecting a V instead of the W — something that could maybe be autocorrected with natural language processing — but it makes very few mistakes here. The bottom of the F is detected as something else.
11:05
So it's almost perfect. It's really, really good. The limit of that, of course, is that if we are not able ourselves to read back something that is handwritten, then a machine learning model won't be able to either, and the limit might be doctor prescriptions, right? Sometimes they are not even able themselves to read them back.
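The OCR he walks through maps to the document text detection feature of the same package — a minimal sketch, assuming a recent google-cloud-vision and a placeholder file name:

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("manuscript.jpg", "rb") as f:  # placeholder file name
    image = vision.Image(content=f.read())

# Dense-text / handwriting-friendly OCR
response = client.document_text_detection(image=image)

# Full transcription as one string
print(response.full_text_annotation.text)

# Or walk the hierarchy: pages > blocks > paragraphs > words > symbols
for page in response.full_text_annotation.pages:
    for block in page.blocks:
        print(f"Block confidence: {block.confidence:.0%}")
```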
11:25
Okay, so also, it's able to detect entities and to match them up with something close to it
11:41
found on the web. So in this example, I took a picture from a Spanish newspaper that I had never seen before — a very rare picture of Tolkien. Once again, I zoomed in, I cropped the picture, changed the colors, so there isn't a single pixel in common with the original one,
12:01
but yet the vision model is able to detect this picture, to tell me that it's coming from this Spanish newspaper, but more than that, it's able to match it with the text on this web page and tell me that most likely, this picture is about Tolkien, and what I get here — so J.R.R. Tolkien —
12:22
I get an entity ID. So this ID lets me work with a single ID, and I will deal with Tolkien this way wherever I'm working with these APIs, okay? It can be used with just a few lines.
12:41
So this is a Python client library that is available as open source on GitHub, and it's a wrapper around the API. So what you have to do is just always create a client, provide a content, so an image here, so I have two pictures, call the feature you're interested in, so face detection, for instance,
13:02
and then you have the results right away and you can deal with the results, okay?
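That face-detection flow — create a client, provide the content, call the feature, read the results — looks roughly like this (a sketch assuming a recent google-cloud-vision package and a placeholder image file):

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()           # 1. create a client

with open("gollum.jpg", "rb") as f:              # placeholder image
    image = vision.Image(content=f.read())       # 2. provide the content

response = client.face_detection(image=image)    # 3. call the feature

for face in response.face_annotations:           # 4. deal with the results
    print(f"Detection confidence: {face.detection_confidence:.0%}")
    print(f"Anger likelihood: {face.anger_likelihood.name}")
```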
13:22
So we've seen what you can do with pictures as of today. You can extrapolate to imagine what you can do with videos, because videos are pictures with a time dimension, right? So maybe the easiest is to show you an example. If you can understand what is in a video, then it means you can index it. And so this video has gone through the video intelligence model, and I get labels and it tells me what's in the video and where.
13:43
So here at the beginning, I have a spiral galaxy. A bit later, I have humans. Here, I have a polar bear.
14:03
So you see, you get the results, and then you can really understand what you have in your data, in your input data. So let's move on. Just one code sample: if you're interested to check out how it's done, I have written a tutorial.
14:22
So it's a codelab here. And actually, this is how I got the information that there is an insect here in a larger video. Once again, it's always the same principle: you create a client, you indicate that you're interested in object tracking, and you call annotate video,
14:42
and then you get the results. If your video's duration is a couple of minutes, then after about one minute, you will have the results. So it's, of course, not real time, because it's a long processing. It's a heavier processing to read all the frames
15:01
from the video and understand what's in it. But you can actually track objects, so it's even better than on pictures: you can follow the objects in your videos.
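The object-tracking call from the codelab he mentions looks roughly like this — a sketch assuming a recent google-cloud-videointelligence package, with a placeholder Cloud Storage URI:

```python
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

# Long-running operation: the whole video is analyzed frame by frame
operation = client.annotate_video(
    request={
        "input_uri": "gs://my-bucket/video.mp4",  # placeholder URI
        "features": [videointelligence.Feature.OBJECT_TRACKING],
    }
)
result = operation.result(timeout=300)  # roughly a minute for a short video

for obj in result.annotation_results[0].object_annotations:
    print(f"{obj.entity.description} ({obj.confidence:.0%})")
    for frame in obj.frames:  # one bounding box per sampled frame
        box = frame.normalized_bounding_box
        print(f"  t={frame.time_offset} box=({box.left:.2f}, {box.top:.2f})")
```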
15:21
So next: text. It's a very big field in computer science called NLP, natural language processing — I guess we all learned about it if we went to a computer school. It's a really big field, and the latest advancements came from machine learning again. So you provide text, and the natural language model is able to analyze the text and give you results. So on this sentence, it will tell me first that it's in English, okay?
15:44
It's able to give me the precise syntax of the sentence with all the different relationships. Punctuation is detected; lemmas too — I know that "was" relates to the verb "to be", and so on. Like in pictures, it's able to detect entities.
16:01
And here I have three different classes, three different types of entities. In red, I have persons. Tolkien is a person. And by the way, if you notice here, I have an ID, and it's exactly the same ID as for the picture before. So I can really deal with Tolkien here on text
16:24
and on pictures or videos. And also one cool thing: the natural language model understands the context. So here, if Tolkien was actually not J.R.R. Tolkien but Christopher Tolkien, the son, then I will get Christopher Tolkien
16:43
with a different ID, of course — the unique ID for the son. Then "British" here relates to the United Kingdom, and the three books here are each detected as works of art, which is perfectly correct.
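A minimal sketch of this entity analysis with the google-cloud-language package (recent version assumed; the "mid" metadata entry is the shared knowledge-graph ID he mentions, and the sample sentence is a placeholder):

```python
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

document = language_v1.Document(
    content="The Hobbit was written by J.R.R. Tolkien.",  # placeholder text
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

response = client.analyze_entities(document=document)

for entity in response.entities:
    # metadata["mid"] carries the same knowledge-graph ID used by Vision
    print(entity.name, entity.type_.name, entity.metadata.get("mid", "-"))
```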
17:02
Okay, you can also ask for classification. Okay, I have a book, I have a chapter, I have a paragraph, I have a sentence — you can ask for content classification, and in this case it tells me that this sentence should be classified under books and literature
17:21
with a confidence of 97%, which is perfect. And finally, like in pictures, you can try to get a sentiment analysis — try to understand whether the text you provide is talking positively or negatively.
17:40
So to try that out, what I did is, I retrieved two articles, two reviews about The Hobbit: one from the New York Times — the last century — and one from Goodreads, a social network for book readers. The first one is very positive,
18:01
the second one, as you can tell, is very negative. And the results I get are, for instance, for each sentence, I get a score between minus one and plus one and it does work. These sentences come from the New York Times. This one too, it's a neutral one. Most of the sentences, of course, are neutral.
18:23
These sentences come from Pauline's review, who really hated the book. So some companies, for instance, are using that to understand how people, their users, are talking about their products on Twitter or on the web and so on. So they are actually parsing, retrieving content
18:40
and using the natural language sentiment analysis for that. Some companies are using that on emails — on all the emails they receive — to understand how happy or unhappy their customers are; it can be pretty useful. Again, to use that in Python,
19:02
you create a client, you provide the content — the document, which can be text or HTML — you call analyze sentiment, and then you have the result very, very quickly.
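The sentiment call he describes, sketched with the google-cloud-language package (recent version assumed; the review text is a placeholder); scores range from -1.0 to +1.0, as on his slides:

```python
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

review = "This book is a flawless masterpiece."  # placeholder text
document = language_v1.Document(
    content=review, type_=language_v1.Document.Type.PLAIN_TEXT
)

response = client.analyze_sentiment(document=document)

print(f"Overall score: {response.document_sentiment.score:+.1f}")
for sentence in response.sentences:  # one score per sentence
    print(f"{sentence.sentiment.score:+.1f}  {sentence.text.content}")
```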
19:22
In the same vein: translation. So I won't get into details, but I can share something with you. In 2016, I was still working on e-books and I was using Google Translate, and one day something happened. The results were a lot better. And what happened actually — I got the answer since then — is that historically Google Translate
19:42
was using, before machine learning, a phrase-based model — so mostly a statistical model. And in 2016, Google Translate switched to a pure machine learning model. And this is why, at the time
20:01
and since then it just kept improving — we suddenly got a big bump in quality. Okay, so here I just need two lines to use it: I create a client, I call translate, and I have a translation right away. It works from and to over one hundred languages, so that's thousands of different combinations.
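Those two lines look roughly like this — a sketch with the google-cloud-translate package's v2 "basic" client; the input string is a placeholder:

```python
from google.cloud import translate_v2 as translate

client = translate.Client()
result = client.translate("Un anneau pour les gouverner tous", target_language="en")

print(result["translatedText"])          # the translation
print(result["detectedSourceLanguage"])  # source language, auto-detected
```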
20:21
And finally, regarding the machine learning speech APIs. With speech as an input, you talk and you get your speech transcribed into text. This is also a problem that is now solved thanks to machine learning.
20:42
If you're able to understand the speech that is in your data, then it means you can index it. So for instance, if I have an audio file, then I can get the position of every word in all my sentences. And it's also very easy to use:
21:00
you create a client, you call recognize — sorry, yeah, you call recognize — and then you have the text coming from your audio. So this is, again, another tutorial I've written. You will find all the slides, they are public, so you will get the link at the end. It's also on my EuroPython profile
21:22
if you wanna try that. So what I tried in this one: I recorded myself speaking French poetry aloud — a very famous one from La Fontaine — and I'm helping here a little bit by telling it that I know beforehand that it's French.
21:41
I'm asking for automatic punctuation — this is a new feature that is very, very important: it will give you the caps, it will give you commas and so on. And here I'm also asking for the word time offsets, so that I can index my different words, okay?
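His French-poetry request, sketched with the google-cloud-speech package (recent version assumed; the audio URI is a placeholder, and the three options are the ones he lists):

```python
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    language_code="fr-FR",              # helping it: I know it's French
    enable_automatic_punctuation=True,  # capitals, commas, periods...
    enable_word_time_offsets=True,      # a timestamp for every word
)
audio = speech.RecognitionAudio(uri="gs://my-bucket/la-fontaine.flac")

response = client.recognize(config=config, audio=audio)

for result in response.results:
    best = result.alternatives[0]
    print(best.transcript)
    for word in best.words:
        print(f"  {word.word} @ {word.start_time}")
```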
22:04
Now the opposite: text-to-speech. You provide text and then you get speech out of that. 20 years ago, I used a text-to-speech engine in the first European e-book reader we made. It was a big failure. So I did work quite a few weeks on it.
22:22
I was very proud of the result, but the result at the time was that you pressed the play button to get the book read aloud, and what you got was "A-li-ce in Won-der-land", and so on — a robot talking to you. So now this is finished. This is also a solved problem
22:41
thanks to machine learning. At Google, this is coming from a technology called WaveNet. It's been developed by DeepMind. Maybe you know DeepMind because they've beaten the Go World Champion. More recently, they are beating gamers, young people who are champions at StarCraft.
23:01
So DeepMind is trying to solve problems by starting from scratch and building a machine learning model from scratch. And here, it's really amazing. Let me play you these two examples. So one is the original recording
23:20
and the other one is actually the speech synthesized with the same sentence. "She earned a doctorate in sociology at Columbia University." "She earned a doctorate in sociology at Columbia University." It is really hard to tell the difference.
23:41
So if you want to know, this one — the one on the right — is the original recording. I tried to listen to them very loud and so on. It is a very, very natural result. Maybe it's the best model so far from everything I've shown to you.
24:02
I have to admit, even though I love the vision model because it's solving a problem I was trying to solve, this one is honestly really amazing, because it's hard to tell the difference. The WaveNet voices are the ones you can hear in Google Home or in Google Assistant.
24:21
And by the way, let's try something all together, okay? So I don't know if you noticed, but on Google Search, you can actually do a search with your voice. So let's try that out.
24:42
What is the temperature in Paris? "It's 27 degrees in Paris right now." It's giving you results in real time, even though it might be wrong sometimes. When I started to pronounce "temperature",
25:02
on purpose, I used my French accent and it's been able to understand me. So let's try something else now. I'm going to go to the French version, okay?
25:21
"Quelle est la température à Londres..." ("What is the temperature in London?") Oh, sorry. Let me try again, I didn't... Here's a matching video. "Quelle est la température..." "Quelle est la température à Londres." Oh, I know, I forgot. I told you... I know what I did wrong.
25:40
I told you I was going to go on the French website, but I'm actually still on the English one. So here, now I switch to the French one. Sorry about that. "Quelle est la température à Londres."
26:10
"Quelle est la température à Londres." "Quelle est la température à Londres."
26:23
"Quelle est la température à Londres." What I want to show you is the opposite: so I am asking a question in French, but with an English accent. And so I messed up a little bit, because I started to speak at the wrong time. But what you could see is that you're getting results
26:43
in real time, and you are getting the expected results. It's able to understand me, even though I'm really making it hard to be understood. So what does it mean? It means that the speech-to-text engine
27:02
has been trained to understand the essence of our language — the characteristics, the specifics of speech — to be able to understand the different words. So we've seen everything you can do with existing models
27:23
— there are of course more features and many options; it could take a day to cover them all. If you want to generate speech, again, I've made this tutorial. So it takes this to generate these three sentences in three different languages.
27:40
What you need to do is to create a client again, call synthesize speech, and provide some parameters, like the language you want to generate and the name of the voice. There are different WaveNet voices if you want a human-like sounding voice,
28:03
and you have different options. So here, with only this, I can generate three WAV files. You can try that in this tutorial, okay?
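The synthesis call, sketched with the google-cloud-texttospeech package (recent version assumed; the text, voice name, and output file are placeholders — WaveNet voices are selected by name):

```python
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(text="The Road goes ever on and on."),
    voice=texttospeech.VoiceSelectionParams(
        language_code="en-GB",
        name="en-GB-Wavenet-A",  # a WaveNet voice, human-like
    ),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16  # WAV-style PCM
    ),
)

with open("output.wav", "wb") as out:
    out.write(response.audio_content)
```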
28:21
So next: a big gap that is filling many, many needs — AutoML techniques. Let me show you this example, you will understand better. If I take these two pictures, which are different, right, and give them to the vision model, it will give me almost the same results: sky, cloud; sky, cloud. Because those pictures are actually clouds in the sky.
28:42
But if I want to build a forecasting service, for instance, I need to be able to understand the shape of the clouds. I need to know that it's a cirrus here and an altocumulus there. And then I'm stuck, because the only info I have is that it's a cloud in the sky.
29:03
So AutoML here can help you, still without any expertise in machine learning. The difference compared to the APIs is that you need to work a little bit more: you need to build, you need to provide your own dataset.
29:20
You need to provide training data, you need to look for examples and give that to the AutoML pipeline. Once you have the dataset, you can launch a training. It's fully automated. And generally, you will need a couple of iterations to understand how well your dataset is doing.
29:42
And then once you're happy, you can deploy and serve. And then you come back to the previous case, where you have your own — this time private — API that you can use in all of your solutions. But this still works online. If you want something that can work offline,
30:00
then you can try, you can train a model that we call an edge model because you will be able to deploy it on the edge somewhere else. So it's a smaller model, not as efficient as the cloud model, but maybe it can work and fulfill your needs.
30:20
So once you have trained your edge model, you can export it and you can get it to run in a container, on your smartphone, or even in a web browser with TensorFlow.js. Okay, so it's very useful, for instance, because in factories, on production lines,
30:41
for many reasons, very often you don't have internet connectivity or you don't want to have it. So you need something that works offline. Even for web browser solutions, of course you need to download the model first,
31:01
but then you can work offline and have something that works in your browser tab with a local model. Okay, so once you have built your data set — so here, if I want to make a difference, I need to label them. So I have a cumulus, a cumulonimbus, and so on. You label your pictures. There, it's a classification problem.
31:21
You want to make the difference between different pictures. You don't need millions of pictures like for the big machine learning models we've seen before. Here, you just need a couple of hundreds of pictures per label, ideally 1000, but with just a couple of hundreds, it starts to work really great.
31:42
And once you have done your data set, you can launch a training. So here, this is a one-compute-hour training; here, a three-compute-hour training. And then you get a sense of how well it is doing, because 80% of your data set is used for training, 10% to evaluate the best architecture,
32:03
and the other 10% are used to evaluate how well it is doing. For classification, you can use the confusion matrix to get an idea of how well it is doing. So here, for instance,
32:21
it's doing great with cumulonimbus and cumulus, but doing really badly with the altocumulus: almost 50% of the time, it is confusing it with something else. There are two reasons here: first, we have fewer samples of altocumulus.
32:42
And second, they all look alike. So making a data set is going to be an art, I think. You really need to understand that you want to build a balanced data set, and you want to try to remove as much as possible
33:03
the bias that could be in your data set. Because you're going to get — "You have 10 minutes left." — you're going to get results out of the model, and if you interpret the result as a causality, actually, it may not be causality.
33:21
And this is the issue that we can have with bias. Okay, so it could be something interesting for another talk. Once you have trained your model, then you can use it in an API. You can provide it with new pictures
33:41
it has never seen before. So here, this is my own private picture — I was in Poland that day — and it's telling me that there's a cumulus in this picture at 97%. Really great.
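Once deployed, the custom model is called much like the generic APIs — a sketch with the google-cloud-automl package (recent version assumed; project, region, model ID, and file name are placeholders):

```python
from google.cloud import automl

client = automl.PredictionServiceClient()

# Fully qualified name of the deployed model (placeholder IDs)
model_name = client.model_path("my-project", "us-central1", "ICN1234567890")

with open("sky.jpg", "rb") as f:
    payload = automl.ExamplePayload(image=automl.Image(image_bytes=f.read()))

response = client.predict(name=model_name, payload=payload)

for annotation in response.payload:
    print(f"{annotation.display_name}: {annotation.classification.score:.0%}")
```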
34:01
Okay, so if you remember my definition: we have data and we want information. AutoML techniques as of today already work on text, pictures, videos, and also structured data. And this time, you need to choose the features that you want to detect.
34:23
So for instance, you want to do custom classification. Maybe you want to detect custom objects in your pictures. On videos as of today, you can do custom classification. There are beta features also for custom object tracking and so on. You can build your own models on text
34:42
with custom natural language features. You can do your own custom translation, and you can do custom predictions on structured data. So it's a new field — I would say since two years now. It's just the beginning, but it's going to be very useful, because you don't need any expertise to build a model.
35:03
So I've done a demo that we are going to be able to try all live. If you are on YouTube, there is a delay. So maybe when you hear that it will be too late and I don't know how much the delay is on YouTube live. So what I've done is a small demo
35:20
where you're going to be able to upload selfies. And in the first part, I'm going to call the vision API and try to detect generic emotions. But also I'd like to know, I don't see you, I'd like to know if someone is sleeping, someone is yawning or someone is having fun. So I've built with my teammates
35:41
and with attendees from previous conferences, my own private custom model that is able — I hope — to automatically detect these situations. Okay, the way it works is the following. So from your smartphone,
36:00
you're going to be able to upload a selfie. It will automatically trigger a Python function, which will call the Vision API, and maybe my own AutoML Vision API if needed. And then, thanks to the analysis, we'll store a result here and you will see it on your smartphone.
36:20
Here, this is a serverless, a small application. It's actually my administration backend for the demo. And here on the screen, you will see the result from the administration panel, okay? So let's try that out. So I invite you to open the camera on your smartphone.
36:44
Okay, it's starting, sorry for the delay. Okay, so you can either scan the QR code here or you can enter this URL in your browser: bit.ly slash smartep20,
37:01
bit.ly slash smartep20 — for EuroPython 20, smartep20. Okay, so if you go there, you will reach this page, and I'm going to — we still have about five minutes.
37:22
I'm going to go to step one, okay, bit.ly slash smartep20. So it will ask you for authorization to use your webcam. So here, this is the generic vision model
37:40
that is going to be used. You can try to upload a selfie and try to trigger a detection for one of these emotions. So let's try that. Yeah, my network must be a bit slow, I'm sorry.
38:03
Seems to be okay, but it's in the cloud, so maybe you'll get faster results on your side. I have no feedback,
38:20
so I don't know if it's working for you. Oops, I should have gotten the results already. Maybe I forgot to pray to the demo gods before.
38:42
Maybe that's why. Okay, so let me try again. Sorry about it, I hope it works on your side.
39:02
Okay, so I had an issue before. Yeah, so it does detect surprise with a high level of confidence. Let's try another one. And maybe you've seen, as I have the position of the nose, the mouth, the eyes, everything, I can actually add a mustache to everyone.
39:22
So let's try this last one. Okay, joy with high confidence. Okay, let's now switch — you can try a few times if you want. Let's switch to the AutoML part. So here, if you refresh the page, or if you click next,
39:44
you should be on the same one, but with my own private model. And this time, try to trigger it: try to stick out your tongue, to yawn, or to sleep. Okay, okay, it found that I'm yawning.
40:04
Let's try another one. This time, to make the difference between the generic API and my AutoML API, the mustache will have the French flag colors. That's my touch.
40:22
Yeah, it does work. So you could tell that I'm cheating, and I'm actually cheating because I built this model with pictures of me, pictures of my teammate and of other previous attendees. So of course, it's normal that it works,
40:41
but we are going to check whether it works for you too. Wow, wow, wow, many people. So Marc Ambree — okay, so happy people. So that was with the generic API. Still surprised people — so that's me. Here, maybe it was surprise we wanted to trigger.
41:05
It's in between surprise and sadness. Here, it's between surprise and joy. Here are two sad people, yeah. And here, angry people — yeah, yeah, they're really angry.
41:21
And my AutoML model — so let's try... yeah, yeah, so more pictures are coming. You all have your tongues out, great. Tired people — whoops, sorry — tired people, yes. So you see, I did input pictures of people yawning
41:40
with or without their hand, it works, and people sleeping, yeah, it works too. And finally, if you remember, it's able to detect objects with a precise location. So here I have the people, all the attendees with glasses,
42:01
and it seems to work great, okay. So, in the couple of minutes left — you see, it's really easy to do. One note: there are two ways to measure how well your model is performing, and for that, you have to understand the notion of true and false positives,
42:20
of positives and negatives, and whether they are true or false — there are four different cases. If you're focusing on quality, then precision is the metric you're interested in. If you are building a search engine, then you will use the recall metric: here you want to minimize the number of false negatives; you want more results.
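In plain Python, the two metrics he contrasts come down to this (the counts below are made-up numbers for illustration):

```python
def precision(tp: int, fp: int) -> float:
    """Of everything the model flagged, how much was actually right?"""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Of everything it should have found, how much did it find?"""
    return tp / (tp + fn)

# Made-up example: 90 true positives, 10 false positives, 30 false negatives
print(f"precision = {precision(90, 10):.0%}")  # 90% -- focus on quality
print(f"recall    = {recall(90, 30):.0%}")     # 75% -- focus on coverage
```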
42:42
I will let you have a look at this if you want to understand a little bit how AutoML works — at least at Google, there's one specific paper. If you want to do more machine learning, then you can use frameworks. One of them is TensorFlow — it's open source,
43:02
maybe the most popular one on GitHub by far; another one is PyTorch, which I hear a lot about from experts. So what have we seen? We've seen that there are three ways you can use all that. With the APIs, you just need a couple of hours; with AutoML, you need days,
43:21
and weeks or months if you want to become an expert. The difficulty: there's absolutely no difficulty with the APIs; with AutoML, you need to build a data set, and for that you need a couple of days. Okay, a few links if you're interested to check out some solutions. Here, this is an online comic coming from Google AI;
43:43
you will find lots of the terms in it, so it's a nice refresher if you want to understand things a bit better. If you want to get the slides for this talk, they are here — you're very much welcome to send me feedback too. So thanks a lot for having me today,
44:02
my goal was to give you this overview of what you can do as a developer — and you don't have to be an expert to do everything you've seen. So I hope you learned a few things, and also I hope it gave you a few ideas. Thanks a lot for having me today,
44:21
and have fun, have a great EuroPython, thank you. Okay, thank you very much, Laurent. That was a very interesting talk — lots of topics, lots of things covered. We do have a number of questions, but the time is already up, so I would say that we basically take them to the talk channel that I posted in the chat.
44:45
And then you can answer them there. It would also be a good idea to maybe post the links that you have here in the slides in the talk channel so that they stay up and then are easily reachable. Sure, I will do so immediately. Right, so let me give you your applause, well-deserved.
45:02
Thank you very much.