
AI on a Pi


Formal Metadata

Title: AI on a Pi
Number of Parts: 160
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor, and the work or content is shared, also in adapted form, only under the conditions of this license.

Abstract
AI on a Pi [EuroPython 2017 - Talk - 2017-07-13 - Anfiteatro 1] [Rimini, Italy]

In recent months, Artificial Intelligence has become the hottest topic in the IT industry. In this session, we’ll first explain how Deep Learning, a subset of AI, differs from traditional Machine Learning and how it can help you solve complex problems such as computer vision or natural language processing. Then, we’ll show you how to start writing Deep Learning applications in Python thanks to MXNet, a popular Deep Learning library for both CPUs and GPUs. We'll also see how to use pre-trained models, and we'll load one on a Raspberry Pi equipped with a camera. Finally, we’ll show random objects to the Pi… and listen to what it thinks the objects are, thanks to the text-to-speech capabilities of Amazon Polly.
Transcript: English (auto-generated)
Can you hear me fine? Yes, sounds like it. Okay. Well, it's been a long day for you guys. It's been a long day for me. Flying very early this morning from Paris, but it's good to be here. I hope you have some energy left. It's a longer session, and so we will have plenty of time.
And I suggest if you have any questions during the session, please ask your questions. Please raise your hand, and we have a microphone somewhere that you can grab and ask your question, okay? I'd rather have a more interactive session than just wait until the end and answer questions on stuff
I was talking about 34 minutes ago, okay? So, please raise your hand and ask all your questions. So, my name is Julien, or Julien, or Juliano, or whatever you want to use, that's fine. I'm a tech evangelist for AWS. I'm based in the Paris office, sometimes,
because most of the time I'm really traveling and talking to developers like today, and that's fine. I've been with AWS for about a year and a half, and before that, I've spent the last 10 years or so as a CTO and VP engineering in web startups in Paris, okay?
So, indeed, today we're going to talk about artificial intelligence, and I'm going to take you on that long journey that started in the 50s, and if everything goes well, we'll end up running some stuff on my Raspberry Pi robot,
which is waiting in the shadow to exterminate us all, maybe, who knows? And we'll see what we can do with this, okay? So, I'll start with a very quick introduction of AI and why it's been mostly frustrating so far,
and then we're going to talk about Amazon AI, what do we do at Amazon and AWS, and recently, fairly recently, we've had some new stuff coming out. And then I'm going to talk about, and most of the presentation, actually, will focus on an Apache project called MXNet.
Who has already heard of MXNet? Alright, well, okay, I'm progressing, usually it's zero people, which is why I'm talking about it. So, MXNet is a deep learning library which is extremely developer-friendly. It's really designed for quick experimentation by developers and by non-experts, right?
So, I always say you don't have to have a PhD to use MXNet, and that's fine, that's exactly what we want to do. So, we'll talk about MXNet for a minute, you know, the high-level features and so on, and then, of course, we'll go into some demos using Python code,
what a surprise, right? I can show you some other code if you want, but you're going to throw stuff at me, right? So, it has to be Python today. And then, of course, I'll point you to some more tools and some more resources to get you started. So, the story so far, well, that's it, right?
So, who has no idea what this is? It's okay if you raise your hand, alright. Okay, so, thank you for making me feel very old. I keep thinking some people in the room are older than me, but as time goes by, this becomes false, you know, all the time.
So, this obviously is from the Stanley Kubrick movie, 2001: A Space Odyssey. If you haven't seen it, you have to see this, it's a masterpiece, and it's the first visual representation of artificial intelligence. It came out in 1968,
and this computer is inside a spaceship, and it's really running the ship, and the astronauts, you even wonder why you have astronauts in there, which is probably why the computer decides to kill them eventually. So, you know the end, but you should still see the movie. And I guess a lot of geeks like me and computer scientists
and researchers have been obsessed with this, you know, when we first saw the movie. This is what we're trying to build, right? This is ultimate artificial intelligence that can understand natural language, that can speak, that can handle very different and very complex tasks.
You know, imagine driving a spaceship, you know, how complicated is that? So, this is basically what people have been trying to conceive and build over the years, and, well, have they succeeded? And actually in 2001, the real year 2001,
a famous computer scientist called Marvin Minsky, one of the fathers of artificial intelligence, who founded the AI lab at MIT, and did plenty of other things, and actually was an advisor to Kubrick on the movie, right?
So, he worked with Kubrick in 68 to design what HAL would look like, right? So, pretty funny. And so in 2001, Minsky wrote a paper that said, it's 2001, where is HAL? And, well, obviously nowhere. And he gave a number of reasons why artificial intelligence
had not made a lot of progress in 40 or 50 years, right? And he thought we were still very, very far away from having HAL. And I think it's a paradox, because in the mid 2000s, machine learning started to explode, right?
And now today, everybody... Who's doing machine learning in the room? See? All right. Okay. Thank you. Everybody, right? All right. At least everybody has machine learning on their resume, and that's fair enough, right? I have it. So, it's a commodity, it's easy to do machine learning.
You can get some open source libraries, you can grab the Scikit tools in Python and build machine learning models in just a few lines of code, you can use cloud-based services, you have a wide choice. So, why did machine learning become so successful
starting in 2000 until 2010 and now even more so? Why did this make a lot of progress and why did AI not make a lot of progress in the same years, right? Well, it's because, as you know, in machine learning,
it's all about the features, right? So, it's fairly easy to build a prediction model provided that you have clear features. And actually, most of the work of data science is to find what features are useful in the data set, how to engineer them, how to prepare them
so that they can deliver a nice, efficient working model. Okay? So, let's take an example. If you have a web log, Apache log or something similar, and you want to use those logs to predict user activity, you know, it could predict what link they're going to click on, what ad they're going to click on, etc., etc., typical activities,
then all the features are pretty much available, right? You just look at what the log has, you know, the time and date and URL and user agent and so on, and probably 50 more. You drop the useless ones, user ID obviously,
and you have to figure out which ones are the ones that are the most relevant for your machine learning model, okay? And you go and tweak them and combine them and twist them into form until you have a working model. Now, let's take a different problem, okay? Suppose I take a picture of this room,
you know, a thousand pixels by a thousand pixels, and I want to know who's in the room or what's in the room or is that even a room? I want to know what the picture is. Okay, so it's a million pixels. And if it's a color picture, it's likely that it's actually three million pixels, right? Red, green and blue.
So, does this mean I have three million features? Should I take those three million features, you know, flatten them and send them into a prediction model? Would that work? Probably not, right? I'm sure some people have tried, but, you know, that's not really working.
And if you think about it, does it even make sense, right? Is every single pixel in this picture useful information, right? Look at the seat you're sitting on. I mean, it's all gray and it's all the same color. So, do I need all those individual pixels to figure out this is a seat and it's gray?
Again, probably not, right? Common sense tells me no, I don't, okay? But that's the difficulty in building those smart applications, right? Common sense, human common sense, tells us the answer immediately. If we brought a five-year-old kid in this room and asked, okay, what do you see?
He said, well, I see people sitting in a room. So, it looks like a classroom maybe, right? And if you show animal pictures to that kid and say, okay, is that a cat, is that a tiger, is this a dog? He would know instantly, okay? But if you ask that kid, okay, how do you know it's a lion and how do you know this is a cat, right?
Then it becomes more complicated, okay? And he would give you some answers, right? But how do you fit that into data that a computer can understand? And that's the number one problem, right?
And this is the problem that deep learning is actually trying to solve: teaching computers to understand informal things, right? Things that you and I know pretty much from being four years old and older.
But it's impossible to teach a computer to do this. Okay, so if you try to do it the machine learning way, it doesn't work. Okay, because there are too many features, there's just too much information, and you cannot feed all this information to a machine learning model and get a decent result. So, of course, the answer to this is neural networks.
And they're not new at all. They've been around for decades. The early work even goes back to the late 40s. But the first major applications came in the 50s, right? So, it's literally 60 years old technology, right?
60 years old. And what is a neural network actually? Well, people have written books about this, have spent their lives explaining this. So, I'll keep it short and simple. Basically, a neural network is a universal approximator.
There's a theorem, the universal approximation theorem, that says that if a network is large enough, and if you give it enough data, it can learn anything. Right? That's it. So, it's a learning machine. You design it, you show it some data over and over and over again,
and it learns perfectly how to predict a given output from a set of given inputs. It can predict absolutely anything. And, you know, magically. You don't have to understand exactly what happens in there, which is nice. But, okay, mathematically, they're great.
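That "show it data over and over" loop can be sketched with the smallest possible example: a single neuron learning the logical AND function in plain Python. This is a toy illustration of the idea, not code from the talk and not how a real library like MXNet implements training:

```python
def train_perceptron(samples, epochs=50, lr=0.1):
    """One neuron: repeatedly show (input, target) pairs and nudge the
    weights toward the right answer. After enough passes over the data
    it predicts every sample correctly."""
    w0, w1, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x0, x1), target in samples:
            y = 1 if w0 * x0 + w1 * x1 + b > 0 else 0  # current prediction
            err = target - y                           # -1, 0 or +1
            w0 += lr * err * x0                        # adjust toward target
            w1 += lr * err * x1
            b += lr * err
    return w0, w1, b

# Learn logical AND from its four possible inputs.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w0, w1, b = train_perceptron(AND)

def predict(x0, x1):
    return 1 if w0 * x0 + w1 * x1 + b > 0 else 0
```

Real networks do the same thing at a vastly larger scale, with many layers and smarter update rules, but the loop is the same: predict, measure the error, adjust the weights, repeat.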
Theoretically, they're great. For limited applications, they were great. But, you know, until very recently, they didn't really work, right? They didn't really work. And if some of you, the older ones like me, were in the university, let's say in the 90s,
and you studied probably, you know, a few hours on neural networks and AI, I'm sure your teacher told you something like, well, yes, okay, artificial intelligence is really cool, neural networks are really cool, you can do all kinds of crazy stuff on paper. But, you know, outside of the lab, they're pretty much useless because we cannot solve bigger problems with those.
So, it's all about scale, scale, scale. So, that's it. That's the reason why they didn't work. That's the reason why in the 60s and the 70s and beyond, they just stayed in the lab and there were cool stuff to play with, but there were no industry applications, no real-life applications of these
because data was just not available. Okay, and remember I said you need lots of data to train. And computing power was not available either, right? So, tough luck. But it changed, right? It changed for three reasons.
The first one is data sets are everywhere, right? Digital data is literally everywhere, okay? Audio, text, pictures, everything. Some public data sets are available on the internet, large ones, you can grab them, you can mine them, you can go on Kaggle,
you can do machine learning competitions, deep learning competitions. So, data is just everywhere. So, everybody now can grab data and start training and start building apps. The second thing is computing power is, well, it's always a problem, isn't it? It's less of a problem than it used to be, right?
Because now we have GPUs and in the mid-2000s and later on, research teams found that actually GPUs could be used for something smarter than playing 3D games, right? So, instead of using all that fantastic power to build 3D games and shoot each other,
well, we could, you know, do some actual scientific work. And, you know, I guess that's a good thing. And so now GPUs are everywhere. They're fairly cheap and they deliver massive amounts of computing power. And the third thing that helped deep learning explode
is the elasticity and the scalability provided by clouds. Okay, because just like everything else, why would you buy 50 fancy GPUs to train for, you know, a couple of hours a week and have them do nothing the rest of the time
when you can just go to the cloud and grab a few GPUs for a few hours, train your model and pay for those few hours and release them, right? So, the elasticity, the pay as you go, you know, everything that you know for cloud also applies there, right?
Compute, storage, etc. It's all there, grab it, use it, release it and pay exactly what you have to pay and nothing else. So, deep learning exploded, right? But let's look at a concrete application. Every year there's a competition called ILSVRC among, you know, research teams across the world.
And what they do is they take that ImageNet dataset, which is a very large dataset, which is composed of images with a single thing in them.
So, it's either animals or objects or plants, no humans. And they have thousands of categories and they have to predict the right category for each image, right? That's the game. So, actually they can predict five categories for each image
and if the correct category is in the top five, then it's considered a win, okay? And so they've done this for years and years and years. Here's an example. These are real images from the dataset. So, who thinks these dogs are, so it's not the same dog, but are they from the same breed?
Who thinks they're not the same breed? Do we have any Eskimos or Norwegians in the room? Norwegians are usually pretty good at this. And who has no idea?
Well, personally I have no idea, and I think if you gave me 15 minutes, I would still have no idea. Some things tell me it's the same, some things tell me it's not the same, right? So, for the record, it's not the same breed, okay? But how would you know, right?
How would you know? Unless you're a real dog expert and you could actually explain to me that, oh, see the difference here and here and here, okay. And then I show you a different breed and you're not an expert of those dogs and you don't know, right? So, that's that informal knowledge we need to fit into the computers. So, they've been playing this game with dogs and plenty of other categories for years
and these are the results. So, they started in 2010 and the blue line is the error rate, okay? So, it goes from 28 to 25 to 16 to 11 down to 3% last year, okay?
Only 3% error. And the red bar is how deep the neural network that won was, how many layers it had, okay? So, in the first couple of years it was just one layer and then it went up, as you can see,
8 and 19 and 22 up to that crazy number of 269 layers, okay? So, now the question is what do you think the score would be for humans, right? Normal humans. If I gave you the ImageNet dataset, all right, and lots of coffee and I asked you to score,
it's millions of images, so it would take a while, but okay, theoretically you could do it with lots of coffee. What would be your average error?
I have no idea. Well, I guess our brains have much more than 269 layers, okay? So, but still, our brain is different. So, what would be the score? Who says less than 5 or less than 3? Who thinks humans would actually beat the machine?
Who says more than 10%, okay? And between 5 and 10, okay. So, the answer is actually 5, 5.1 if you want to be exact, okay? But again, it's theoretical because if I gave you maybe 50 images, you would do this.
If I gave you a thousand images, maybe not so much. If I gave you a million images, you would never get to the end, right? So, the computer can do it faster, longer, is never tired, and you know, it will give the same answer all the time.
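The top-five scoring rule described a moment ago is simple to express in plain Python. These are hypothetical helpers written just to make the rule concrete, not the actual ILSVRC evaluation code:

```python
def top5_hit(scores, true_label):
    """ImageNet-style scoring: a prediction counts as correct if the
    true category is among the five highest-scored categories."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return true_label in ranked[:5]

def top5_error_rate(predictions):
    """predictions: list of (scores_dict, true_label) pairs."""
    misses = sum(0 if top5_hit(s, t) else 1 for s, t in predictions)
    return misses / len(predictions)

# Example: six candidate categories scored for one image (made-up numbers).
scores = {"husky": 0.40, "malamute": 0.30, "wolf": 0.10,
          "cat": 0.08, "fox": 0.07, "lion": 0.05}
```

With these numbers, "fox" still counts as a hit because it is in the top five, while "lion" does not; the competition's 3% figure is just this error rate over millions of images.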
So, what this means is actually deep learning models and computers are now better at recognizing stuff than us, right? I would say, given the condition that they actually have been trained on it, of course, if you show them something they've never seen, right?
They won't know, and maybe we will, because we're smarter, right? But still, it's an impressive number, and I'm sure it will keep going down. So, this is just one example. There are many more applications of deep learning, and we'll see a few more as we go.
But now let's try to talk about how you can actually, oh yeah, please, you have a question. What does it mean, layer? It means this, in the top right corner. So, a layer is a set of neurons that are connected to the previous layer and to the next layer,
and they all work in parallel to do some computation, right? So, at the minimum, you will have an input layer, which will be your input data. So, let's say my pixels, right?
So, let's go back to my million pixel example. So, I would have one million input neurons, each of them with one pixel value. And the output layer would be probably the number of categories I have. So, let's have a thousand different categories, okay?
So, I would have a thousand neurons in the output layer, and I would just want one to be activated for a given image, right? And in the middle, I've got what we call hidden layers, which are just additional layers of neurons that do their magic, right? That just extract features, we'll see some examples, extract features from the input layer
and gradually learn how to activate the correct output neuron for a given input, right?
So, there are different structures. In this example here is what we call a fully connected network. So, each neuron is connected to all inputs and all the outputs of the previous and next layer, okay? But there are different architectures, right? And now you start to understand why it's so heavy from a computation point of view, right?
Because it's, you know, it's N1 times N2 times N3 times, so it's a lot of connections and each of them has to be optimized and computed.
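That cost is easy to see in code. Here is a toy sketch of one fully connected layer and of the N1 times N2 weight count, in plain Python for illustration only, not how MXNet or any real framework implements it:

```python
import math

def dense_forward(inputs, weights, biases):
    """One fully connected layer: every output neuron reads every input,
    so a layer of m neurons over n inputs needs m * n weights."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def weight_count(layer_sizes):
    """Total weights in a fully connected network: n1*n2 + n2*n3 + ..."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
```

For the million-pixel example with a single hidden layer of 1,000 neurons and 1,000 output categories, `weight_count([1_000_000, 1000, 1000])` is already about a billion weights to learn, which is exactly why GPUs and the cloud matter here.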
Yeah, yeah, training is, we'll do some live training on a smaller dataset, right? You'll see that. Okay, so now let's talk about what we do at Amazon. So, actually Amazon has been doing lots of AI for, I want to say forever, it's probably not quite true, but it feels right.
Amazon was started in 95 as a bookshop, as you know. And if you go to the internet and look for screenshots of the early website, very early on you had recommendation, right? And then you had content personalization, et cetera. So, very early on, you know, they felt they had to have that smart feeling to the website, right?
That custom experience to the website. And then as time went by, Amazon started to use AI for their, what we call the fulfillment centers, so where the goods are actually stored
and where they're shipped from. You may have seen those videos on YouTube of the Amazon robots that pick up the shelves and move the shelves to the humans so that they can pick the objects and prepare your order, right? If you haven't seen this, you should take a look. Just look for Amazon Robotics on YouTube.
And today we have more than 40,000 robots live every single minute in all of our fulfillment centers, just moving around autonomously and moving stuff so that, you know, we can all get our orders in time. Of course, there's tons of AI and machine learning on the website.
If all of us went to the same webpage on Amazon, we would not see the same thing, for sure, right? We would see different products, different layouts, different everything actually. And I'm sure you've seen this, although I don't think it's available in Italy yet. It's available in the UK and in Germany and in the US,
and not in France either, hopefully soon. So, the Amazon Echo family of devices, the personal assistants: you can just talk to them and order a taxi, order a pizza, ask for the news, ask for weather information.
Every single day we have new skills, as we call them, that come out for the Echo devices. And it's evidently all based on deep learning technology, natural language processing, text-to-speech, et cetera, right? So that's a consumer product. That's, I would say, the visible side of all the work that Amazon is doing on AI, right?
But there's, of course, we're developers, so we want to build stuff. And there's a full stack of AI and machine learning solutions and services that are available in AWS, right?
Starting from, of course, the infrastructure, the instances. So obviously, we have CPU instances, we have also GPU instances. I'll show you one in a minute. We'll do some training. On top of this, we could run your favorite deep learning libraries.
So today I'm going to use MXNet, but you could use TensorFlow or Keras or something else. Then if you want to actually go deeper and really do, you know, build your custom algorithms and your custom applications, you could use our EMR service, which is basically a managed service for the Hadoop ecosystem,
to do Spark and all the other Hadoop friends. You could do Amazon machine learning, et cetera, et cetera. So we have a full set of services that allow you to build smarter applications.
But you need to be an expert, right? And although we all have machine learning and soon deep learning on our resumes, not everyone is an expert. So we thought, and our customers asked us to do that,
that it might be interesting to build some higher level services, which are just an API call away and very simple to use and yet able to do very complex things. And these are the three services that you see on top. Lex, Polly, and Rekognition. And I'm going to talk about those for a minute and then show them to you.
So the first one is Polly. Well, it's easiest to explain. Polly is text-to-speech, right? So it's just one API call, select a language, select a voice, and you get the sound file in real time with a human sounding voice. So today we have 24 languages, including Italian, so we can try that,
and 48 different voices, and we'll keep adding more. The next service is Lex. So Lex is a chatbot service, so you can design a conversational interface using text or using, again, voice,
and integrate that in the AWS platform with your web app or your mobile app or on Facebook or on external channels. So pretty cool service if you're into chatbots. And the last one is Rekognition. So Rekognition is image recognition,
so it can do object detection, face detection, face comparison, etc. And as you can imagine, all these are obviously based on deep learning. But all you have to do here is just call an API, right? So let's give it a try. Okay.
So of course we could play in the console here, and, you know, Rekognition and Polly. Oh yeah, we want to try the Italian, all right. Could I have the mic for a second?
Okay, so that might be a little small. Okay, so here are all the voices that we support, right? And so for Italian we have two voices. We have Carla and Giorgio. Very Italian names, so let's try this.
Hopefully I have some, yeah. Oh no, I don't. It's coming, oh, it's on the HDMI. All right, all right, that's okay, we can do this.
Okay, let's try that again. Okay, so that's Carla, and let's have Giorgio. Okay, and you can go totally silly if you want to do...
Do you have anyone from Iceland in the room? Too bad, okay. All right, let's do Icelandic. Okay, all right. Well, we could do this all day, but that's not the point.
Thank you. Okay, so as you can see, this is really just, this is just an API call. So I could do this, I could do this on my laptop too. We'll do it on the robot afterwards. Okay, I can just show you.
Okay, so this is local here. And let's, okay, so that's Polly basically. I want to show you Rekognition now. So that's Polly, right? 24 languages, 48 voices, and it's extremely fast.
So you can either play the sound file and have an interactive thing going or you can save it and use it in your applications. So very, very easy to use. If you want to see what the API looks like, after all, let's do this. Well, this is it really, right? That's all there is to it.
You select a voice and the format which is MP3 here by default and the text that you want to generate and that's it. So one API call away and you get in real time a human sounding voice. And then either you play it or you save it for further use.
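And that whole call really is just a few lines. Here's a minimal sketch using boto3 — building the request is pure Python, and the actual `synthesize_speech` call (commented out, since it needs AWS credentials and network access) is the one API call I'm talking about. The sample text and output filename are just placeholders:

```python
# A minimal sketch of the Polly call: select a voice, a format, and the text.

def polly_request(text, voice_id, output_format="mp3"):
    """Build the keyword arguments for polly.synthesize_speech()."""
    return {
        "Text": text,
        "VoiceId": voice_id,           # e.g. "Carla" or "Giorgio" for Italian
        "OutputFormat": output_format, # "mp3", "ogg_vorbis" or "pcm"
    }

params = polly_request("Buongiorno EuroPython!", "Carla")
print(params["VoiceId"])  # Carla

# The real call would look like this:
# import boto3
# polly = boto3.client("polly")
# response = polly.synthesize_speech(**params)
# with open("hello.mp3", "wb") as f:
#     f.write(response["AudioStream"].read())
```

Then you either play the file straight away or save it for later, as described above.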
So that's all you need to do. One API call. You don't have to be a deep learning expert to do this. Okay, let's take a look at Rekognition now. So let's take my favorite image. Should I show it to you?
Yeah, of course. Okay, so it's Oktoberfest but I'm sure we have Rimini Fest and Bologna Fest. We have the same thing all over the world, right? Okay, so that's my picture. It's a fairly complex picture.
So now let's call Rekognition on that picture and see what happens. Oh, and I... Oh, sorry. I need to copy that image to S3.
So that's my bucket name. Or could we have the sound on the HDMI? No, maybe not. We don't have a tech... Yeah? Oh, wow. So I need to go in my max settings. All right.
I'm going to learn something now. Yeah, it says HDMI. So just try this again.
Oh, okay. See what you mean.
Perfect. Okay, thanks. No, I'll be fine actually. You can hear it, right? Or not? Do you want the mic? Yeah, let's have the mic. Okay, so let's send that image to Rekognition and see what happens.
Okay, so pretty immediately I see some labels and confidence. 15 faces have been detected. Here are some key words about this picture. People, person, human, alcohol, beverage, drink, crowd, female, girl. Okay, so I would say that's a fairly accurate description of that picture, right?
And so, like I said, we've got labels, we've got confidence scores, and then we find 15 faces, which is the maximum number we can find. It's a predetermined limit, right? We stopped it at 15.
And for each of them, we get some information like gender, age range, emotion detection, and there's additional information on where the nose is and where the eyes are, et cetera, but I didn't print it out. So, if I show you my script and, you know, highlights the faces that have been found,
and as you can see, we see 15 faces, right? And we could check that, okay, face two here, that lady here is, where is she? Okay, here she is.
Okay, she's female, she looks pretty happy. Well, she looks to be closer to 23 than 14, but okay, you know, pretty safe. Oh, by the way, never do this with your girlfriend, okay? Never take a picture of your girlfriend and use this, never, right? Or your wife, never, right? Or your mother might forgive you, but your girlfriend will not.
Okay, trust me. All right, okay, let's try a different one, just for a second. Okay, so I'm not showing you this picture for now.
A single face has been detected. Here are some key words about this picture, city, downtown, metropolis, urban. Okay, and obviously that's Polly that you hear here. I'm extracting some outputs and sending them to Polly.
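Extracting those outputs before handing them to Polly boils down to filtering the labels by confidence. Here's a sketch — the response below is hand-made for illustration, but it follows the shape a real `detect_labels` call returns, a list of labels each with a name and a confidence score:

```python
# Keep only the labels Rekognition is confident about, best first.

def confident_labels(response, min_confidence=80.0):
    """Return (name, confidence) pairs above the threshold, sorted by score."""
    labels = [(l["Name"], l["Confidence"]) for l in response["Labels"]
              if l["Confidence"] >= min_confidence]
    return sorted(labels, key=lambda nc: nc[1], reverse=True)

# Illustrative sample, shaped like a detect_labels response.
sample = {"Labels": [
    {"Name": "People",  "Confidence": 99.1},
    {"Name": "Alcohol", "Confidence": 88.4},
    {"Name": "Crowd",   "Confidence": 75.2},
]}

print(confident_labels(sample))  # [('People', 99.1), ('Alcohol', 88.4)]
```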
Okay, so let's see how fast you can find the face, because there is a face in here, ready? All right, so it's a larger crowd, but usually with a smaller group, you know, I have like a, you know, I ask people to raise their hand
when they see the face, and yeah, thanks. But, you know, for some people it takes a few seconds, right? So, because it's way over there, and it's hidden in this very complex picture, and it's interesting to see that the cartoon face is not picked up,
because it's not a face, because no one has eyes half the size of their face, at least not where I come from. Or maybe after a very, very long evening, abusing substances, but in the general case, that's not a human face, right?
Okay, so this is Rekognition, okay? Pretty cool. And you will find all the code and everything, again, it's on GitHub. I can just show you the Rekognition API, it's super easy as well.
Okay, so this is the one to detect faces, right? Again, a single API. You literally copy the image to S3 and point Rekognition to it. Comparing faces needs a source image and a destination image.
But as you can see, you know, it's, and detecting an image, again, is as easy as this, right? Where the image is, how many labels you want, and what's the minimum confidence score that you want to report, okay? So, pretty smart services, but if you're a very bad Python developer like me,
you can use them in minutes, right? You don't have to be an expert, okay? But there's a problem with this, okay? And yeah, we have tons of customers.
Let me mention a few. The Washington Post is using Polly in their mobile app to read articles, right? So you don't have to look at your phone like this. You can just, you know, click play and let the Washington Post app read the article to you, right?
And you can actually focus on what's outside, which is nice. And, you know, Capital One is one of the top ten banks in the US. A very large bank. And they have a Lex application for people to use to have information on their banking details, right? So you can just, instead of going to the bank website and looking at the, you know,
the detailed report that we never quite understand, you can just use a chatbot and say, okay, how much did I spend on the restaurants last month? Right? That kind of thing. So that's very, very cool. So, again, like I said, there is a problem with what I showed you, right?
I would say in the context of, you know, devices and robots, etc. The problem is we need the cloud, right? We need a cloud connection. We need network connectivity. And, sure, I could use recognition and everything on my robot over there.
So maybe it's time to bring the robot now. So, okay. Here's my friend.
So it's a Raspberry Pi robot. So, sure, I can connect to that little guy. And I could use Poly to do tons of silly things like this.
Thank you for visiting us today. I hope you'll have a great time.
Now, Julien, could you please stop clowning around and get on with the talk? Right. Thank you. And thank you to Air France for not crushing the robot this time. Yeah, it's a pretty regular event. I keep fixing it, but now it's okay.
So, yeah, you know, it's a Raspberry Pi. It has a Wi-Fi key. I can connect to the internet. I can use anything I want with this, okay? It's a Linux system. But can we expect all the robots of the world and all the devices of the world to be always on,
always connected to the cloud? Probably not, right? It's an unsafe assumption. Autonomous cars and stuff, right? You go in the tunnel and then what happens? So we need something different, right? Something that could work and not be cloud-based.
And this is what I'm going to talk about for the rest of the presentation. We can build deep learning applications using MXNet and embed them on devices like this, which are not powerful at all, right? This has a one gigahertz clock speed and one gig of memory.
So it's a very, very small device if you compare it to a typical computer or server, right? And we're going to do local AI on this little fellow here without any cloud connection. So a few words about MXNet first. So MXNet is for programmers, right? Like I said, it's developer-friendly.
It supports multiple languages like, of course, Python, C++, JavaScript, Matlab, and Julia. And I'm sure I'm forgetting something here. It's an Apache project, so it's open, it's not controlled by any company. AWS has committed to supporting this project
because we think it's the most appropriate and I will explain why in a minute. Both for cloud-based application and for smaller devices. And I think in the top 10 MXNet committers, we have four people, four or five people working for AWS right now.
It's high performance. As you will see, even on a small device like this, it runs fairly fast and it doesn't require gigabytes and gigabytes of memory, right? And like I said, we endorsed it because for all those reasons
and also because in a wider context, it scales very well, right? So why is scaling important? Scaling is not really important at this stage. Scaling is important when you train the model, right? So when you actually take those millions of images or those terabytes of audio, et cetera,
and you train the model, okay? This is where the operations are the heaviest. And so you want to be able to use as many GPUs as you can to speed up training, okay? And MXNet can do this very easily in the code and it can do it very efficiently
when it's running on, as you can see here, up to 16 GPUs with almost linear scaling, which means if you train on 16 GPUs, it's pretty much 16 times faster than when you train on one GPU, right? Almost perfect linearity.
And it goes beyond this. Why 16? Because 16 is the largest GPU instance that we have, right? The largest instance that we have is 16 GPUs. So in one server, that's the limit. But we can have multiple instances and we could go up to 256 GPUs, so 16 instances with 16 GPUs in them.
And again, as we scale on multiple, training on multiple servers, we see almost linear scalability again. And that's something you will not see in other frameworks. Most other frameworks can either not do GPU at all
or they can do maybe one GPU or maybe, maybe if you tweak your code like a maniac, you can get it to run on multiple GPUs in the same machine. And that's just one line of code when you do that in MXNet. And then training on multiple hosts, then this becomes a real project
if you want to do this with other libraries. For MXNet, it's almost as simple as sharing SSH keys across the nodes so that they can connect to one another and that's about it, right? The data set will be split automatically and so on. It's really nice. So that's one of the reasons also why we like MXNet.
It's because scalability is very important for our customers and thus it's very important for us. And we want to make sure we build services that scale to the max, right? So let's do some demos. So let's start with something simple.
Let's do some training for a second. So here I'm going to use a GPU instance to train an image recognition model on the data set which is called MNIST. I guess most of you or some of you have seen this before. MNIST, yeah, it's very popular.
It's 70,000 handwritten digits from 0 to 9. And of course the goal is to show an image and get the proper result at the end, okay? So let's do this. And you can see where is my instance.
Okay, so here I'm running on a smaller GPU instance. It has only one GPU but that's more than enough for what I need. And I'm running an Amazon machine image which is called the Deep Learning AMI which is built by us and you can use it at no cost.
And the cool thing with this is that it comes pre-installed with everything. So whatever framework you use, you know, Caffe, MXNet, TensorFlow, blah, blah, blah, anything else, it's already in there. So you can just boot up your GPU instance with this image and everything is ready for you to work. You don't need to go and install the CUDA drivers and the NVIDIA stuff
which is a little tricky to do, okay? And so here, whoops, okay, I wrote MXNet, yeah. All right.
So I designed a very simple model, right? So it's, okay, it's 30 lines of code, right? To do everything. So it's, like, when I say it's developer friendly, it really is, it's very high level, right?
So you don't have to go and, coming back to your questions earlier, you don't have to go into the details of every single neuron and just define layers, connect them, and that's it. Okay, and that's what I'm doing here. So I've got a series of blog articles on this with every single detail explained, so I'm going to go a little faster here
because I want to get to the point. Basically here, you just load the data set, right? So you load those images. There's a training set that we use for training and there's a validation set that we use to evaluate the quality of the model, just like we do in machine learning, okay? This is the network definition, okay?
So an input layer, then a first fully connected hidden layer with 128 neurons, a second fully connected layer with 64 neurons, and then the output layer with 10 neurons,
and 10 is not a surprise, it's because we have 10 categories, right? From 0 to 9, okay? So we need to figure out what digit this is. And that's all it takes, right? That's all it takes to define my network. Okay, define the layers, define how they're connected, define how many neurons are in each layer, and that's it.
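If you're curious how big that network actually is, here's a quick back-of-envelope parameter count. The input size is an assumption on my part — MNIST images are 28×28 pixels, so 784 inputs — and each fully connected layer has in×out weights plus one bias per output neuron:

```python
# Parameter count for the 784 -> 128 -> 64 -> 10 network described above.

layers = [784, 128, 64, 10]

def mlp_params(sizes):
    """Weights plus biases for each pair of consecutive fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

print(mlp_params(layers))  # 109386
```

So about 110,000 parameters — tiny by deep learning standards, which is why it trains in seconds on a single GPU.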
So we have multiple types of networks, but this is the simplest one, and as you can see, it's only six or seven lines of code, okay? Then I bind my data to that model, the data I loaded, I just say, okay, this is what you're going to train on,
and now you train, okay? Then I'm saving the results, so I'm saving all the weights for all the layers, right, because I want to reuse afterwards, and then I use my validation set to measure the accuracy of the model, okay?
So, not a lot of code, right? So let's do this. How do I train it? Just like this. So it's going to load the data, and then it's going to run for, I think it's ten epochs, so an epoch is learning the full data set once, okay?
So here I'm taking that data set, and I'm sending it ten times into my model, okay? Batch by batch, but the full set goes ten times in a row into the network. And I can see my training accuracy going up, right?
And actually if I let it run for a little more, let's give it maybe thirty epochs, you will see it gets to one, okay?
That's that universal approximation theorem I mentioned, okay? So it's going to learn that data set perfectly. It's going to learn the training set perfectly. But then, when I take the validation set and I run it, of course I get a lower score, because these are images that the network has never seen before, okay?
So again, yeah, we'll get to one. Okay, so maybe I need thirty-two or thirty-five epochs, right? Okay, so training accuracy almost gets to one,
and then validation accuracy is ninety-seven percent, okay? And then I could use some handmade digits that you can see here, so I did them myself, and I could try and run them through the network, right? So I'm going to load each image,
and load the model that I trained, and just run it through there, okay? And do this and see what the scores are, okay? So you can see ten probabilities, right?
Because obviously we have ten categories, so they're pretty close to one, they're not perfect, but they're pretty close to one. So the first image is a zero, and the second, and all these are pretty good, and the nine is not so great, the probability is lower, but we're still okay with the fact that it's a nine, okay?
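Turning those ten probabilities into a digit is just an argmax — take the index of the highest score. A tiny sketch, with made-up scores:

```python
# Pick the predicted digit from the ten output probabilities.

def predict_digit(probs):
    """Return the index (0-9) of the highest probability."""
    return max(range(len(probs)), key=probs.__getitem__)

# Illustrative scores: the network is fairly sure this is a 4.
scores = [0.01, 0.02, 0.01, 0.01, 0.88, 0.02, 0.01, 0.02, 0.01, 0.01]
print(predict_digit(scores))  # 4
```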
So I could have a better network, I could train for longer, I could improve everything, et cetera, et cetera, okay? But okay, that's a very simple model here. Now I want to do something more complex, right? I want to be able to do that ImageNet thing I mentioned earlier. I want to do it here.
I want to use a pre-trained network, right? Training on ImageNet takes a while. I cannot do it here. And I want to take that model, train it in the cloud, using the cloud scalability, save it, and then copy it in there and use it locally, okay?
And that's what I'm doing here. So let's go back to my robot here. So here's the model I'm talking about, it's the Inception model, it's 44 megabytes,
so it's not huge, but it's a fairly advanced model, okay? It's been trained on ImageNet, and I'm going to do pretty much the same thing like you saw here. I'm going to load the model, and I'm going to ask the robot to recognize images, right? But to make it a little more fun, actually I'm going to have the robot take a picture of objects, right,
using the camera down there, and recognizing that, okay? So it's all in Python, it's fairly easy to do, and we just have to start that server, and hopefully it still works, okay, yeah.
Yeah, the loudspeaker is on, so can that thing move or not? Okay, so just to make it a little more difficult to set up,
okay, I've got this thing here. So it's an Arduino, which is Italian, right? Or something. So it's an Arduino with a, I guess it's a PlayStation joystick connected to it. And here, this has nothing to do with deep learning, but it's pretty funny, so why not?
And it's an IoT thing, okay? So I'm using the IoT service of AWS through Wi-Fi here to send messages back and forth to the cloud, so from here to the cloud to the robot, et cetera, et cetera, okay? So I can drive that thing.
Can you see it? Yeah. So I'm making sure it's not falling off, that's why it's stopping. Okay, so let's have an object somewhere. I'll take my lucky object, the one that should work. And then, if you want, we can try something else.
Okay, I need to cheat. It's not, yeah, okay. Yeah, you know, I keep saying it, it's a running joke, I mean, it's an old joke now, but sorry, I have to do it again.
Some people think robots are going to kill us all, but we're quite safe. This one is very friendly. It's got a Twitter page, you can follow him on Twitter. Okay, so I'm going to try something else that's probably not going to work either, but okay, fine. So have you seen this before? It's the IoT button.
You just click it, and it sends an IoT message to AWS IoT. And so this one, if it gets through, if not, I will fake it. That's right. We'll send a message to the robot asking it to take a picture. Oh yes, it is working. And telling us what it sees.
I'm 98% sure that this is a baseball. The object is 31 centimeters away. Okay, thanks. Alright, bring me your object now. Okay, yeah, I'm up for that.
So I click here, it sends an IoT message to AWS IoT in Dublin. The robot gets it, so it's back and forth to Ireland. As you can see, it's pretty fast. The robot takes a picture, the robot takes a picture with the camera,
and uses the local MXNet model to detect it. This button has been giving me trouble, so I should try something else.
Oh, come on. Okay, I can fake it, that's okay, no worries. Works only once.
Just use violence. Yeah, see, that works. I'm 69% sure that this is a water bottle. The object is 51 centimeters away.
Okay, pretty good, right? So, you know, it's all fun and everything, and of course we're going to try those and it's going to fail. Because this is a really small object, I don't know. I don't know what the distance should be, maybe here. So, once again, what happens is,
okay, there's the IoT thing going on, and there's Polly, right? The voice comes from Polly, as you can imagine. So I really need to hit it, right? Come on.
Work! Okay, I'll fake it. I can send the message from here as well. And so it takes a picture. That's my complex protocol.
Come on. I'm 13% sure that this is a pole. The object is 22 centimeters away.
What did you see? Oh, yeah. Oh, the lighter is in there. Hey, I get five, remember? I get five categories, so I won.
So now, this is not going to work at all, I'm quite sure, because it has a picture on it, and if you show it a picture of something, it gets it wrong. So if you have some other objects... Do you have a laptop or something? Laptops usually work. We can try that and then conclude before they kick me off the stage.
Wanna try? Yeah, I don't, I don't have much in here. Oh, okay, we can try this. All right, last one, right? Oh, man. Yeah, okay, fine. That's a tough one.
Now that's never gonna work.
Yeah, I'm very dependent on my phone here, and it's not working great. Yeah. I'm 67% sure that this is a thimble. The object is 20... what?
Okay, pill bottle, man. It's in there, come on. Hey, I still win. All right, they're going to kick me off the stage. All right, so Julien one, Rimini zero.
Okay. All right, so, oh no, I don't need this. Thanks. So I'm getting to the end. I mentioned the Deep Learning AMI already. Again, I went very fast because there's so much stuff I wanted to show you today, you know, to keep you hopefully interested.
You will find all this stuff in detail on my Medium blog, so just go to medium.com, Julien Simon. Well, it's easy to find, and you will find all the tutorials to get started with MXNet, to do training, et cetera, how to do the Raspberry Pi thing,
et cetera, et cetera, okay? So it's all out there. All right, there are plenty more resources. There's one I want to mention: I recorded an AWS podcast a couple of weeks ago with an introduction to MXNet. So just look for AWS podcast, MXNet. There's only one, and it's mine, so
you can listen to that and get some additional information, okay? I want to say grazie mille, thank you, dankeschön, merci, gracias, and... there, you know, I cannot do the 24 voices. Thank you very much, EuroPython, for having me.
Thanks for listening, and if you have questions...