AI on a Pi

Video in TIB AV-Portal: AI on a Pi

Formal Metadata

AI on a Pi
Title of Series
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date

Content Metadata

Subject Area
AI on a Pi [EuroPython 2017 - Talk - 2017-07-13 - Anfiteatro 1] [Rimini, Italy] In recent months, Artificial Intelligence has become the hottest topic in the IT industry. In this session, we’ll first explain how Deep Learning — a subset of AI — differs from traditional Machine Learning and how it can help you solve complex problems such as computer vision or natural language processing. Then, we’ll show you how to start writing Deep Learning applications in Python thanks to MXNet, a popular library for Deep Learning for both CPUs and GPUs. We'll also see how to use pre-trained models and we'll load one on a Raspberry Pi equipped with a camera. Finally, we’ll show random objects to the Pi…and listen to what it thinks the objects are, thanks to the text-to-speech capabilities of Amazon Polly
Can you hear me fine? Yes? OK. Well, it's been a long day for you guys, and it's been a long one for me too: I flew in very early this morning from Paris,
but it's good to be here. I hope you have some energy left. It's a longer session, so we will have plenty of time, and I suggest that if you have any questions during the session, please ask them: raise your hand, we have a microphone somewhere that you can grab, and ask your question. OK? I'd rather have a more interactive session than just wait until the end and answer questions on stuff I was talking about 30 or 40 minutes ago. So please raise your hand and ask all your questions. So my name is Julien, or Julian, or Giuliano, or whatever you want to use, that's fine. I'm a tech evangelist for AWS, based in the Paris office, sometimes, because most of the time I'm really traveling and talking to developers, like today, and that's fine. I've been with AWS for about a year and a half, and before that I spent the last 10 years or so as a CTO and VP Engineering in startups in Paris. OK, so indeed, today we're going to talk about artificial intelligence. I'm going to take you on a long journey that started in the fifties, and if everything goes well I will end up running some stuff on my Raspberry Pi robot, which is waiting in the shadows to exterminate us, maybe, who knows, and then we'll see what we can do with it. So I'll start with a very quick introduction to AI and why it's been mostly frustrating so far, and then we're going to talk about AWS and AI: what do we do at Amazon and AWS? Fairly recently we've had some new stuff coming out. And then, and most of the presentation will actually focus on this, I'm going to talk about an Apache project called MXNet. Who has already heard of MXNet? Wow, OK, I'm progressing. Usually it's zero people, which is why I'm talking about it. So MXNet is a deep learning library which is extremely developer-friendly: it's really designed for quick experimentation by developers and by non-experts,
right? So I always say you don't have to have a PhD to use MXNet, and that's fine, that's exactly what we want to do. So we'll talk about MXNet for a minute, you know, the high-level features and so on, and then of course we'll go into some demos using Python code. Surprise, right? I could show you some other code if you want, but here it has to be Python. And then of course I'll point you to some more tools and more resources to get you started. So, the
story so far. Well, that's it, right? So who has no idea what this is? It's OK if you raise your hand. Thank you for making me feel very old. I keep thinking some people in the room are older than me, but as time goes by this becomes false more and more often. So this obviously is from the Stanley Kubrick movie 2001: A Space Odyssey. If you haven't seen it, you have to see it, it's a masterpiece, and it's the first visual representation of artificial intelligence. It came out in 1968, and this computer, HAL, is inside a spaceship and it's really running the ship. You even wonder why you have astronauts in there, which is probably why the computer decides to kill them eventually. So most of you know the ending, but you should still see the movie. And I guess a lot of geeks like me, and computer scientists and researchers, have been obsessed with HAL ever since we first saw the movie. This is what we're trying to build, right? This is the ultimate artificial intelligence, one that can understand natural language, that can speak, that can handle very different and very complex tasks. Imagine driving a spaceship: how complicated is that? So this is basically what people have been trying to conceive and build over the years. And, well, have they succeeded? No.
And actually, in 2001, the real year 2001, a famous computer scientist called Marvin Minsky, who is one of the fathers of artificial intelligence (he founded the AI Lab at MIT, among other things), and who actually was an adviser to Kubrick on the movie, right, so he worked with Kubrick in '68 to design what HAL would look like, pretty funny. So in 2001 Minsky wrote a paper that said "It's 2001. Where is HAL?" Well, obviously nowhere. In it he gave a number of reasons why artificial intelligence had not made a lot of progress in 40 or 50 years, and he thought we were still very, very far away from having HAL. And I think it's a paradox, because in the mid-2000s machine learning started to explode, right? And now, today... who's doing machine learning in the room? Thank you. Pretty much everybody, or at least everybody has machine learning on their résumé, right? So it's a commodity. It's easy to do machine learning: you can get some open source libraries, you can grab Python tools and build machine learning models in just a few lines of code, you can use cloud-based services, you have a wide choice. So why did machine learning become so successful, starting in the 2000s until 2010, and now even more so? Why did this make a lot of progress, and why did AI not make a lot of progress in the same years? Well, it's because, as you know, in machine learning it's all about the features, right? It's fairly easy to build a prediction model provided that you have clear features, and actually most of the work of data science is to find which features are useful in the dataset and how to engineer them, how to prepare them, so that they can deliver a nice, efficient, working model. OK, so let's take an example. If you have a web log, an Apache log or something similar, and you want to use those logs to predict user activity,
you know, it could be predicting what link they're going to click, what ad they're going to click on, etc., typical activities, then all the features are pretty much available, right? You just look at what the log has: the time and date, and the URL, and the user agent, and blah blah blah, probably 50 more, and you just have to figure out which ones are useful and which ones are the most relevant for your machine learning model. OK, and you go and tweak them and combine them and twist them into form until you have a working model. Now let's take a different problem. Suppose I take a picture of this room, you know, a thousand pixels by a thousand pixels, and I want to know who's in the room, or what's in the room, or is that even a room? I want to know what the picture is. OK, so it's a million pixels, and if it's a color picture it's actually 3 million values, right: red, green and blue. So what does this mean? I have 3 million features. If I take those 3 million features, flatten them and send them into a prediction model, would that work? Probably not, right? I'm sure some people have tried, but that's hardly working. And if you think about it, it doesn't even make sense, right? Is every single pixel in this picture useful information? Look at the seats you're sitting on: it's all gray, all the same color. So do I need all those individual pixels to figure out this is a seat and it's gray? Again, probably not. Common sense tells me no, I don't. But that's the difficulty in building those smart applications, right? Human common sense tells us the answer immediately. If we brought a five-year-old kid into this room and asked "OK, what do you see?", he'd say "well, I see people sitting in a room, it looks like a classroom", right? And if you show animal pictures to that kid and say, OK, is that a cat, is that a
tiger, is this a dog, he would know instantly. OK, but if you ask "how do you know it's a lion, how do you know this is a cat?", then it becomes more complicated. He may even give you some answers, right, but how do you fit that into data that a computer can understand? That's the number one problem, right, and this is the problem that deep learning is exactly trying to solve: it's trying to teach computers to understand informal things, things that you and I know pretty much from being four years old and older. But it's impossible to teach a computer this if we try to do it the machine learning way, because there are too many features, there's just too much information, and you cannot feed all this information to a machine learning model and get a decent result. So of course the answer to this is neural networks, and they're not new at all. They have been around for decades: theory work even goes back to the late forties, but the first major applications came in the fifties, right? So it's literally 60-year-old technology. And what is a neural network, actually? Well, people have written books about this, spent their lives explaining this, so I'll keep it short and simple. Basically, a neural network is a universal approximation machine. That's the name of a theorem which says that if the network is large enough, and if you give it enough data, it's going to learn anything perfectly. That's it. So it's a learning machine: you design it, you show it some data over and over and over again, and it learns how to predict a given output from a set of given inputs. It can predict absolutely anything, and, you know, magically, you don't have to understand exactly what happens in there, which is nice. But OK: mathematically they're great, theoretically they're great, for limited applications they were great, but until very
recently they didn't really work, right? They did not really work. And if some of you, the older ones like me, were in university, let's say in the nineties, you probably had, you know, a few hours on neural networks and AI, and I'm sure your teacher told you something like "well, yes, OK, artificial intelligence is really cool, neural nets are really cool, you can do all kinds of crazy stuff on paper, but you know,
right, outside of the lab they're pretty much useless, because we cannot solve bigger
problems with those". So it's all about scale, scale, scale. That's the reason why neural networks in the sixties and seventies and beyond just stayed in the lab. They were cool stuff to play with, but there were no industry applications, no real-life applications of these, because data was just not available (and remember, I said you need lots of data to train) and computing power was not available either. So just look at what changed, right? It changed for three reasons. The first one is that datasets are everywhere: digital data is literally everywhere, audio, text, pictures, everything, and some public datasets are available on the Internet, large ones you can grab and mine, and you can go on Kaggle and take part in machine learning competitions, deep learning competitions. So data is just everywhere, and everybody can go out and grab data and start training and start building apps. The second thing is computing power. Well, it's always a problem, isn't it, but it's less of a problem than it used to be, right, because now we have GPUs. In the mid-2000s, research teams found that GPUs could actually be used for something smarter than playing 3D games, right? So instead of using all that fantastic power to build 3D games and shoot each other,
well, we can, you know, do some actual scientific work, and I guess that's a good thing.
And so now GPUs are everywhere, and they're fairly cheap,
and they deliver massive amounts of computing power. And the third thing that helped deep learning explode is the elasticity and the scalability provided by the cloud. OK, because it's just like everything else: why would you buy 50 fancy GPUs to train for, I don't know, a couple of hours a week and have them do nothing the rest of the time, when you can just go to the cloud, grab a few GPUs for a few hours, train a model, pay for those few hours and release them, right? So the elasticity, the pay-as-you-go, everything that you know and love about the cloud also applies here: compute, storage, etc., it's all there. Grab it, use it, release it, and pay exactly what you have to pay and nothing else. So deep learning exploded, right? Let's look at a concrete application. Every year there's a competition called ILSVRC among research teams across the world, and what they do is they take the ImageNet dataset, which is a very large dataset composed of images with a single thing in them, so it's either animals or objects or plants, but no humans. There are a thousand categories, and they have to predict the right category for each image, that's the game. Actually they can predict five categories for each image, and if the correct category is in the top five, then it's considered a win. OK, and so they've done this for years and years and years. Here's an example: these are
real images from the dataset. So it's not the same dog, but who thinks they are from the same breed? Who thinks they're not the same breed? Do we have any Eskimos or Norwegians in the room? Norwegians are usually pretty good at this. And who has no idea? Personally I have no idea, and I think if you gave me 15 minutes I would still have no idea, right? [unclear] So for the record, it's not the same breed. OK, but how would you know, right? How would you know unless you're a real dog expert and you could actually explain to me "oh, see the difference here and here"? OK, and then I show you a different breed, and you're not an expert on those dogs, and you don't know, right? So that's the informal knowledge we need to fit into the computers. So
they've been playing this game with dogs and plenty of other categories for years, and these are the results. This started in 2010, and the blue line is the error rate. OK, so it goes from 28 to 25 to 16 to 11, down to 3 percent last year. Only 3 percent error. And the red bar is how deep, how many layers, the winning neural network was. OK, so in the first couple of years it was just one layer, and then it went up, as you can see, 8, 19, 22, up to that crazy number of 269 layers. OK, so now the question is: what do you think the score would be for humans, right? Normal humans. If I gave you the ImageNet dataset and lots of coffee and asked you to score it... it's millions of images, so it would take a while, but OK, theoretically you could do it with lots of coffee. What would be your average error? I have no idea, but I guess our brains have much more than 269 layers. OK, but still, our brain is different. So what would be the score? Who says less than 5? Less than 3? Who thinks humans actually beat
the machine? No one? OK. Who says more than 10%? And in between
5 and 10? Right, OK. So the
answer is actually 5, 5.1 to be exact. OK, but again it's
theoretical, because if I gave you maybe 50 images you would do this; if I gave you a thousand images, maybe not so much; if I gave you a million images, you would never get to the end, right? So the computer can do it faster, longer, it's never tired, and, you know, it will give the same answer all the time. So what this means is that deep learning models and computers are now better at recognizing stuff than us, right, with the caveat, I would say, that they actually have been trained on it. Of course, if you show them something they've never seen, they won't know, and maybe we will, because we're smarter, right? But still, I think it's an impressive number, and I'm sure it will keep going down. So this is just one example. There are many more applications of deep learning, and we will see a few more as we go, but now let's try to talk about how you can actually... Yes, a question? [Audience question: what does "layer" mean, in the top right corner?]
So a layer is a set of neurons that are connected to the previous layer and the next layer, and they work in parallel to do some computation, right? So at the minimum you have one input layer, which will be your input data. So let's say my pixels, right? Let's go back to my million-pixel example: I would have 1 million input neurons, OK, each of them with one pixel value as its input, and the output layer would probably be down to the number of categories I have. So let's say I have a thousand different categories: I would have a thousand neurons in the output layer, and I would want just one to be activated for a given image, right? And in the middle I've got what we call hidden layers, which are just additional layers of neurons that do their magic, right? They extract features (we'll see some examples) from the input layer and gradually figure out how to activate the correct output neuron for a given input. And here there are different structures: this example here is what we call a fully connected network, so each neuron is connected to all the outputs of the previous layer and all the inputs of the next layer. OK, but there are different architectures. And now you can start to understand why it's so intensive from a computation point of view, right? Because it's, you know, N1 times N2 times N3, so it's a lot of connections, and each of them has to be optimized and computed. And yes, we'll do some live training on a smaller dataset, you'll see.
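As a rough illustration of why a fully connected network gets expensive, the sketch below counts the connections (weights) for the million-pixel example above. It assumes the usual definition, in which every neuron connects to every neuron of the previous layer, so the total is the sum of the products of adjacent layer sizes rather than one product of all sizes. The hidden layer of 500 neurons is a made-up figure for illustration only; the talk does not give one.

```python
def count_parameters(layer_sizes):
    """Return (weights, biases) for a fully connected network.

    layer_sizes lists the number of neurons per layer, input first.
    Each pair of adjacent layers contributes n_in * n_out weights,
    and every non-input neuron has one bias.
    """
    weights = 0
    biases = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights += n_in * n_out
        biases += n_out
    return weights, biases


# 1,000,000 input pixels and 1,000 output categories come from the talk;
# the 500-neuron hidden layer is a hypothetical value.
sizes = [1_000_000, 500, 1_000]
weights, biases = count_parameters(sizes)
print(f"weights: {weights:,}  biases: {biases:,}")
```

Even with a single small hidden layer, this gives about half a billion weights to optimize, which is the scale problem the speaker is pointing at.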
OK, so now let's talk about what we do
at Amazon. So actually Amazon has been doing a lot of AI for, I want to say forever; it's not quite true, but it feels right. Amazon started in '95 as a bookshop, as you know, and if you go to the Internet and look for screenshots of the early website, very early on you had recommendation, right, and then you had content personalization, etc. So very early on they felt they had to have that smart feeling in the website, that custom experience on the website. And then as time went by, Amazon started to use AI for what we call the fulfillment centers, so where the goods are actually stored and where they ship from. You may have seen those videos on YouTube of all the robots that pick up the shelves and move the shelves to the humans, so that they can take the objects and prepare your order, right? If you haven't seen this, you should take a look: look for Amazon Robotics on YouTube. And today we have more than 40 thousand robots live every single minute in all of our fulfillment centers, just moving around autonomously and moving stuff, so that, you know, we can all get our orders. And of course there's tons of AI and machine learning on the site: if all of us went to the same web page on Amazon, we would not see the same thing, for sure, right? Different products, different layouts, different everything. And I'm sure you've seen this,
although I don't think it's available in Italy. It's available in the UK, in Germany and in the US, and not in France either, hopefully soon. So this is the Amazon Echo family of devices, the personal assistants: you can just talk to them and order a taxi, order a pizza, ask for the news, ask for weather information. Every single day we have new skills, as we call them, that come out for the Echo devices. And it's evidently all based on deep learning technology: natural language processing, text-to-speech, etc., right? So that's a consumer product, the visible side of all the work that Amazon is doing on AI. But
of course, we're developers, so we want to build stuff, and there's a full stack of AI and machine learning solutions and services that are available in AWS, right, starting from, of course, the infrastructure, the EC2 instances. So obviously we have CPU instances, we also have GPU instances (I'll show you one in a minute with some training), and on top of this you can run your favorite deep learning libraries. So today we're going to use MXNet, but you could use TensorFlow or Keras or something else. And then if you want to actually go deeper and really build your custom algorithms and custom applications, you could use our EMR service, which is basically a managed service for the Hadoop ecosystem, Hadoop, Spark and all the other Hadoop friends, and you could use Amazon Machine Learning, etc. So we have a full set of services that allow you to build smarter applications, but you need to be an expert, right? And although we all have machine learning, and soon deep learning, on our résumés, not everyone is an expert. OK, so we thought, and customers asked us to do that, that it might be interesting to build some higher-level services which are just an API call away, very simple to use and yet able to do very complex things. And these are the three services that you see on top: Lex, Polly and Rekognition. I want to talk about these for a minute and then show them to you. So the first one
is Polly. Well, it's the easiest to explain: Polly is text-to-speech. It's just one API call: select a language, select a voice, and you get the sound file in real time with a human-sounding voice. Today we have 24 languages, including Italian, so we can try that, and 48 different voices, and we'll keep adding more. The next service is Lex.
Lex is the chatbot service. You can design a conversational interface using text or voice, and integrate it in diverse platforms: with your web app, your mobile app, or on Facebook or other external channels. So it's a pretty cool service if you're into chatbots. And the last
one is Rekognition. Rekognition is image recognition: object detection, face detection, face comparison, etc. As you can imagine, all of these are obviously based on deep learning, but all you have to do here is just call an API. So let's give it a try. OK, so of
course we could play in the console here, with Rekognition and Polly. Oh yeah, we wanted to try
the Italian. OK, so here are all the voices that we support. For Italian we have two voices, Carla and Giorgio, very Italian names. So let's try it; hopefully we have sound.
our and it's going to do this the PPG
that I but literally don't tell the general that the jail was yes it does the PDG that I I think that's Collins those of self care
And do we have anyone from Iceland in the room? We could do this all the time. Thank you. OK, so as you can see, this is really just an API call. So I could do this, and I
could do this on my laptop too; we'll do it on the robot afterwards. I can just show you: this is local here. OK, so that's Polly: 24 languages, 48 voices, and it's extremely fast, so you can play the sound file and have an interactive thing going, or you can save it and use it in your applications. So it's very, very easy to use. Do you want to see what the API looks like? Let's do this. This is it, really; that's all there is to it. You select a voice, the format, which is MP3 here by default, and the text that you want to generate, and that's it. So one API call away, you get a human-sounding voice in real time. Then either you play it or you save it for further use. So that's one API call; you don't have to be a deep learning expert to do this. OK, let's take a look at Rekognition. Let's take my favorite image. Should I show it to you here? Of course, like this. So it's Oktoberfest.
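As a rough illustration of the single Polly call described above (a voice, an output format, and the text), here is a minimal sketch using the boto3 `synthesize_speech` operation. The helper names `synthesize` and `save_speech` are made up for this example, and the client is passed in so that the AWS credentials and region stay outside the snippet; this is not the code shown on screen during the talk.

```python
def synthesize(polly_client, text, voice_id="Carla", output_format="mp3"):
    """Return the audio bytes for `text` from a single SynthesizeSpeech call."""
    response = polly_client.synthesize_speech(
        Text=text,
        VoiceId=voice_id,            # e.g. one of the Italian voices
        OutputFormat=output_format,  # MP3 is the format used in the talk
    )
    # The audio comes back as a streaming body; read it fully into memory.
    return response["AudioStream"].read()


def save_speech(polly_client, text, path, voice_id="Carla"):
    """Synthesize `text`, write it to `path`, and return the number of bytes written."""
    audio = synthesize(polly_client, text, voice_id=voice_id)
    with open(path, "wb") as audio_file:
        audio_file.write(audio)
    return len(audio)
```

With real credentials configured, the client would typically be created with `boto3.client("polly")` and passed as the first argument, for example `save_speech(polly, "Buongiorno", "hello.mp3", voice_id="Giorgio")`.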
But I'm sure there's a Rimini fest too; we have the same thing all over the world, right? OK, so that's my picture. It's a fairly complex picture. So now let's call
Rekognition on that picture and see what it says. Oh, it's local, sorry: I need to copy the image to S3. So that's my bucket. Could we have the sound on? Maybe not. Yeah, option-click on this, in my Mac settings. So I learned
something else In this lecture and
thinking what he did it says it's still not so bad test if you can construct this didn't I would have had no OK so
certainly to the other way yes and the decade or so In this talk of the text was
something different. But you can hear it, right? OK, so let's send that image to Rekognition again. Pretty much immediately I see some labels and confidence scores. "15 faces have been detected. Here are some keywords about this picture: people, person, human, alcohol, beverage, drink, crowd, female." OK, so I would say that's a very accurate description of that picture. Like I said, we get labels with confidence scores, and then we find 15 faces, which is the maximum number we can find; it's a predetermined limit, we stop at 15. For each of them we get some information like gender, age range and emotion detection.
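To make the shape of those results concrete, here is a small sketch that pulls label names and per-face attributes out of dictionaries laid out like Rekognition's `detect_labels` and `detect_faces` responses. The sample values below are hand-made for illustration (they are not the output from the talk), and the helper names are hypothetical.

```python
# Hand-made samples in the shape of Rekognition responses; values are illustrative only.
sample_labels = {
    "Labels": [
        {"Name": "People", "Confidence": 99.1},
        {"Name": "Crowd", "Confidence": 98.7},
        {"Name": "Beverage", "Confidence": 71.5},
    ]
}
sample_faces = {
    "FaceDetails": [
        {
            "Gender": {"Value": "Female", "Confidence": 99.9},
            "AgeRange": {"Low": 14, "High": 24},
            "Emotions": [
                {"Type": "HAPPY", "Confidence": 96.0},
                {"Type": "CALM", "Confidence": 3.0},
            ],
        }
    ]
}


def label_names(response, min_confidence=0.0):
    """Keep the names of the labels whose confidence score is high enough."""
    return [
        label["Name"]
        for label in response["Labels"]
        if label["Confidence"] >= min_confidence
    ]


def summarize_face(face):
    """Reduce one face entry to gender, age range, and the most likely emotion."""
    top_emotion = max(face["Emotions"], key=lambda emotion: emotion["Confidence"])
    return {
        "gender": face["Gender"]["Value"],
        "age_range": (face["AgeRange"]["Low"], face["AgeRange"]["High"]),
        "emotion": top_emotion["Type"],
    }
```

The real face entries also carry landmark coordinates (eyes, nose, and so on), which this sketch ignores.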
There's additional information on where the nose is and where the eyes are, etc. So if I show you my screen and
highlight the faces that have been found, as you can see, we see 15 faces. And we could say that, OK, face two here, that lady here,
where is she, OK, here: she's female, she looks pretty happy, and she looks to be closer to 24 than 14. OK.
Pretty good precision. Oh, by the way, never do this with your girlfriend, OK? Never take a picture of your girlfriend and use this. Never, right? Your mother might forgive you; your girlfriend will not. Trust me. All right, OK, let's try another one, just for a second.
I'm not showing you this picture for now. "A single face has been detected. Here are some keywords about this picture: city, downtown, metropolis." So that's Polly that you hear: I'm extracting some outputs and sending them to Polly. OK, so let's see how fast you can find the face, because there is a face. Are you ready? So it's a larger crowd.
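The "extracting some outputs and sending them to Polly" step can be sketched as plain string building. The function below is a guess at how the spoken sentence might be assembled from a face count and a list of labels; its name and the wording for zero faces are assumptions, while the other two phrasings follow what the demo says out loud.

```python
def describe_picture(face_count, labels):
    """Build the sentence to be read aloud from a face count and label names."""
    if face_count == 0:
        faces = "No face has been detected."  # assumed wording, not heard in the demo
    elif face_count == 1:
        faces = "A single face has been detected."
    else:
        faces = f"{face_count} faces have been detected."
    keywords = ", ".join(labels)
    return f"{faces} Here are some keywords about this picture: {keywords}."
```

The resulting string would then be handed to the text-to-speech call as the text to synthesize.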
Usually, with a smaller group, I ask people to raise their hand when they see the face. But for some people it takes a few seconds, because it's way over there, and it's hidden in a very complex picture. And it's interesting to see that the car face is not picked up, because it's not a face: no one has eyes half the size of their face, at least not where I come from, or maybe after a very, very long evening abusing substances. In the general case that's not a human face, right? OK, so this is
Rekognition. OK, pretty cool.
And you will find all the code and everything, again, it's
online. I can just show you the Rekognition API itself; it's super easy as well. OK, so this is the one to detect faces. Again, it's a single API call: you literally copy the image to S3 and point Rekognition to it. Comparing faces needs a source image and a destination image. And as you
can see, detecting labels in an image is, again, as easy as this: where the image is, how many labels you want, and what's the minimum confidence score that you want reported. So these are pretty smart services.
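The three Rekognition calls mentioned above can be sketched with boto3 as follows. This assumes the image has already been copied to an S3 bucket; the wrapper names, the default thresholds and the bucket/key arguments are illustrative choices, not the code from the talk, and the client is injected so the snippet does not need credentials by itself.

```python
def s3_image(bucket, key):
    """Describe an image stored in S3 in the form Rekognition expects."""
    return {"S3Object": {"Bucket": bucket, "Name": key}}


def detect_labels(client, bucket, key, max_labels=10, min_confidence=80.0):
    """Where the image is, how many labels we want, and the minimum confidence."""
    return client.detect_labels(
        Image=s3_image(bucket, key),
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )


def detect_faces(client, bucket, key):
    """Ask for all face attributes (gender, age range, emotions, landmarks...)."""
    return client.detect_faces(Image=s3_image(bucket, key), Attributes=["ALL"])


def compare_faces(client, bucket, source_key, target_key, similarity_threshold=80.0):
    """Compare the face in a source image against the faces in a target image."""
    return client.compare_faces(
        SourceImage=s3_image(bucket, source_key),
        TargetImage=s3_image(bucket, target_key),
        SimilarityThreshold=similarity_threshold,
    )
```

A real client would usually come from `boto3.client("rekognition")`.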
But if you're a very bad Python developer like me, you can use them in minutes. You do not have to be an expert. OK, but there's a problem with
this. And yet we have tons of customers; let me mention a few. The Washington Post is using Polly in their mobile app to read articles: you don't have to look at your phone like this, you can just click play and let the Washington Post app read the article to you, and you can actually focus on what's outside, which is nice. And Capital One, which is one of the top 10 banks in the US, a very large bank, built a Lex application for people to get information on their banking details. So instead of going to the website and looking at the detailed report that we never quite understand, you can just use a chatbot and ask, OK, how much did I spend on restaurants last month? That kind of thing. So that's very, very cool. But again, like I said, there is a problem with what I showed you, in the context of devices and robots, etc. The problem is that we need the cloud: we need a cloud connection, network connectivity. And sure, I could use Rekognition and everything on my robot. So maybe it's time to bring the robot in. So here's my friend. It's a Raspberry Pi robot. So sure, I can
connect to that little guy and I
can use Polly to do tons of silly things like this. "Close friends, thank you for visiting us today. I hope you have a great time. Now, Julien, could you please stop clowning around and get on with the talk?" Right, thank you. And thank you to our friends for crushing the robot this time; yeah, it's a pretty regular event. I keep fixing the
notes. OK, so yeah, I know it's a Raspberry Pi, it has Wi-Fi, I can connect to the Internet, I can use anything I want with
it. OK, it's a Linux system. But can we expect all the robots of the world and all the devices of the world to be always on, always connected to the cloud?
Probably not, right? It's an unsafe assumption. Think of autonomous cars and stuff: you go into a tunnel, and then what happens? So we need something different, something that can work and is not cloud-based, and this is what I want to talk about for the rest of the presentation: how we can build deep learning applications using MXNet and embed them on devices like this, which are not powerful at all. This has a one-gigahertz clock speed and one gig of memory, so it's a very, very small device if you compare it to a typical computer or server. And we're going to do local AI on this little fellow here without any cloud connection. So, a few words about MXNet. MXNet is for programmers: it's really developer-friendly, and it supports multiple languages, like of course Python, C++, JavaScript, Matlab and Julia, and I'm sure I'm forgetting something here. It's an Apache project, so it's open; it's not controlled by any company. AWS has committed to supporting this project because we think it's the most appropriate, and I will explain why in a minute, both for cloud-based applications and for small devices. I think in the top 10 MXNet committers we have four or five people working for AWS. It's high performance: as you will see, even on a small device like this it runs very fast, and it doesn't require gigabytes and gigabytes of memory. So, like I said, we endorsed it for all those reasons, and also because, in a wider context, it scales very well. Why is scaling important? It's not really important at this stage. Scaling is important when you train the model: when you actually take those millions of images or those terabytes of sound bites, etc., and you train the model. This is where the operations are the heaviest, so you want to be able to use as many GPUs as you can to speed up training. And
MXNet can do this very easily in the code, and it can do it very efficiently. When it's running on EC2, here, on up to 16 GPUs, we see almost linear scaling, which means that if you train on 16 GPUs it's pretty much 16 times faster than when you train on one GPU: almost perfectly linear. And it goes beyond this. If you ask why 16, it's because 16 GPUs is the largest GPU instance that we have, so in one server that's the limit. But we can have multiple instances, and we can go up to 256 GPUs, so 16 instances with 16 GPUs in them. And again, as we
scale training on multiple servers, we see almost linear scalability again.
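The arithmetic behind these scaling figures is easy to check. The sketch below computes the total GPU count for 16 instances of 16 GPUs and a back-of-the-envelope training time; the `efficiency` parameter is a hypothetical knob introduced here to express "almost linear" (1.0 would be perfectly linear), not a number quoted in the talk.

```python
def total_gpus(instances, gpus_per_instance):
    """Number of GPUs available across a cluster of identical instances."""
    return instances * gpus_per_instance


def estimated_training_hours(single_gpu_hours, num_gpus, efficiency=1.0):
    """Estimate training time on `num_gpus` GPUs.

    `efficiency` is the fraction of ideal linear speedup actually obtained
    (an assumption of this sketch: 1.0 means N GPUs are exactly N times faster).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return single_gpu_hours / (num_gpus * efficiency)
```

For instance, a job that would take 16 hours on one GPU would take about one hour on 16 GPUs under perfectly linear scaling, and slightly more if the efficiency is a little below 1.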
And that's something you will not see in other frameworks. Most other frameworks either cannot do multi-GPU well, or they can do maybe one GPU, or maybe, if you tweak your code like a maniac, you can get it to run on multiple GPUs in the same machine; that's just one line of code when you do it in MXNet. And then training on multiple hosts becomes a real project if you want to do it with other libraries. For MXNet, it's almost as simple as sharing SSH keys across the nodes so that they can connect to one another, and that's about it; the data set will be split automatically. So it's really nice. That's one of the reasons why we like MXNet: scalability is very important for our customers, and thus it's very important for us, and we want to make sure we build services that scale to the max. So let's do some demos. Let's
start with something simple: let's do some training for a second. I'm going to use a GPU instance to train an image recognition model on a data set which is called MNIST. I guess most of you, or some of you, have seen this before; it's very popular. It has 70 thousand handwritten digits, from 0 to 9, and of course the goal is to show an image and get the proper result at the end. So let's do this. Where's my
instance? OK, so here I'm running on one of the smaller GPU instances; it has only one GPU, but that's more than enough for what I need. I'm running an Amazon Machine Image which is called the Deep Learning AMI, which is built by us, and you can use it at no cost. The cool thing with this is that it comes pre-installed with everything: whatever framework you use, Caffe, MXNet, TensorFlow or anything else, it's already in there. So you can just fire up your GPU instance with this image and everything is ready for you to work; you don't need to go and install the CUDA drivers and the NVIDIA stuff, which is a little tricky. OK, so here I wrote an MXNet script in Python. I designed a very simple model; it's 20 to 30 lines of code to do the thing. So when I say it's developer-friendly, it really is. It's very high level, so, coming back to the question earlier, you don't have to go into the details of every single neuron: you just define layers, connect them, and that's it. That's what I'm doing here. I've got a series of blog articles on this with every single detail explained, so I'm going to go a little faster here because I want to get to the point. Basically, here
you just load the data set, you know, those images. There's a training set that we use for training, and there's a validation set that we use to evaluate the quality of the model, just like we do in machine learning. OK, this is the model
definition. OK, so there's an input layer, and then the first hidden layer is fully
connected, with 128 neurons; then a second fully-connected layer with 64 neurons; and then the output layer with 10 neurons. And 10 is not a surprise, because we have 10 categories, from 0 to 9, so we need to figure out which digit this is. And that's all it takes to define my network: define the layers, define how they're connected, and define how many neurons are in each layer.
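To get a feel for how small this network is, one can count its trainable parameters. The sketch below assumes the standard 28 by 28 pixel MNIST images flattened into 784 inputs, and counts one weight per connection plus one bias per neuron; it is plain Python arithmetic, not the MXNet model definition from the talk.

```python
INPUT_SIZE = 28 * 28          # MNIST images are 28x28 pixels, flattened
LAYER_SIZES = [128, 64, 10]   # two hidden layers and the 10-way output layer


def parameter_counts(input_size, layer_sizes):
    """Weights plus biases for each fully-connected layer, in order."""
    counts = []
    previous = input_size
    for size in layer_sizes:
        # Every neuron connects to every output of the previous layer, plus a bias.
        counts.append(previous * size + size)
        previous = size
    return counts


per_layer = parameter_counts(INPUT_SIZE, LAYER_SIZES)
total = sum(per_layer)
```

Under these assumptions the three layers hold 100480, 8256 and 650 parameters, a little under 110 thousand in total.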
And that's it. We have multiple types of networks, but this is the simplest one, and as you can see it's only 6 or 7 lines of code. Then I bind my data to the model; I just say, OK, this is what you're going to train on, and now you train. Then I'm saving the results; I'm saving all the weights for all the layers, because I want to reuse them afterwards. And then I use
my validation set to measure the accuracy of the model. OK, so far not a lot of code, right?
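Measuring accuracy on the validation set boils down to comparing predicted digits with the true ones. A minimal, framework-free sketch (the function name is illustrative; MXNet provides its own metric helpers, which are not shown here):

```python
def accuracy(predicted, expected):
    """Fraction of samples whose predicted label equals the expected label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)
```

The same function applies to the training set and the validation set; only the second one tells you how the model behaves on images it has never seen.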
So let's start the training, just like this.
So it's going to load the data, and then it's going to run for, I think, 10 epochs. An epoch is running the full data set once. So here I'm taking the data set and sending it 10 times into my model, batch by batch, but the full set goes 10 times through the network. And I can see my training accuracy going up.
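The epoch and batch structure described above can be sketched as two nested loops. Here `train_step` stands in for whatever the framework does with one batch; it and the other names are placeholders for this illustration.

```python
def batches(samples, batch_size):
    """Yield consecutive slices of `samples`, each at most `batch_size` long."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def run_epochs(samples, batch_size, num_epochs, train_step):
    """Send the full data set through `train_step` once per epoch, batch by batch."""
    steps = 0
    for _ in range(num_epochs):
        for batch in batches(samples, batch_size):
            train_step(batch)
            steps += 1
    return steps  # total number of batches processed
```

With 10 samples, a batch size of 4 and 3 epochs, each epoch has 3 batches (4, 4 and 2 samples), so 9 training steps run in total and every sample is seen 3 times.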
And actually, if I let it run a little more, let's say maybe 30 epochs, it gets to 1. OK, that's the universal approximation theorem I mentioned. So it's going to learn the training set perfectly. But then when I take the validation set and run it, of course I get a lower score, because these are images that the network has never seen before. So again, we get to 1, maybe after only 25 epochs or so. So training accuracy almost gets to 1, and validation accuracy is 97%. Good. And then I could use some handmade digits that you can see here; I did them myself, and I could try and run them through the network. So I'm going to load each image, load the model I trained, and just run them through it, and see what the scores are. OK, so you see 10 probabilities, because obviously we have 10 categories. They're pretty close to 1; they're not perfect, but they're pretty close to 1. So the first image is a zero, and the second one, and all these are pretty good. The 9 is not so great, that probability is lower, but we're still OK. So I could have a better network, I could train for longer, I could improve everything, etc. OK.
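Those 10 probabilities typically come from a softmax over the 10 output neurons, and the predicted digit is simply the index of the largest one. A small sketch in plain Python, assuming raw output scores as input (the function names are illustrative):

```python
import math


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def predict_digit(probabilities):
    """Return (digit, probability) for the most likely of the 10 categories."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return best, probabilities[best]
```

A confident prediction shows one probability close to 1 and nine close to 0; a shaky one, like the 9 in the demo, has a lower top value.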
But OK, that's just a very simple model here, and now I want to do something more complex. I want to be able to do what I think I mentioned earlier:
I want to use a pre-trained network. Training on ImageNet takes a while; I cannot do it here. So I want to take that model, trained in the cloud using the cloud scalability, save it, and then copy it in there and use it locally. And that's what I'm
doing here. So let's go back to, let's
go back over here. So here's the model I'm talking about: it's the Inception model. It's 44 megabytes, so not huge, but it's a fairly advanced model. It's been trained on ImageNet, and I'm going to do pretty much the same thing as before: I'm going to load the model and ask the robot to recognize images. But to make it more fun, I'm going to have the robot take a picture of objects using the camera there and recognize them. It's all in Python, it's very easy to do, and we just have to start the server, and hopefully it works. OK, the loudspeaker is on, so the thing will work. OK, so just to make it a little more difficult to set up, I have this thing here. It's an Arduino, which is Italian, right? So it's an Arduino with, I guess, a PlayStation joystick connected to it. This has nothing to do with deep learning, but it's pretty funny, so why not. And I mean, it's an IoT thing: I'm using the IoT service of AWS, through Wi-Fi here, to send messages back and forth to the cloud, so from here to the cloud to the robot, etc. And I can drive that thing. So, making sure it's not falling
off. It's stopping, OK. So let's put an object somewhere. I'll take my lucky object, the one that should work, and then if you want we can try something else. Some people think robots are going to kill us all; we're quite safe with this one, it's very friendly. It's got a Twitter page you can follow. OK, let's try something. OK, fine. So have you seen this before? It's the IoT button: you just click it and it sends an IoT message to AWS IoT. So this one, if it gets through, will send a message to the robot asking it to take a picture and tell us what it sees. "I'm 98 percent sure that this is a baseball. The object is [inaudible] centimeters away." OK, so I click here, it sends an IoT message to AWS IoT in Dublin, and the robot gets it, so it's back and forth, and as you can see that's pretty fast. The robot takes a picture with the camera and uses the local Inception model to detect it. This button has been giving me trouble, so let's try something else. Come on. It works only once.
Just use violence, yeah.
"I'm 69 percent sure that this is a water bottle. The object is 51 centimeters away." OK, pretty good, right? So it's all fun. Of course we're going to try this one and it's going to fail, because this is a really small object, and the distance should vary here. So once again, what happens is: OK, there's the IoT thing going on.
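The sentence the robot speaks can be rebuilt from three values: the top category, its probability, and the measured distance. The helper below is a sketch of that formatting step only; its name is made up, and how the robot actually measures the distance is not covered by this example.

```python
def robot_sentence(label, probability, distance_cm):
    """Format the top prediction and the measured distance as a spoken sentence."""
    percent = round(probability * 100)  # e.g. 0.69 becomes 69
    return (
        f"I'm {percent} percent sure that this is a {label}. "
        f"The object is {distance_cm} centimeters away."
    )
```

The resulting text is what would be passed to the text-to-speech service so that the voice comes out of the loudspeaker.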
And there's Polly: the voice comes from Polly, as you can imagine. So I need to hit it. Come on, work. But I think I can send, I
can send a message from here as well, so that it takes a picture. Come
on but you don't I'm 13 % sure that this is a polar the object is 22 centimeters away when he sees of only all lighters
in there. Remember, I get 5 categories, not just one. So now, this is not going to work at all, I'm quite sure, because it's a picture.
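Getting five categories instead of one is a matter of sorting the output probabilities and keeping the best few. A minimal sketch, assuming a list of probabilities aligned with a list of category names (both hypothetical inputs for this illustration):

```python
def top_k(probabilities, labels, k=5):
    """Return the k most likely (label, probability) pairs, best first."""
    ranked = sorted(
        zip(probabilities, labels),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [(label, probability) for probability, label in ranked[:k]]
```

Only the first entry is read aloud in the demo, but the remaining entries show what else the model considered plausible.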
If you show it a picture of something, it gets it wrong. Let's show it some other object. Do you have a laptop or something like that? That would really work. We can try that and then conclude, before they kick me off the stage. Let's try. No, I don't
have much of year old I think introduce are at last 1 that the screw all on the OK fine that's a telephone to but now that's never going
forward but but but but but come on the yeah and very dependent on my phone here and it's not working right here it I'm 67 % sure that this is a thimble the object is 20 want OK peel model and it's in their come
on and I still when but but but alright the ability of the of the states of a alright so Julian 1 0 OK alright so on
So I'm getting to the end. I mentioned the Deep Learning AMI already. Again, I went very fast because there's so much stuff I wanted to show you today and keep you, hopefully, interested. You will find all this stuff in detail on my Medium blog: just go to medium.com, Julien Simon, it's easy to find. You will find all the tutorials to get started with MXNet, to do training, etc., how to do the Raspberry Pi thing, etc. So it's all right there. There are plenty more resources;
there is one I want to mention: I recorded an AWS podcast a couple of weeks ago with an introduction to MXNet. So just look for the AWS podcast and MXNet; there's only one and it's mine. You can listen to that and get some additional information. That's
it. Thank you, danke schön, merci, gracias, and that's all; you know, I cannot do the 24 languages. Thank you very much to EuroPython for inviting me, thanks for listening, and if you wish to thank a few