Deep Learning with Python & TensorFlow

Video in TIB AV-Portal: Deep Learning with Python & TensorFlow

Formal Metadata

CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Ian Lewis - Deep Learning with Python & TensorFlow. Python has lots of scientific, data analysis, and machine learning libraries. But there are many problems when starting out on a machine learning project. Which library do you use? How do they compare to each other? How can you use a model that has been trained in your production application? TensorFlow is a new open-source framework created at Google for building deep learning applications. TensorFlow allows you to construct easy-to-understand data flow graphs in Python which form a mathematical and logical pipeline. Creating data flow graphs allows easier visualization of complicated algorithms as well as running the training operations over multiple hardware GPUs in parallel. In this talk I will discuss how you can use TensorFlow to create deep learning applications. I will discuss how it compares to other Python machine learning libraries like Theano or Chainer. Finally, I will discuss how trained TensorFlow models could be deployed into a production system using TensorFlow Serving.
Thank you very much for coming to this session right after lunch. I know many of you are going to start getting sleepy right around the 30-minute mark, so do your best to stay awake and I'll do my best to keep you awake, and we'll work together to get through the talk.

Just to introduce myself: my name is Ian Lewis, and I'm a developer advocate at Google. I work on the Google Cloud Platform team, so that kind of encompasses all of Google Cloud Platform. If you're familiar with things like App Engine or Compute Engine, that sort of thing, that's what Cloud Platform is. I'm on Twitter as @IanMLewis. I've been tweeting throughout the conference, so you can find me fairly easily on Twitter.

A little bit more background about myself: I'm based in Tokyo, Japan. I've lived there for about 10 years, and I'm fairly active in the Python community there as well. I'm one of the four people who founded the PyCon JP conference, which is about a 600-person conference, just to give you an idea of the size. We're going to be having the conference in the third week of September; I believe it's from the 20th to the 24th. If you look up PyCon JP you can find out how to register. I think there are, as of now, something like 20 slots left, so hurry up. I'm also pretty enthusiastic about other communities, such as the Go community, as well as other open-source projects like Kubernetes and Docker, these types of containerization things. So that's the type of thing you can expect to hear from me if you follow me on Twitter.

First, as background, I wanted to go over what deep learning is. I'm going to give a quick overview. How many of you went to the earlier talks about deep learning? Quite a few of you. I'm going to try my best to build on those, but there may be a little bit of overlap.

So what are we talking about when we talk about deep learning? We're talking about a specific type of machine learning that uses neural networks. Neural networks are a way of doing machine learning where you build a network of interconnected nodes. You take something like this cat picture, change the pixels into a numerical representation, and pass that in as the input layer of the network. Each of the internal nodes takes the values from its inputs, performs some operation on them, and eventually the network gives you the output.

These are typically organized in layers. You can see this blue one is the input layer, and the next one there is what's called a hidden layer. If you think of a neural network as a kind of black box, the hidden layers are the layers inside that actually do the operations. Each of these little nodes performs some operation on its inputs, and that's called an activation function. The nodes are linked together using weighted connections, so each of the little lines connecting the layers is weighted to indicate the strength of the connection between the nodes.
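As a rough sketch of what a single node does, the snippet below computes a weighted sum of its inputs and passes it through an activation function. The sigmoid activation, the function names, and all of the numbers are assumptions chosen for illustration; the talk does not specify them.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def node_output(inputs, weights, bias):
    """One node: weighted sum of the inputs, then the activation function."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(weighted_sum)

# Three illustrative input values (for example, pixel intensities)
# and made-up connection weights; a negative weight is an inverse relationship.
inputs = [0.5, 0.1, 0.9]
weights = [0.4, -0.6, 0.2]
print(node_output(inputs, weights, bias=0.0))
```

A hidden layer is then just several such nodes reading the same inputs, each with its own weights.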
So what are neural networks good for? They're essentially good for classification and regression problems, which are a very wide class of problems that you can apply machine learning to. Classification is basically putting things into buckets. You have a bunch of predefined buckets, like A, B, and C, and then you get something new and want to say which bucket it goes in; you put it through the network and get a probability that it goes in A, B, or C. Regression is somewhat more complicated in that, instead of a probability between 0 and 1 that something goes into a single bucket, you get a scalar output. Say you put some values into the neural network and the output you want is a temperature, from, say, 0 kelvin up to some value; that's more like a scalar, and it would be solved by regression. I'll be talking mostly about classification problems, but regression is also something that neural networks are good at. So what does that actually look like?
Here's a little demo that's available at playground.tensorflow.org. It allows you to look into a neural network and get an idea of what's going on. Here we have some input features, which are the values that you input into the network, then you have some hidden layers in the middle, and then you get some sort of output. If you went to some of the earlier presentations, you saw something similar to this. Say that you have the weight and height of a person, and you have two different categories: these orange ones are, say, children, and these blue ones are, say, adults. If you want to classify new pieces of data coming in, you could train a network to do this, but this is really very easy to do: it's essentially a linear classification problem, where you can just draw a line between the two and get a way of predicting between them.

Now let's say you have something a bit more complicated. Let's look at one where one category is completely encircled by the other. If we tried to train on this using just the x and y inputs, it basically never converges; it never figures out how to do this problem. So we can add a hidden layer, and that will essentially do this kind of linear classification multiple times. Let's do this one time: you'll see that it's basically creating one line here, so everything on this side it classifies as orange and everything on that side it classifies as blue. But when we add new nodes (not layers, but new nodes), it gets a little more complicated. It now figures out two lines and then averages the results together. You can see that one node has done one kind of linear regression and one node has done another, and when you combine them it makes this band here. Now if we do this with three, we can combine the results three times and we get a kind of triangular structure. So as we add nodes and layers we can do things that are more and more complicated.

OK, so that's great. But how do we classify something that looks more like this? This is a kind of spiral-looking training set, with a blue spiral and an orange one. This is quite a bit more difficult, and we can't necessarily classify it using just x and y. You can imagine that sine would be good, or x squared, but even so we don't really get very good output just by having a very shallow network with just three nodes. In order to solve these more complicated problems we need a much more complex network. For something like this, it may or may not actually converge, but it's getting there. Right now this is not terribly stable, but it will stabilize, like so.

Once you have these more complicated networks, you can start solving more complex problems, and I'll talk a little bit about why that's important a bit later. You can see that each of these individual nodes has its own contribution to the final output. Each of these little lines shows the weights: the blue ones are positive and the orange ones are negative, and the negative ones indicate an inverse relationship. With an inverse relationship you can essentially just reverse the orange and blue and, in theory, get the right type of output. So essentially you can have these positive and negative weighted connections between the different nodes.

I want to turn this off so it doesn't burn my CPU. That was basically just a way of getting to understand neural networks. It's not actually using TensorFlow under the hood; it's all done in JavaScript in the browser. But it's essentially a way of getting more familiar with neural networks.

So what is a neural network? When you break it down, a neural network is essentially a pipeline: you take something like a matrix, which is called a tensor, and put it through a pipeline of operations. You can imagine that each of these is a matrix multiplication type of problem. Think of a function where you take one matrix, multiply it by another matrix, multiply by another matrix, and another and another, and eventually you get out a tensor that represents the output for your particular problem. This is very loosely modeled after how the brain works.
The resemblance is only in how the individual nodes have a kind of strength, a weight, between them, the way the neurons in your brain have a certain weight between them. From a practical point of view, you're essentially doing matrix operations a bunch of times in order to make some sort of prediction.

So what is a tensor? This is where TensorFlow gets its name. A tensor is not something that people necessarily think of or encounter very often, unless you're a machine learning type of person, but most people are familiar with things like vectors and matrices, and a tensor is essentially a generalized version of that. You can imagine a Euclidean space, a 3D space, with some value out here in the space. A vector could be represented by a single array in a programming language, and a matrix by a two-dimensional array. A tensor is essentially a generalized version where you have an n-dimensional array, so you could have any number of dimensions. This could be one dimension for each type of feature that you're adding into the network. You can do the same sort of operations on a tensor as you would on a matrix, like matrix multiplication or matrix addition. So the way a neural network works is that you have these connected nodes.
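To make the vector, matrix, tensor progression concrete, here is a minimal sketch that represents each as a nested Python list and reports its number of dimensions. The `shape` helper and the example values are hypothetical and written for this illustration only; array libraries such as NumPy expose the same information through the `ndim` and `shape` attributes of their arrays.

```python
def shape(t):
    """Return the shape of a rectangular nested list; a bare number has shape ()."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]  # assumes every level is non-empty and rectangular
    return tuple(dims)

scalar = 3.0                                                   # 0 dimensions
vector = [1.0, 2.0, 3.0]                                       # 1 dimension
matrix = [[1.0, 2.0], [3.0, 4.0]]                              # 2 dimensions
tensor3 = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]    # 3 dimensions

for name, value in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("tensor3", tensor3)]:
    s = shape(value)
    print(name, "dimensions:", len(s), "shape:", s)
```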
This is our input vector, our input tensor, with x1, x2, and x3. Then we have the weights, also represented as a tensor, which you multiply by the input, and finally we add the biases, again in the form of a tensor, and softmax the result to get the output. This is a very, very basic one-layer network. You can think of it this way: this matrix times this matrix is what makes these interconnected nodes. As you may have seen in the earlier talks, most of the operations are performed in this way, where you have the input x times W, which is the weights, plus b, which is the biases, and you can do that multiple times, once for each layer. These are basically just multiplications and additions.
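The "x times W plus b" step, repeated once per layer, can be sketched in plain Python as follows. The helper names and all of the numbers are assumptions made up for illustration. Real code would normally use an array library, and a real network would also apply an activation function between layers, which this sketch leaves out to keep the multiply-and-add step visible.

```python
def matvec(x, W):
    """Multiply a row vector x (length n) by a matrix W (n rows, m columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def layer(x, W, b):
    """One layer as described in the talk: x times W, plus b."""
    return [v + bias for v, bias in zip(matvec(x, W), b)]

x = [1.0, 2.0, 3.0]                          # input tensor: x1, x2, x3
W1 = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]    # first-layer weights (3 inputs -> 2 nodes)
b1 = [0.5, -0.5]                             # first-layer biases
W2 = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]     # second-layer weights (2 nodes -> 3 outputs)
b2 = [0.0, 0.0, 0.0]                         # second-layer biases

hidden = layer(x, W1, b1)       # first multiply-and-add
output = layer(hidden, W2, b2)  # the same operation again for the next layer
print(hidden)
print(output)
```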
Then we have this softmax at the end. Softmax is essentially just a way of normalizing the data, and you'll typically see it at the very end of a network. What happens is that after you've gone through the network, the outputs at this level would be values like 50 here, 20 here, and 0.32 there, so you don't really get an idea of what that actually means; these are relative values for your particular network. When you put them through the softmax function, it normalizes them to values between 0 and 1, so you essentially get a prediction output. These individual values represent the probability, say, that a particular input goes into a particular bucket. Say this is a cat, this is a dog, and this is a human. If we put in an image, the output might be that it's 0.99 certain it's a cat, 0.01 that it's a dog, and 0.001 that it's a human, which essentially means that the network is predicting a cat.
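The normalization described above can be sketched as follows. The raw values 50, 20, and 0.32 are the ones mentioned in the talk; the function name and the cat, dog, human labelling of the three positions are assumptions for illustration. Subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math

def softmax(values):
    """Turn arbitrary scores into values between 0 and 1 that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]  # shift by the max to avoid overflow
    total = sum(exps)
    return [e / total for e in exps]

raw = [50.0, 20.0, 0.32]       # relative scores coming out of the last layer
labels = ["cat", "dog", "human"]
probs = softmax(raw)
for label, p in zip(labels, probs):
    print(label, p)
```

Because softmax exponentiates the scores, a gap as large as 50 versus 20 puts almost all of the probability on the first bucket.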
So that's great. That's how we actually do prediction: we take the inputs, go through all these operations, and get some sort of prediction. But how do we
actually train a model? A model is trained using a method called backpropagation, which was part of some of the earlier talks. Here is the neural network as we've been talking about before: here's the first layer, here's the second layer, here's the softmax, and here's our output. We go through here and do the prediction, but then we use test data as we put it through our network. The test data says: here's the actual data, here's the picture, and this is a cat. So you have the
expected outputs and the actual test data associated with each other, so you know which ones are cat pictures and which ones are dog pictures. And so what we do is
put, say, that cat picture through here, and it comes out with a result. We take that result and the expected value, and we essentially find the difference between those two values. Say it came out that it was 86% certain that it was a cat, but we know that it's a hundred percent certain it's a cat. We want to be able to nudge our network in the direction of actually determining with 100% accuracy that it's a cat picture. To do this we use what's called a loss function to find the difference.
A typical loss function might be cross entropy, but there are a number of other loss functions that you can use depending on the situation. Then you optimize the result using something like gradient descent, which was also talked about a little bit earlier, along with these kinds of optimizers in general. Essentially, you put the loss through this optimization function and then backpropagate the values into the weights and biases for each individual layer. So weight 1 and bias 1, and weight 2 and bias 2, are the weights and biases used in the network here. What you're doing is backpropagating these values and updating the weights and biases, nudging the network in the direction of actually giving you the proper output.
Then you do this many, many times. You train it over and over again, and it eventually nudges in the direction of the right answer. At least, that's the theory; it doesn't always work like that, but in general that's the idea. So that's a relatively brief overview of what neural networks are like.
So why are we actually talking about this? One of the earlier talks mentioned things like ImageNet, which is a famous open dataset for machine learning, where we would get, say, a 25 percent error rate in 2011. Essentially, the reason we're talking about this all of a sudden is that people have gotten very much better at training these neural networks, because of a number of breakthroughs in training these networks to do things that are actually practically useful. You can think of the quality of
the neural network kind of like this. With traditional learning algorithms, as you got more data the performance would increase but level off very quickly, and then you had small neural networks, which would also level off quickly. So essentially what people did was train with an amount of data up to about here, because adding more data wouldn't
actually make it much better, so they would essentially stop right here. But we have since found
these neural network methods that allow us to scale the learning much better, so as we throw more data at the problem, the networks actually get quite a bit more sophisticated and perform quite a bit better. So we've been able to create these large deep neural
networks that continually get better as we give them more and more data. With that come other problems, which I'll talk about in a second, but essentially these medium and large neural networks have become possible recently. And so
here is a model of a neural network developed at Google. This is essentially an Inception model that was trained on ImageNet. What this is doing is labeling images. You can think of each one of these as, say, a matrix multiplication or some sort of operation on a matrix. It goes through several different layers and eventually gives you an output tensor that tells you the labels. This is what we mean when we talk about deep neural networks: networks that have many, many
layers before they actually give you this output. By adding these layers we can start solving more and more complicated problems and actually get pretty good results. But this is a problem: you can imagine that each one of these is a matrix multiplication, and these tensors might be a large image, like a megabyte or something, and you're turning that into a tensor and then doing a matrix multiplication on it. You can imagine how many operations you have to do in order to do even a prediction just once, and you have to do this many, many times over when actually training networks. So what people do is use GPUs, and GPUs are
very good at this, but still you're essentially waiting two weeks or sometimes even months for the results of one single training run. So what people started doing is using things like supercomputers in order to train models faster. But still this is a problem, because not everybody has access to a supercomputer. How many of you have access to a supercomputer? Somebody does. That's the most I've ever seen, that's like 3 or 4 people.
So supercomputers are something that you have to lease time on, a little like the old mainframes, where you had to lease time in the middle of the night or something like that, and you pay tons of money for them. So they're not exactly the easiest or the best way. The ideal thing is for everybody to be able to do machine learning, so what you need is a kind of distributed training. We've actually been able to do that, and so we use it for a lot of practical applications,
things like Google Photos and detecting text in Street View images. So there are a lot of exciting things going on. Recently we've seen quite a lot of activity at Google around this: there are a number of projects internally at Google that use machine learning. This is just the number of directories that contain a model description file, and from 2004 we've got this kind of drastic growth.
And by distributing it we've been able to do this much, much faster. So now I'd
like to talk about TensorFlow itself. TensorFlow is an
open-source library. It's a very general-purpose machine learning library, particularly for doing neural networks, but we're also expanding it to encompass other
types of machine learning. It was open sourced in November and is used internally at Google for a lot of our internal projects. It supports a number of things, like this kind of
flexible and intuitive construction, so you can do a lot of things in an automated way, and it supports training on things like CPUs and GPUs, et cetera. One of the nice things is that you define these networks in Python. So before I kind
of get into what TensorFlow looks like, here are some of the core concepts. You have a graph: the name TensorFlow comes from the idea of taking tensors and having them flow through a directed graph.
The graph is the representation of the network. Operations are the actual nodes of the graph, the operations that you do.
Tensors are the data that is actually passed through the network. Then we have other types of structure. We have constants, which are things that don't change.
Then you have placeholders, which are basically the inputs into our network. There are variables, which are things that actually change during training, so these are what you usually use for your weights and biases, et cetera. And a session is something that encapsulates the overall connection between TensorFlow's core and the models that you define. I should mention that TensorFlow
is a library based on the same sort of concepts as many other libraries,
like the scientific libraries: we have a Python interface, the API, and then it has a C++ core that enables you to do these very fast operations. So when you're actually doing training, you're not actually going through Python.
This is a non-exhaustive list of the operations you can do with TensorFlow: things like math (addition, subtraction, multiplication and division of tensors), matrix operations, stateful operations, et cetera.
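To make the core concepts just listed more concrete, here is a toy sketch in plain Python of a graph with constants, placeholders, variables and a session. It is not TensorFlow's API: every class and function name below is invented for this illustration, and it works on single numbers rather than tensors.

```python
class Node:
    """One node in the graph: an operation plus the nodes that feed it."""
    def __init__(self, op=None, inputs=()):
        self.op = op
        self.inputs = list(inputs)

class Constant(Node):
    """A value that never changes."""
    def __init__(self, value):
        super().__init__()
        self.value = value

class Placeholder(Node):
    """An input that is supplied when the graph is run."""
    def __init__(self, name):
        super().__init__()
        self.name = name

class Variable(Node):
    """A value that can be updated, for example a weight or a bias."""
    def __init__(self, value):
        super().__init__()
        self.value = value

def add(a, b):
    return Node(lambda x, y: x + y, [a, b])

def multiply(a, b):
    return Node(lambda x, y: x * y, [a, b])

class Session:
    """Evaluates a node by first evaluating everything it depends on."""
    def run(self, node, feed=None):
        feed = feed or {}
        if isinstance(node, Placeholder):
            return feed[node.name]
        if isinstance(node, (Constant, Variable)):
            return node.value
        args = [self.run(n, feed) for n in node.inputs]
        return node.op(*args)

# Building the graph computes nothing yet; it only records the structure.
x = Placeholder("x")
w = Variable(3.0)
b = Constant(1.0)
y = add(multiply(x, w), b)

session = Session()
result = session.run(y, feed={"x": 2.0})               # 2 * 3 + 1
w.value = 4.0                                          # a training step would do this
result_after_update = session.run(y, feed={"x": 2.0})  # 2 * 4 + 1
print(result, result_after_update)
```

The point of the sketch is the separation between describing the computation (building `y`) and executing it (calling `run`), which is what lets a real library hand the execution to a faster backend.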
Let's actually see what this looks like when you run through it.
This is a Jupyter notebook. How many people have heard of Jupyter, or use Jupyter? Oh, OK. How many people have been asked that more than 5 times at this conference? So I'll just assume that you know it and go from there. Let me actually just
restart this kernel here. Yes, this is a Python 2 one. TensorFlow also supports Python 3, if I remember right, but this particular example is Python 2. TensorFlow is pretty easy to get started with. This is just using the MNIST example, which one of the speakers was talking about earlier today. These are essentially a bunch of images of handwritten numbers, and you're supposed to determine which number is actually present in the image. The training images look something like this: we have 55 thousand images, they're all in this one long array, each one has 784 pixels, and they're basically monochrome, so just black and white.
If you look at the shape of that, it's 55 thousand by 784 pixels. If you look at the images themselves, each one is stored as a linear array rather than a two-dimensional array, and each of these values is a value from 0 to 1 that essentially says how dark that particular pixel is. So some of these are like 0.23, which is fairly light, all the way up to 1.
So that's essentially what the data looks like.
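As an editorial sketch of the data layout described here, the snippet below flattens one image into a 784-value vector. The 28 by 28 grid is an assumption on my part (784 = 28 × 28, the usual MNIST size), and the stroke drawn into the image is invented for the example.

```python
def flatten(image_rows):
    """Flatten a two-dimensional grid of pixel intensities into one long list."""
    return [pixel for row in image_rows for pixel in row]

SIDE = 28  # assumed image width and height; 28 * 28 = 784 pixels

# A made-up image: blank background with a vertical stroke, like a "1".
image = [[0.0] * SIDE for _ in range(SIDE)]
for r in range(4, 24):
    image[r][14] = 1.0      # fully inked pixel
image[4][13] = 0.23         # a faint pixel next to the stroke

pixels = flatten(image)
dataset = [pixels]          # the full training set would hold 55,000 such rows
print(len(dataset), len(pixels))
```

Flattening throws away the information about which pixels are neighbours, which is one reason the talk later moves to a convolutional network.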
That's how we've represented it here; if you had a color image you'd need to represent it a little bit differently, but that's essentially how we're doing it in this case. This is just showing an example value using matplotlib, so this is one of the input images. That's essentially what the training data looks like. Then we have these training labels that are associated with each image. Each one is basically a vector of size 10 with a bunch of zeros in it and a 1 in the right location, which indicates the number for that particular image. For this image we have an 8 here. If we look at the training labels, each one has size 10, and if we look at the particular one for this case, we can see where the 1 is. The positions run from 0 to 9, so each column, I think, is zero-indexed. These are what are called one-hot vectors, where you have 0 in all the values except for one. This is used often in training data. The data that you get out of the network is actually
similar to this, except that it will be a bunch of values from 0 to 1, essentially probabilities. And so here's some of
the images themselves; someone showed images like this earlier on. Once you've trained, you get something like this: you train it to set these weights and biases so that individual pixels indicate whether it's a particular number. In this case we're using a very simple neural
network
with just one layer, which works this way: if you see pixels in these blue areas, then it's probably a 0, and if there are any pixels in the red area, then it's probably not a 0. It basically aggregates the probabilities of whether this is a 0. You can see that in many of the other ones too, so for a 1 you see pixels in this area, for a 2 in this area, and for a 3 in this area. They look similar to the actual numbers you're looking for.
Let me go to the next one. This is actually defining our network. Here we're importing TensorFlow, and this is the placeholder that I talked about earlier. This is our input into the neural network, and it has a size of 784, which is the number of pixels. Then we have these weights and biases as variables, which can be updated as we train the model. And here is where we actually define our network. This is just the same single-layer network. You can define it in Python very similarly to the way you would in
mathematics: here we're doing a matrix
multiplication of the inputs times the weights, adding the bias variable, and then doing a softmax on it at the end. TensorFlow internally will take these and build
our data flow graph, our model representation. Once we have that, we can use it to train the model.
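The single-layer model described here, a matrix multiplication of the inputs and the weights, plus the bias, followed by a softmax, can be written out by hand as below. This is a plain-Python sketch and not the notebook's TensorFlow code; starting the weights and biases at zero and the sample input are assumptions made for the illustration.

```python
import math

NUM_PIXELS = 784   # size of the flattened input image
NUM_CLASSES = 10   # the digits 0 to 9

def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x, weights, biases):
    """Compute softmax(x . W + b) for one flattened image x."""
    scores = []
    for k in range(NUM_CLASSES):
        total = biases[k]
        for i in range(NUM_PIXELS):
            total += x[i] * weights[i][k]
        scores.append(total)
    return softmax(scores)

# Assumed starting point: every weight and bias is zero.
weights = [[0.0] * NUM_CLASSES for _ in range(NUM_PIXELS)]
biases = [0.0] * NUM_CLASSES

x = [0.5] * NUM_PIXELS          # a made-up grey image
y = predict(x, weights, biases)
print(y)
```

With all-zero parameters every digit gets the same probability, which is the untrained state that the training step then moves away from.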
So this is our neural network. Then we have a placeholder for the outputs, this y prime, and we've defined a cross entropy function here as our loss function. We can then basically plug that into this gradient descent optimizer, optimizing using cross entropy, and this creates our training step. So this encompasses the entire neural network plus the training that we need to do. Like some of the other explanations from the talks, gradient descent is essentially a way of nudging our neural network in the direction that we want it to go. I think one of the talks described it as going down a mountain at night using a single little flashlight, or a
torch, and just going a little bit at a time. Essentially that's the idea: you're going down and moving in the
direction of a minimum, to actually minimize the loss. In that picture the altitude would be the error, the loss generated by the loss function, and we nudge in the direction where the loss is minimized, and do that over and over again. Each one of these is a training epoch as we're going down with the gradient descent optimizer. Here we're going to train a thousand times. What's great is that we can also do this kind of mini-batch training, where you take just a small subset of the total
training data. So we're not training every time on every single piece of the training data; we're training on a randomly selected batch of, in this case, 100 elements I think, and we're doing that a thousand times. So this is taking longer than usual. What's good
about that is that you don't have to train on the entire training data. You basically take a randomized subset of the training data, and that's essentially the same thing you do
in, say, a statistical survey where you ask a thousand people: you get a statistically significant, representative sample of the data. OK, so this is actually done.
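The training loop described above (cross entropy as the loss, gradient descent as the optimizer, random mini-batches) can be sketched end to end in plain Python. Everything concrete here is an assumption for the sake of a small runnable example: the toy three-class data stands in for MNIST, and the batch size, step count and learning rate are not the notebook's values.

```python
import math
import random

def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x, weights, biases):
    """Single layer: softmax(x . W + b)."""
    scores = [biases[k] + sum(x[i] * weights[i][k] for i in range(len(x)))
              for k in range(len(biases))]
    return softmax(scores)

def cross_entropy(predicted, one_hot):
    """Small when the predicted probability of the true class is near 1."""
    return -sum(t * math.log(p + 1e-12) for p, t in zip(predicted, one_hot))

def make_example(rng, num_classes):
    """Toy stand-in for an image: the 'pixel' matching the label is bright."""
    label = rng.randrange(num_classes)
    x = [rng.uniform(0.0, 0.2) for _ in range(num_classes)]
    x[label] = rng.uniform(0.8, 1.0)
    one_hot = [1.0 if k == label else 0.0 for k in range(num_classes)]
    return x, one_hot

def mean_loss(data, weights, biases):
    return sum(cross_entropy(predict(x, weights, biases), y)
               for x, y in data) / len(data)

def accuracy(data, weights, biases):
    correct = 0
    for x, y in data:
        p = predict(x, weights, biases)
        if p.index(max(p)) == y.index(1.0):
            correct += 1
    return correct / len(data)

rng = random.Random(0)
NUM_CLASSES = 3       # assumed toy sizes, not the MNIST sizes
BATCH_SIZE = 20
STEPS = 200
LEARNING_RATE = 0.5

data = [make_example(rng, NUM_CLASSES) for _ in range(300)]
weights = [[0.0] * NUM_CLASSES for _ in range(NUM_CLASSES)]
biases = [0.0] * NUM_CLASSES
loss_before = mean_loss(data, weights, biases)

for step in range(STEPS):
    batch = rng.sample(data, BATCH_SIZE)     # random mini-batch
    grad_w = [[0.0] * NUM_CLASSES for _ in range(NUM_CLASSES)]
    grad_b = [0.0] * NUM_CLASSES
    for x, y in batch:
        p = predict(x, weights, biases)
        for k in range(NUM_CLASSES):
            error = p[k] - y[k]   # gradient of the loss w.r.t. score k
            grad_b[k] += error / BATCH_SIZE
            for i in range(NUM_CLASSES):
                grad_w[i][k] += x[i] * error / BATCH_SIZE
    # Gradient descent: nudge every parameter a little against its gradient.
    for k in range(NUM_CLASSES):
        biases[k] -= LEARNING_RATE * grad_b[k]
        for i in range(NUM_CLASSES):
            weights[i][k] -= LEARNING_RATE * grad_w[i][k]

loss_after = mean_loss(data, weights, biases)
final_accuracy = accuracy(data, weights, biases)
print(loss_before, loss_after, final_accuracy)
```

For a softmax output trained with cross entropy, the gradient with respect to each raw score is simply the predicted probability minus the one-hot target, which is why the inner loop is so short.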
OK, so I've now gone through the training, and at the end we can check the accuracy of our neural network. In this case we got about 90 percent, which is pretty bad, but this is a very simple one-layer neural network. That's essentially how you use TensorFlow: you create these steps to run through, but the actual computation is done under the hood in the C++ core, and it also maps the operations onto accelerated devices. So if you have GPUs or CPUs available, it will
map the operations to those particular devices. In this case I'm running this on, I think, a
32-core machine, so it will actually map onto that. I'll talk a little bit about that later. Let me go back. Can I go back? This isn't going back.
OK, so here I'm going to
look at a slightly more complicated example where we get a bit better accuracy. For this training we're using exactly the same data that we did before, but we're going to build what's called a convolutional network, which was talked about a bit earlier. This basically allows you to look at the image part by part and pick out specific features from
each part of the image. This helps with things like the following. With the approach I had earlier, if you saw pixels in a certain location, that would indicate what number it was. But what happens if you write this 0 and then translate it slightly over? That particular network would not be very good at figuring out that I just moved this 0 over a few pixels. For things like that, this convolutional way of
looking at it helps a lot. In this case we're going to initialize the weights and biases a little bit differently; I think this one is taking random weights to begin with. Here's the convolution part. Essentially we're going over the image with what are called kernels, and building this other tensor that has a particular value for each position of the kernel over the image. This is picking out features of each individual part of the image rather than looking at the image as a whole. Then we can take those and use what's called pooling. Pooling is another method, and one of the most common examples is max pooling, where you take the values from a part of the tensor and pick the maximum value, which is somewhat of a representation of that particular part of the image as well. You put that all together into a layer, and you can create several layers like that. So here we set up our first convolutional layer by building the weights and biases and then building our layer here. The second convolutional layer takes as inputs the outputs from the previous layer and does basically the same sort of thing. Then at the end we create a densely connected layer. As some of the previous talks described, the convolutional layer is not fully connected between the values, because it's using these kernels translated all over the image, but the final layer is a densely connected layer, which
does basically the exact same thing that we did in our previous network, except that we don't have the
convolutional part, and this
allows us to get a much better accuracy. I'm not going to really talk about dropout. Then we have the readout layer, which is essentially just doing the softmax on the output from the previous layer. Then you can train and test the model. In this particular one we're using the same cross entropy, but using the Adam optimizer instead of the regular gradient descent optimizer, and with that kind of optimization you can get much better performance. Here we're doing a lot more training because it's a deeper network.
With our previous one, if we continued to train it more, it probably wouldn't get very much better than 90 percent; in this case we can train it quite a lot more in order to improve the accuracy. So we actually train about
20 thousand times on mini-batches of 50. I'm not going to actually run this, because it takes about 5 minutes or something to run through all of it. You can see from the output that we get about 99.2% accuracy, which is a good deal better than 90%, right?
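The convolution and max pooling steps described for this network can be illustrated with a small plain-Python sketch. The 6 by 6 images, the 3 by 3 stroke-detecting kernel and the 2 by 2 pooling window are all made up for the example; in a real network the kernel values are learned.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    As in most deep learning libraries, the kernel is not flipped,
    so strictly speaking this is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            block = [feature_map[r + i][c + j]
                     for i in range(size) for j in range(size)]
            row.append(max(block))
        pooled.append(row)
    return pooled

def stroke_image(column, side=6):
    """A blank image with one vertical stroke in the given column."""
    return [[1.0 if c == column else 0.0 for c in range(side)]
            for _ in range(side)]

# Hypothetical hand-written kernel that responds to vertical strokes.
kernel = [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]]

features_left = convolve2d(stroke_image(1), kernel)
features_shifted = convolve2d(stroke_image(2), kernel)   # moved one pixel right

pooled_left = max_pool(features_left)
pooled_shifted = max_pool(features_shifted)
print(pooled_left, pooled_shifted)
```

In this example the one-pixel shift changes the feature map but not the pooled result, because both strokes fall inside the same pooling window. A larger shift would change the pooled output too, so this is only a partial tolerance to translation.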
So instead of 1 in 10, it's around 1 in a hundred that is classified incorrectly. So you can do things from very simple networks to much more complex networks. Let me go back
to my slides. One of the other things that you
can do, because TensorFlow has this internal representation and knowledge about all the graphs
and how everything is working together,
is write log output files during training, and these can then be read by an application called TensorBoard. We obviously have very unique names for things, you know: TensorFlow, TensorBoard. Here it is. What's really cool about this is that you can look at things like the accuracy and the values of the loss function, and
look at these graphs as you're training, going over the data, to see how your
neural network is performing. In this case we're seeing the accuracy as we're training. I think this is the simple version: we get up to about 90% pretty quickly, but we don't really get much better as training goes on. You can look at things like the accuracy, but you can also look at the loss function. This is the cross entropy value, and that goes down and down and down; it should be close to the inverse of the accuracy. You can also look at many other values, and these basically correspond to the variables or the
individual parts of your model. So here, cross entropy was an actual Python object that was defined, and
you can get this log output data for it. Other things like the max and min are also available, and there are these
input images that you can look at. One of the other cool things is that you can look at the graph of the model itself. Here we have a two-layer network, and we can zoom in and look at the individual pieces of the network, like the weights and biases for individual parts of the network, and look at things like the dropout values and the loss function as well. This gives you, from the Python code that we wrote, a full graph representation of the network. In the case of something very complicated, like the image labeling network that I was showing you earlier, you would see this huge graph that was generated by
that. This is really cool because it helps you visualize the neural network that you define. Let's go back here. I have about 10 minutes or so left, so we'll get there. The main difference between
TensorFlow and any of the other libraries that are out there is that TensorFlow was built from day one with distributed training in mind. Essentially, TensorFlow is built in such a way that we could productionize it and actually do practical work
with the library and with our networks. We want to be able to train these faster, and based on the hardware breakthroughs and improvements that we've made in the past, we want to be able to use those to train models faster. TensorFlow supports multiple different types of parallelism. One is model
parallelism, which is essentially breaking up the model, so each one of these machines takes a different part of the model, you feed the data through here, and you break up the work that way. But it
also supports what's called data parallelism, which is
hopefully on one of the slides. It
disappeared. Data parallelism is the opposite: you break up the data instead, and each
one of the machines has a full copy of the model. So you're splitting up the data, sending, say, records 1 through 100 to one machine and 101 through 200 to a different machine, and breaking up the work that way. There are a number of trade-offs between these: whether you
use a full graph or a subgraph for model parallelism, or synchronous or asynchronous data parallelism.
There are pluses and minuses to each of these, and
so there's no
silver bullet for this, but
I do know that inside Google we use data parallelism pretty much exclusively. TensorFlow
basically supports these different types of parallelism in a number of ways.
Data parallelism is this one. This is where you take the data and split it up, and each one of these replicas has the entire model. Once you've done some training, you pass the results to the parameter server, which is the thing that holds all the weights and biases. Those are updated and then pushed back to the model replicas. There are asynchronous and synchronous versions of this, where you're either updating the weights and biases in parallel or updating them synchronously for
each iteration. Asynchronous is much
faster, but it can add some noise to the model, because the parameters can be changed midway through a run. Whereas in the
synchronous version, you run on the split-up data and then wait until all the model replicas have finished a particular step before going on to the next one. That reduces the noise, but it makes the training a
bit slower. This is an example of how that would run with TensorFlow: we have a bunch of workers doing the parallel work, then you have some parameter servers, and in between, the servers use gRPC to communicate.
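The synchronous flavour of data parallelism described here can be simulated in a few lines of plain Python. This is only a sketch under loud assumptions: the "workers" are just loop iterations in one process, the model is a single weight fitted to made-up data, and the `ParameterServer` class is invented for the illustration rather than taken from TensorFlow.

```python
def gradient(w, shard):
    """Gradient of the mean squared error of the model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

class ParameterServer:
    """Holds the shared weight and applies updates sent by the workers."""
    def __init__(self, w):
        self.w = w

    def apply(self, grad, learning_rate):
        self.w -= learning_rate * grad

# Made-up data following y = 2 * x, split into equal shards, one per worker.
data = [(0.1 * i, 2.0 * 0.1 * i) for i in range(1, 13)]
shards = [data[0:4], data[4:8], data[8:12]]

# With equal shard sizes, averaging the shard gradients matches the
# gradient computed on the whole dataset at once.
full_gradient_at_start = gradient(0.0, data)
averaged_gradient_at_start = sum(gradient(0.0, s) for s in shards) / len(shards)

server = ParameterServer(0.0)
for step in range(50):
    w = server.w                                    # every replica pulls the same weights
    grads = [gradient(w, shard) for shard in shards]  # would run in parallel
    # Synchronous update: wait for all workers, then apply the average once.
    server.apply(sum(grads) / len(grads), learning_rate=0.5)

print(server.w)
```

An asynchronous version would instead apply each worker's gradient as soon as it arrived, so some gradients would be computed from weights that another worker had already changed. That is the source of the extra noise mentioned above.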
So why is this kind of data
parallelism important? OK, so let's say that instead of getting a cat out of
our network, we got a dog. Well, we want to improve our network, so where do
we tweak it in order to make it better? I don't know. Maybe this is a good idea, I don't know. So we
make the tweak and run this again,
and it's running on my GPU, and it comes out and it didn't make it better. So now I've got to go back and start over again. You normally want to run these kinds of experiments over and over again, very quickly. You don't want to have to wait a week in order to figure out whether your tweak worked well enough. This is a problem even for people who are experts in machine learning: you have experience, and you have literature that you can use to narrow down what things you might want to tweak, but in the end you need to be able to run it and test to see whether the tweak that you made actually improves the model. You essentially have to test, and this takes time. That's why it's very important to be able to do this kind of distributed training. But one of
the problems is that as you scale the number of nodes, the number of connections between your parameter servers and your workers increases kind of exponentially, so this doesn't scale; you essentially bottleneck on the network,
because these machines are talking over TCP, and you get on the order of milliseconds of latency between the machines. So you need to have a dedicated network, a kind of dedicated hardware network. A lot of people use things like InfiniBand in order to make this go faster, but this is really a problem at the moment. One of the things that we do at Google internally is distributed training on a dedicated network that doesn't use TCP/IP: it skips the whole TCP/IP stack, and communication between the machines runs on the order of nanoseconds or microseconds instead of milliseconds. This is something that we are planning on making public as
what's called [unclear], which will allow you to run TensorFlow graphs inside the Google data centers.
We're also planning on eventually exposing the dedicated hardware that we use. Instead of GPUs we're using these, I think Tensor Processing Units is what they're calling them. Essentially they're dedicated hardware for doing tensor operations, and we're going to expose those to other people so that they can use that kind of dedicated hardware to do more experiments and things like that.
so I think that's all I had on so I want to thank you for coming and spending you know the last hour here that how many of you still awake region hands she so
about like 70 % you so let us things a lot for coming out definitely check out the tens of flow . org website there's tons of really good examples
like if you go here and then if you
look here there's like tutorials and documentation this is actually really really good and has like lot of good examples about how to 100 to use Houston's and especially if you're also a person you know there's different ones for different levels of people as well as how to use in terms of clustering to kind of
move towards actually productionizing your models. So thanks a lot for coming. And, yeah, if you have questions, we have like 2 minutes, so that works. Yes sir. The question has to be about 15 seconds, and then, you know, maybe 30 seconds for the answer.
Is there anything like profiling for this kind of model? Like, to have an overview of how many multiplications and how many parameters each block of the flow graph needs, and usually also the actual time it took to run it?
I don't know if TensorFlow gives you that. I think that it probably should if it doesn't. I don't know exactly, but I think that could be something that you could visualize in TensorBoard as part of the output: you basically log that as a value that you can view here in TensorBoard, and then kind of see how each part of the graph performed, things like that. Other questions? One right behind you.
So the previous talks today mentioned that you typically have to do some feature extraction before you can actually apply neural networks. Will TensorFlow help me speed up my manually designed feature extraction, or is it designed only to do neural network stuff?
So at the moment it's mostly geared towards neural networks. I mean, obviously you could do feature extraction using a separate neural network, so you could have a neural network that does the extraction and another one that does the actual classification. There is some work going on, I forget what it's called, it's like TensorFlow Wide, something like that, I think it's called. Essentially, instead of having deep neural networks, the idea is that you have these more standard types of machine learning algorithms. So I think there is work going on there to incorporate more standard machine learning algorithms, so you can do that sort of feature extraction beforehand and stuff like that, but it's kind of ongoing work. You might try and search for TensorFlow Wide. I haven't played with it personally, so I can't really give you details. Yes, thanks a lot.
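As a small illustration of the profiling question above (how many parameters and multiplications each block of a graph needs), the counts for fully connected layers can be worked out by hand. The sketch below is plain Python, not a TensorFlow API: the helper names `dense_layer_cost` and `profile_network` and the layer sizes are hypothetical, it covers only fully connected layers, and it counts operations rather than measuring run time.

```python
def dense_layer_cost(n_inputs, n_outputs):
    """Parameters and per-example multiplications of one fully connected layer."""
    weights = n_inputs * n_outputs   # one weight per input/output pair
    biases = n_outputs               # one bias per output unit
    return {"parameters": weights + biases, "multiplications": weights}


def profile_network(layer_sizes):
    """Return (name, cost) for each pair of consecutive layer sizes."""
    report = []
    pairs = zip(layer_sizes, layer_sizes[1:])
    for index, (n_in, n_out) in enumerate(pairs, start=1):
        report.append((f"dense_{index}", dense_layer_cost(n_in, n_out)))
    return report


# Hypothetical network: 784 inputs, one hidden layer of 128 units, 10 outputs.
LAYER_SIZES = [784, 128, 10]

for name, cost in profile_network(LAYER_SIZES):
    print(f"{name}: {cost['parameters']} parameters, "
          f"{cost['multiplications']} multiplications per example")
```

A count like this gives a rough sense of which block dominates the cost; actual timings depend on the hardware and would still need to be measured and logged separately.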