Deep Learning with Python & TensorFlow
Formal Metadata
Title 
Deep Learning with Python & TensorFlow

Title of Series  
Part Number 
148

Number of Parts 
169

Author 

License 
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and noncommercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license. 
DOI  
Publisher 
EuroPython

Release Date 
2016

Language 
English

Content Metadata
Subject Area  
Abstract 
Ian Lewis: Deep Learning with Python & TensorFlow. Python has lots of scientific, data analysis, and machine learning libraries. But there are many problems when starting out on a machine learning project. Which library do you use? How do they compare to each other? How can you use a model that has been trained in your production application? TensorFlow is a new open source framework created at Google for building deep learning applications. TensorFlow allows you to construct easy-to-understand data flow graphs in Python which form a mathematical and logical pipeline. Creating data flow graphs allows easier visualization of complicated algorithms as well as running the training operations over multiple hardware GPUs in parallel. In this talk I will discuss how you can use TensorFlow to create deep learning applications. I will discuss how it compares to other Python machine learning libraries like Theano or Chainer. Finally, I will discuss how trained TensorFlow models could be deployed into a production system using TensorFlow Serve.

00:01
Thank you very much for coming to this session. It's right after lunch, and I know many of you are going to start getting sleepy right around the 30-minute mark, so do your best to stay awake and I'll do my best to keep you awake. We'll work together to get through the talk. Just to introduce myself: my name is Ian Lewis. I'm a developer advocate at Google. I work
00:44
on the Google Cloud Platform team, so that encompasses all of Google Cloud Platform. If
00:54
you're familiar with things like App Engine or Compute Engine, that sort of thing, that's what Google Cloud Platform is. I'm on Twitter as IanMLewis. I've been tweeting throughout the conference, so you can find me fairly easily on Twitter. A little more background about myself: I'm based in Tokyo, Japan. I've lived there for about 10 years, and I'm
01:28
active in the Python community there as well. I'm one of the four people who founded the
01:37
PyCon JP conference, which is about a 600-person conference, just to give you an idea of the size. We're going to be having the conference in September, in the third week of September; I believe it's from the 20th to the 24th. If you look at the PyCon JP website you can find out more and register. I think there are, as of now, something like 20 slots left, so hurry up. I'm also pretty enthusiastic about other communities, so the Go community as well as other open source projects like Kubernetes and Docker, this type of containerization. That's the type of thing you can expect to hear from me if you follow me on Twitter. First, as background, I want to go over what deep learning is. I'm going to give a very high-level, quick overview. How many of you went to the talks earlier and know about deep learning? OK, quite a few of you. I'm going to try my best to build on those, but there may be a little bit of overlap. So what are we talking about when we talk about deep learning? We're talking about a specific type of machine learning which uses neural networks. Neural networks are a way of doing machine learning where you build this kind of network of interconnected nodes. So you
03:26
essentially take something like this cat picture, you change the pixels into a kind of numerical representation, and you pass that in as the input layer of the network. Each of these internal nodes will take the values from its inputs, do some operation on them, and eventually give you the output.
03:55
These are typically organized in layers. You can see this blue one is the input
04:00
layer, and the one in the middle is what's called a hidden layer. If you think of a neural network as a kind of black box, the hidden layers are the layers inside that actually do the operations. Each one of these little nodes does some sort of operation on its inputs, and that's called an activation function. The nodes are linked together using weighted connections, so each of these little lines connecting the layers is weighted to indicate the strength of the connection between the nodes in each layer.
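As a rough illustration of the node just described (weighted connections feeding an activation function), here is a minimal sketch in plain Python. The sigmoid activation, the input values, and the weights are assumptions chosen for illustration; the talk does not say which activation function the pictured network uses.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """One node: a weighted sum of the inputs, then an activation function."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(weighted_sum)

# Three hypothetical input values, for example three pixel intensities.
pixels = [0.2, 0.9, 0.5]
# One weight per incoming connection; a negative weight pushes the output down.
weights = [1.5, -2.0, 0.5]

activation = neuron(pixels, weights, bias=0.1)
print(activation)
```

A whole layer is just several such nodes reading the same inputs, each with its own weights.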
04:44
So what are neural networks good for? They're essentially good for classification and regression problems. These are very wide classes of problems that you can apply machine learning to. Classification is basically putting things into buckets, so you can have
05:01
a bunch of predefined buckets, like A, B, and C. Then you get something and want to say which bucket it goes in, so you put it through the network and get a probability that it goes in A, B, or C. Regression is somewhat more complicated in that, instead of a probability between 0 and 1 that something goes into a single bucket, you get a scalar output. Say you put some values into a neural network and the output you want is a temperature, from, say, 0 kelvin up to some value. That's a scalar, and that could be solved by regression. I'm talking mostly about classification problems, but regression is also something that neural networks are good at. So what does that actually look like? Here's a
06:02
little demo that's available at playground.tensorflow.org. It's a little demo that allows you to look into a neural network and get an idea of what's going on. Here we have some input features, which are the values that you feed into the network, then you have some hidden layers in the middle, and then you get some sort of output. If you went to some of the earlier presentations you saw something similar to this. Let's say you have the weight and height of a person, and you have two different categories here: say these orange ones are children and these blue ones are adults. If you want to classify new pieces of data coming in, you could train a network to do this, but this is really very easy to do. It's essentially a linear classification problem, where you can just draw a line between the two and get a way of predicting between them. Now let's say you have something a bit more complicated. Let's look at one where one category is completely encircled by the other. If we tried to train on this using just the x and y inputs, it basically never converges; it never figures out how to do this problem. So we can add a hidden layer, and that will essentially do this kind of linear classification multiple times. Let's say we do it one time: you'll see that it's basically creating one line here, so everything on this side of it is classified as orange and everything on the other side is classified as blue. But when we add new nodes (not layers, but new nodes) we can see that it gets a little more complicated. Now it figures out two lines and then combines the results together. You see that one node has done one kind of linear
regression and the other node has done another, and when you combine them together it makes this band here. Now if we do it with three, we combine the results three times and we get a kind of triangular structure. As we add these nodes and layers we can do things that are more and more complicated. OK, so that's great, but how do we classify something that looks more like this? This is a spiral-looking training set, with a blue spiral and an orange one. This is quite a bit more difficult, and we can't necessarily classify it using just x and y. You can imagine that sine would be good, or x squared, but even still we don't really get very good output just by having a very shallow network with just three nodes. So in order to solve these more complicated problems, we need a much more complex network. For something like this, it may or may not actually converge, but it's getting there. Right now this is not terribly stable, but it will stabilize. So if you have these more complicated networks, you can start solving more complex problems. I'll talk a little bit about why that's important later. You can see that each of these individual nodes makes its own kind of contribution to the final output. Each of these little lines shows the weights: the blue ones are positive and the orange ones are negative. The negative ones indicate an inverse relationship, and with that inverse relationship you can still get the right type of output. So you can have these positive and negative weighted connections between the different nodes. Let me
turn this off so it doesn't burn my CPU.
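The "three nodes give a triangular region" observation can be sketched without any training. The example below is an illustration under stated assumptions: the three lines are hand-picked rather than learned, and a hard step function stands in for whatever activation the playground demo uses. Each hidden node tests which side of one line a point falls on, and the output node fires only when all three agree.

```python
def step(z):
    """Hard threshold: 1 on the positive side of a line, 0 otherwise."""
    return 1 if z > 0 else 0

# (weight_x, weight_y, bias) for three hidden nodes. These values are
# hand-picked for illustration; a real network would learn them.
HIDDEN = [
    (0.0, 1.0, 1.0),    # fires above the line y = -1
    (-2.0, -1.0, 2.0),  # fires below the line y = -2x + 2
    (2.0, -1.0, 2.0),   # fires below the line y = 2x + 2
]

def hidden_layer(x, y):
    """Each hidden node performs one linear classification of the point."""
    return [step(wx * x + wy * y + b) for wx, wy, b in HIDDEN]

def classify(x, y):
    """Output node: fires only when all three hidden nodes fire."""
    return step(sum(hidden_layer(x, y)) - 2.5)

for point in [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (0.0, -2.0)]:
    print(point, classify(*point))
```

The points (0, 0) and (0, 1) lie inside the triangle and are labeled 1; the other two fall outside at least one line and are labeled 0. No single line could enclose the inner points, which is why the one-node version in the demo cannot solve the encircled dataset.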
10:50
Anyway, that was basically just a way of getting to
10:53
understand neural networks. It's not actually using TensorFlow under the hood; it's all done in JavaScript in the browser, but it's essentially a way of getting more familiar with neural networks. So what is a neural network? When you break it down, a neural network is essentially a pipeline: you take something like a matrix, which is essentially what's called a tensor, and put it through this pipeline of operations. You can imagine that each of these is a matrix multiplication type of
11:31
problem. Think of a function where you take one matrix, multiply it by another matrix, multiply by another matrix, and another, and another, and eventually you get out a tensor that represents the output for your particular problem. This is very loosely modeled after how the brain works.
11:56
But only in that the individual nodes have a kind of strength, a weight, between them, just as the neurons in your brain have a certain weight between them. From a practical point of view, you're essentially doing matrix operations a bunch of times in order to make some sort of prediction. So what's a tensor? This is where TensorFlow gets its name from. A tensor is not
12:30
something that people necessarily think of or encounter very often, unless you're a machine learning type of person, but most people are familiar with things like vectors and matrices, and a tensor is essentially a generalized
12:45
version of that. You can imagine this to be like
12:49
Euclidean space, our 3D space, and then you have some sort of value out here in the space. Something like a vector could be represented by a single array in a programming language. A matrix is the two-dimensional version of a vector. A tensor is essentially a generalized version where you have this n-dimensional type of vector, so you could have
13:21
any number of dimensions. This could be one dimension for each type of feature that you're actually feeding into the network. You can essentially do the same sort of operations on a tensor as you would on, say, a matrix, like matrix multiplication or matrix addition. So the way a neural network works is that you have these connected nodes.
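To make the vector, matrix, tensor progression concrete, here is a small sketch that represents tensors as nested Python lists. The helper names `shape` and `add` are made up for this illustration (they are not TensorFlow functions), and the sketch assumes rectangular, non-empty lists.

```python
def shape(tensor):
    """Return the size of each dimension of a rectangular nested list."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)

def add(a, b):
    """Element-wise addition of two tensors that have the same shape."""
    if isinstance(a, list):
        return [add(x, y) for x, y in zip(a, b)]
    return a + b

scalar = 5.0                                              # 0 dimensions
vector = [1.0, 2.0, 3.0]                                  # 1 dimension
matrix = [[1.0, 2.0], [3.0, 4.0]]                         # 2 dimensions
cube = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]  # 3 dimensions

print(shape(scalar))        # ()
print(shape(vector))        # (3,)
print(shape(matrix))        # (2, 2)
print(shape(cube))          # (2, 3, 4)
print(add(matrix, matrix))  # [[2.0, 4.0], [6.0, 8.0]]
```

The same recursive `add` works for any number of dimensions, which is the sense in which a tensor generalizes vectors and matrices.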
13:58
Here we have our input vector, or input tensor, with X1, X2, X3. Then we have the weights, also represented as a tensor, which you multiply the input by. Finally we add the biases, also in the form of a tensor, and then softmax the result to get the output. This is a very, very basic one-layer network. You can think of it not as individual weights or individual multiplications; essentially this matrix times this matrix is what makes this kind of interconnected layer. This is how most of the operations are performed: you have the input X times W, which is the weights, plus b, which is the biases, and then you do that multiple times, once for each layer. So these are
15:11
basically just multiplications and additions. Then we have
15:14
this softmax thing at the end. Softmax is essentially just a way of normalizing the data, and you typically see it at the very end of the network. What happens is, after you've gone through this network, the outputs at this level would be some sort of values, like this one is 50, this one is 20, and this one is 0.32, so you don't really get an idea of what that actually means. These values are relative values for your particular network. When you put them through the softmax function, they're normalized to values between 0 and 1.
15:54
So you essentially get a prediction output, and these individual values represent the probability that the input goes into a particular bucket. Say this one is cat, this one is dog, and this one is human.
16:11
If we put in an image, the output might be that it's 0.99 certain that it's a cat, 0.01 that it's a dog, and 0.001 that it's a human, which essentially means it's a cat.
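The one-layer computation described above, softmax(x times W plus b), can be sketched in plain Python as follows. The weight and bias values are arbitrary numbers chosen for illustration; the raw scores 50, 20 and 0.32 are the ones mentioned in the talk. Subtracting the largest score before exponentiating is a common numerical-stability trick and is an addition here, not something the talk describes.

```python
import math

def matvec(x, W):
    """Multiply a row vector x (length n) by a matrix W (n rows, m columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def layer(x, W, b):
    """One fully connected layer: softmax(x * W + b)."""
    scores = [z + bias for z, bias in zip(matvec(x, W), b)]
    return softmax(scores)

x = [1.0, 2.0, 3.0]            # inputs X1, X2, X3
W = [[0.2, -0.1, 0.0],         # illustrative weights, one row per input
     [0.4, 0.3, -0.2],
     [-0.5, 0.1, 0.6]]
b = [0.1, 0.0, -0.1]           # illustrative biases, one per output

probs = layer(x, W, b)
print(probs, sum(probs))

# The raw scores from the talk become probabilities between 0 and 1.
talk_probs = softmax([50.0, 20.0, 0.32])
print(talk_probs)
```

Because softmax exponentiates the scores, a gap as large as 50 versus 20 puts practically all of the probability on the first bucket.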
16:32
So that's great, that's how we
16:33
actually do prediction: we take some inputs, go through all these operations, and get some sort of prediction. But how do we
16:41
actually train a model? A model is trained using a method called backpropagation, which was part of some of the earlier talks. Essentially, this here is the neural network as we've been talking about it: here's layer one, here's the second layer, here's the softmax, and here's our output. We go through here and do the prediction, but what we then do is use test data, that is, data whose correct answer is already known, as we put it through our network. The test data says: here's the actual data, here's the picture, and this is a cat. So you have the
17:23
expected output and the actual test data associated with each other, so you know which ones are cat pictures and which ones are dog pictures. What we do is
17:35
put, say, the cat picture through here, and it comes out with a result. We take that result and the expected value, and we find essentially the difference between those two values. Say it came out that it was 0.86
17:57
(that is, 86 percent) certain that it was a cat, but we know that it's 100 percent certain it's a cat. We want to be able to nudge our network in the direction of determining with 100 percent accuracy that it's a cat. To figure this out, you use what's called a loss function to
18:16
find the difference. A typical loss function might be cross entropy, but there are a number of other loss functions that you can use depending on your situation. Then you go through and optimize the result by using something like gradient descent, which was also talked about a little earlier; there was more information in general about these kinds of optimizers, especially gradient descent. Essentially, what you do is put the loss through this optimization function and then backpropagate the values into the weights and biases for each individual layer. So weight 1 and bias 1, and weight 2 and bias 2, are actually the weights and biases that are used in the network here. What you're doing is backpropagating these values and updating the weights and biases, nudging the network in the direction of actually giving you the proper output.
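The training procedure described above (predict, measure the loss, nudge the weights and biases) can be sketched for the simplest possible case. This is an illustration under stated assumptions, not the speaker's code: it uses a single softmax layer rather than the two-layer network on the slide, one made-up training example, cross entropy as the loss, and plain gradient descent with an arbitrarily chosen learning rate. With only one layer, "backpropagation" reduces to one gradient step; deeper networks apply the chain rule layer by layer.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x, W, b):
    """Forward pass of one layer: softmax(x * W + b)."""
    scores = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
              for j in range(len(b))]
    return softmax(scores)

def cross_entropy(probs, target):
    """Loss is near 0 when the probability of the correct class is near 1."""
    return -math.log(probs[target])

def train_step(x, target, W, b, learning_rate=0.5):
    """One gradient descent update; returns the loss before the update."""
    probs = predict(x, W, b)
    # For softmax followed by cross entropy, the gradient of the loss with
    # respect to each raw score is (predicted probability - expected value).
    grad = [p - (1.0 if j == target else 0.0) for j, p in enumerate(probs)]
    for i in range(len(x)):
        for j in range(len(b)):
            W[i][j] -= learning_rate * x[i] * grad[j]
    for j in range(len(b)):
        b[j] -= learning_rate * grad[j]
    return cross_entropy(probs, target)

# Hypothetical example: three input features, three classes (cat, dog, human).
x = [0.5, 1.0, -0.5]
target = 0  # the expected output says this example is a cat
W = [[0.0] * 3 for _ in range(3)]
b = [0.0] * 3

losses = [train_step(x, target, W, b) for _ in range(20)]
print(losses[0], losses[-1])
print(predict(x, W, b))
```

Each call moves the predicted probability for "cat" closer to 1, so the loss shrinks from step to step, which is the nudging the talk describes.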
19:21
proper output. Then you do this many, many times; you train it over and over again, and it eventually nudges in the direction of being accurate. At least that's the theory; it doesn't always work like that, but in general that's the idea. So that's a relatively quick overview of what neural networks are like. So why are we actually talking about this? One of the earlier talks mentioned things like ImageNet, which is a famous open dataset for machine learning, where we'd get something like a 25 percent error rate in, like, 2011. Essentially, the reason we're talking about this all of a sudden is that people have gotten very much better at training these neural networks, and this is because of a number of breakthroughs in training these networks to do things that are actually practically useful. So you can think of the quality of
20:46
the neural network kind of like this. With traditional learning algorithms, as you get more data the performance increases but levels off very quickly. Then you have small neural networks, which also level off quickly. So essentially what people did was train with a certain amount of data, up to about here, and then adding more data wouldn't
21:20
actually make it much better, so they would essentially stop right here. But we have since found these kinds
21:27
of neural network methods that allow us to scale the learning much better. As we throw more data at the problem, the networks get quite a bit more sophisticated and perform quite a bit better. So we've been able to create these large deep neural
21:45
networks that continually get better as we give them more and more data. With that come other problems, which I'll talk about in a second, but essentially these medium and large neural networks have become possible recently. And so
22:10
here is a model. This is a neural network that was trained on image data, and what it's essentially doing is labeling images. You can think of each one of these as, say, a matrix multiplication or some sort of operation on a matrix. The input goes through several different layers, and eventually the network gives you an output tensor that tells you the labels. This is what we mean when we talk about deep neural networks: networks that have many, many
22:56
layers before they actually give you this output. By adding these layers we can start solving more and more complicated problems and get pretty good results. But there is a problem: you can imagine that each one of these is a matrix multiplication, and these tensors might be, you know, a large image, like a megabyte or something. You're turning that into a tensor and then doing a matrix multiplication on it, so you can imagine how many operations you have to do in order to train this network, or even to do a prediction just once. And you have to do this many, many times over when actually training networks. So what people do is use GPUs, and GPUs are
23:45
very good at highly parallel work, but still you're essentially waiting two weeks, or sometimes even months, for the results of a single training run. So what people started doing is using things like supercomputers in order to train models faster. But this is still a problem, because not everybody has access to a supercomputer. How many of you have access to a supercomputer? Somebody does. That's the most I've ever seen, like three or four.
24:19
Supercomputers are something that you have to lease time on, a little like the older mainframes, where you got your slot at some time in the middle of the night, and they charge tons of money for them. So they're not exactly the easiest or the best way. The ideal is for everybody to be able to do machine learning, so what you need is a kind of distributed training. We've actually been able to do that, and so we use it for a lot of practical applications,
24:55
things like photos, and detecting text in Street View images. So there are a lot of exciting things going on, and recently we've seen quite a lot of activity. At Google, this is the number of projects internally that use machine learning; it's just the number of directories that contain a model description file. You can see we've had this kind of growth, and
25:31
by distributing it we've been able to do, you know, much, much faster experiments. So now I'd
25:38
like to talk about TensorFlow itself. TensorFlow is an
25:42
open source library. It's a very general-purpose machine learning library, particularly for doing neural networks, but we're also expanding it to encompass other
25:55
types of machine learning. It was open sourced, and it is used internally at Google for a lot of our internal projects. So it supports a number of things, like, you know, this kind of
26:14
flexible and intuitive construction, to basically be able to do a lot of things in an automated way. It supports training on things like CPUs and GPUs, et cetera. One of the nice things is that you define these networks in Python. So before I kind
26:35
of get into looking at what it looks like, here are some of the core concepts. You have a graph: the name TensorFlow comes from the idea of taking tensors and having them flow through a directed graph.
26:52
So a graph is the representation of that. Operations are the actual nodes of the graph, the operations that you do.
26:59
Tensors are the data that actually pass through the network. Then we have other types of structures: we have the idea of constants, which are things that don't change,
27:15
but then you have things like placeholders, which are basically inputs into our network, and variables, which are things that actually change during training, so these are what you usually use for your weights and biases, etc. A session is something that encapsulates the overall connection between TensorFlow and the models that you define. I should mention that TensorFlow
27:52
is a library that is based on the same sort of concepts as many other libraries.
27:58
Like those libraries, it has a Python interface, an API, and then it has a C++ core that enables these very fast operations, so during training you're not actually going through Python. So this is, you know, a non-exhaustive
28:21
list, a non-exhaustive list, of the operations you can do with TensorFlow: things like math (addition, subtraction, multiplication, division) on these tensors, matrix operations, stateful operations, and so on.
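The core concepts just listed (a graph of operations, constants, placeholders, variables, and a session that evaluates the graph) can be illustrated with a toy in plain Python. This is only a sketch of the idea and is not TensorFlow code; the `Node` class and the `constant`, `placeholder`, `variable`, `add`, `mul`, and `run` helpers are hypothetical names invented for the illustration.

```python
# Toy dataflow graph in plain Python. NOT the TensorFlow API: every name
# here is hypothetical and exists only to illustrate the concepts.
class Node:
    def __init__(self, op, inputs=(), value=None, name=None):
        self.op = op          # "const", "placeholder", "variable", "add", "mul"
        self.inputs = inputs  # upstream nodes whose outputs flow into this one
        self.value = value    # stored value for constants and variables
        self.name = name      # lookup key for placeholders

def constant(value):
    return Node("const", value=value)

def placeholder(name):
    return Node("placeholder", name=name)

def variable(value):
    return Node("variable", value=value)

def add(a, b):
    return Node("add", (a, b))

def mul(a, b):
    return Node("mul", (a, b))

def run(node, feed):
    """Play the role of a session: evaluate `node`, reading inputs from `feed`."""
    if node.op in ("const", "variable"):
        return node.value
    if node.op == "placeholder":
        return feed[node.name]
    left, right = (run(n, feed) for n in node.inputs)
    return left + right if node.op == "add" else left * right

x = placeholder("x")      # input supplied at run time
w = variable(3.0)         # value that training would update
b = constant(1.0)         # value that never changes
y = add(mul(x, w), b)     # builds the graph; nothing is computed yet

result = run(y, {"x": 2.0})                # 2 * 3 + 1
w.value = 4.0                              # stand-in for a training update
result_after_update = run(y, {"x": 2.0})   # 2 * 4 + 1
```

The point of the split between building `y` and calling `run` is that the graph exists as data before any computation happens, which is what lets a real framework hand it to a faster backend.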
28:41
So let's actually see what this looks
28:44
like when you run through it. So this
28:54
is a Jupyter notebook. How many people have heard of Jupyter,
28:58
or use Jupyter? Oh, OK. How many people have been asked that more than five times at this conference? So I'll just assume that you all know it and go from there. Let me actually just
29:16
restart this kernel here. Yes, this is a Python 2 one; TensorFlow also supports Python 3 if I remember right, but this particular example is Python 2. TensorFlow is pretty easy to get started with. This is just using the MNIST example, which was also talked about earlier today. It's essentially a bunch of images of handwritten numbers, and you're supposed to determine which number is actually present in each image. The training images look something like this: we have 55 thousand images, and they're all in this one long array, and each one has 784 pixels. They're basically monochrome, so just
30:21
black and white. And
30:26
so if you look at the shape of that, you get the 55 thousand, the size of the array, along with the 784 pixels. If you look at the images themselves, this is each one of the images in a two-dimensional array, and each of these values is a value from 0 to 1, which is essentially how dark that particular pixel is. So some of these are like 0.23, which is fairly light, all the way up to 1. So that's
31:03
essentially what the data looks like.
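To make that representation concrete, here is a small plain-Python sketch of turning a grid of pixels into one flat vector of 784 values between 0 and 1. The 28 by 28 layout and the 0 to 255 raw intensity range are assumptions about the underlying image files; the talk itself only states the 784 pixels and the 0 to 1 range. The function name `flatten_and_scale` is made up for the example.

```python
def flatten_and_scale(image_rows, max_value=255):
    """Turn a 2-D grid of 0..max_value integers into a flat list of floats in [0, 1]."""
    return [pixel / max_value for row in image_rows for pixel in row]

# A hypothetical 28x28 image that is empty except for one fully dark pixel.
image = [[0] * 28 for _ in range(28)]
image[3][5] = 255

pixels = flatten_and_scale(image)
n_pixels = len(pixels)           # one entry per pixel, row after row
dark_index = 3 * 28 + 5          # where row 3, column 5 lands in the flat vector
```

Stacking 55 thousand such vectors gives the array of 55 thousand rows by 784 columns described above.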
31:06
That's how we've represented it here; if you had a color image we'd need to represent it a little differently, but that's essentially how we're doing it in this case. And this is just showing an example value using a plot, so this is just one of the input images. That's essentially what the training data looks like. Then we have these training labels associated with each image: basically a vector of size 10 with a bunch of zeros in it and a 1 in the right location, which indicates the number for that particular image. So if we look at the training labels, the shape of that is size 10, and if we look at the particular one for this image, we can see where the 1 is. Each column, I think, represents a digit from 0 to 9. These are what are called one-hot vectors, where you have 0 in all the values except for one. This is used often in training data. The data that you get out of the network is actually
32:36
similar to this, except that it will be a bunch of values from 0 to 1, essentially probabilities. And so here's some of
32:46
the images from earlier. Once it's trained, you can kind of get these: training sets the weights and biases so that individual pixels indicate whether it's a particular number. In this case we're actually using a very simple neural
33:12
network,
33:14
with just one layer, which works this way: if you see pixels in these blue areas, then it's probably a 0, and if there are any pixels here in the red area, then it's probably not a 0. It basically aggregates the probabilities of whether this is a 0, and you can see the same in many of the other ones: for a 1 you'd see pixels in this area, for a 2 in this area, and for a 3 in this area, so they look similar to the actual numbers you're looking for. So let's go to the next one. This is defining our network. Here we're importing TensorFlow and defining the placeholder that I talked about earlier. This is our input into the neural network, and we give it a size of 784, which is the number of pixels. Then we have these weights and biases as variables, which can be updated as we train the model. And here is where we actually define our network: this is just the same single-layer network. You can define it in Python very similarly to the way that you would in
34:35
mathematics. So here we can say we're doing a matrix
34:39
multiplication of the inputs times the weights, adding the bias variable, and then doing a softmax on it at the end. TensorFlow will then internally take these and build
34:54
our data flow graph, the data flow representation of the model. Once we have that, we can then use it to train the model.
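As a rough illustration of what that single layer computes, the following plain-Python sketch evaluates softmax(x · W + b) for one input vector. It is not the notebook's TensorFlow code; the three-input, two-class sizes and all the numbers are hypothetical and chosen only to keep the arithmetic readable (the talk's model has 784 inputs and 10 classes).

```python
import math

def softmax(logits):
    """Numerically stable softmax: exponentiate, then normalise to sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def single_layer(x, W, b):
    """Compute softmax(x . W + b); W has one row per input and one column per class."""
    logits = [
        sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
        for j in range(len(b))
    ]
    return softmax(logits)

# Hypothetical tiny example: 3 input values, 2 classes.
x = [1.0, 0.0, 0.5]
W = [[0.2, -0.1],
     [0.0, 0.3],
     [0.4, 0.4]]
b = [0.0, 0.1]

probs = single_layer(x, W, b)   # one probability per class
```

Here the first class gets the larger logit (0.4 against 0.2), so it also gets the larger probability, and the outputs always sum to 1.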
35:08
So this is our neural network. Then we have a placeholder for the expected outputs, this y prime, and we've defined a cross-entropy function here as our loss function. Then we can basically plug that into this gradient descent optimizer, optimizing using cross entropy, and this creates our training step. So this encompasses the entire neural network plus the training that we need to do. As in some of the other explanations from the talks, gradient descent is essentially a way of nudging our neural network in the direction that we want it to go. I think one of the talks talked about going down a mountain at night using a single little flashlight, a
36:02
torch, and just going a little bit at a time. Essentially that's the idea: you're going down and moving in the
36:10
direction of a minimum to actually minimize the loss. In that analogy, the altitude would be the error, the loss generated by the loss function, and we basically nudge the network in the direction where the loss is minimized, and do that over and over again. Each one of these is a training epoch as we go down with the gradient descent optimizer. Here we're going to train a thousand times, each on a particular piece of data. What's great is that we can also do this kind of mini-batch training, where you take just a small subset of the total
36:59
training data. So we're not training every time on every single piece of the training data; we're training on a randomly selected batch of, in this case, 100 elements, I think, and we're doing that essentially a thousand times. This is taking a little longer than usual. What's good
37:22
about that is that you don't have to train on the entire training data; you can take a randomized subset of the training data, and that's essentially the same thing you do
37:36
when you do, say, a statistical survey where you ask only some of the people: you basically get a statistically representative sample of the data. OK, so this is actually done.
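The training loop described here (a cross-entropy loss, gradient descent, and a thousand steps on random mini-batches of 100) can be sketched in plain Python on a toy problem. This is an illustration under stated assumptions, not the notebook's TensorFlow code: the two-feature dataset, the learning rate of 0.5, and every function name are hypothetical. The gradient used is the standard result for softmax followed by cross entropy, where the derivative with respect to each logit is the predicted probability minus the one-hot label.

```python
import math
import random

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x, W, b):
    """softmax(x . W + b) for a single input vector x."""
    return softmax([sum(xi * row[j] for xi, row in zip(x, W)) + b[j]
                    for j in range(len(b))])

def cross_entropy(p, y):
    """Loss between predicted probabilities p and a one-hot label y."""
    return -sum(yk * math.log(max(pk, 1e-12)) for pk, yk in zip(p, y))

def mean_loss(data, W, b):
    return sum(cross_entropy(predict(x, W, b), y) for x, y in data) / len(data)

def classify(x, W, b):
    p = predict(x, W, b)
    return p.index(max(p))

def train(data, n_in, n_out, steps=1000, batch_size=100, lr=0.5, seed=0):
    rng = random.Random(seed)
    W = [[0.0] * n_out for _ in range(n_in)]
    b = [0.0] * n_out
    for _ in range(steps):
        batch = rng.sample(data, batch_size)       # random mini-batch
        grad_W = [[0.0] * n_out for _ in range(n_in)]
        grad_b = [0.0] * n_out
        for x, y in batch:
            p = predict(x, W, b)
            for j in range(n_out):
                err = p[j] - y[j]                  # d(loss) / d(logit j)
                grad_b[j] += err
                for i in range(n_in):
                    grad_W[i][j] += x[i] * err
        for j in range(n_out):                     # nudge against the gradient
            b[j] -= lr * grad_b[j] / batch_size
            for i in range(n_in):
                W[i][j] -= lr * grad_W[i][j] / batch_size
    return W, b

# Hypothetical toy data: class 0 if x0 > x1, otherwise class 1 (one-hot labels).
data_rng = random.Random(1)
data = []
for _ in range(200):
    x = [data_rng.random(), data_rng.random()]
    data.append((x, [1.0, 0.0] if x[0] > x[1] else [0.0, 1.0]))

initial_loss = mean_loss(data, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
W, b = train(data, n_in=2, n_out=2)
final_loss = mean_loss(data, W, b)
accuracy = sum(classify(x, W, b) == y.index(1.0) for x, y in data) / len(data)
```

Each step only touches 100 of the 200 examples, which is the survey-sampling idea from the talk: a random subset gives a noisy but representative estimate of the gradient over the whole dataset.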
37:53
OK, so I've gone through the training, and at the end we can check the accuracy of our neural network. In this case we got about 90 percent, which is pretty bad, but this is a very simple one-layer neural network. So that's essentially how you use TensorFlow: you create these steps and run through them. But the actual computation for all these steps is done under the hood in the C++ core, which also maps the operations onto devices, so if you have GPUs or CPUs available, it will
38:43
map the operations to those particular devices. In this case I'm running this on, I think, a
38:49
32-core machine, so it will actually map to that. I'll talk a little bit about TensorBoard later. Let me go back. Can I go back?
39:17
OK, so here I'm going to
39:23
look at a little bit more complicated example, where we get a little better accuracy. We're using exactly the same training data that we did before, but we're going to build what's called a convolutional network, which was talked about a bit earlier. This basically allows you to look at the image part by part and pick out the specific features from
39:59
each part of the image. This helps with the following: with the approach I had earlier, if you see pixels in a certain location, that indicates what number it is. But what happens if you write the zero, or whatever, shifted slightly over? That particular network wouldn't be very good at figuring out that I just moved this 0 over a few pixels. For stuff like that, this convolutional way of
40:36
looking at it helps a lot. In this case we're going to initialize the weights and biases a little differently; I think this one is taking random weights to begin with. But here's the convolution part. Essentially we're going over the image with what are called kernels, and we're building this other tensor that has a particular value for each position of the kernel over the image. This picks out features of each individual part of the image rather than looking at the image as a whole. Then we can take the same thing and use what's called pooling. Pooling is another method; one of the most common examples is max pooling, where you take a part of the tensor and pick the maximum value, which gives you somewhat of a representation of that particular part of the image as well. You put that all together into a layer, and you can create several layers like that. So here we set up our first convolutional layer by building the weights and biases and then building our layer here. The second convolutional layer takes as inputs the outputs from the previous layer and does basically the same sort of thing. Then at the end we create a densely connected layer. As some of the previous talks explained, the convolutional layer is not fully connected between the values, because the kernels are translated all over the image, but the final output layer is a densely connected layer, which allows you to 
do basically the exact same thing that we did in our previous example; we just don't have the
43:12
convolutional part. That will
43:17
allow us to get a much better accuracy. I'm not really going to talk about dropout, but then we have basically the readout layer, which is essentially just doing the softmax on the outputs from the previous layer. Then you can train the model. In this particular one we're using the same cross entropy, but using the Adam optimizer instead of the regular gradient descent optimizer, and with that kind of optimization you can get much better performance. Here we're actually doing a lot more training, because it's a deeper network, and we can train
44:12
it a lot more. With our previous one, if we continued to train it more, it probably wouldn't get very much better than 90 percent, but in this case we can train it quite a lot more times in order to improve the accuracy. So we actually train it about
44:27
20 thousand times on mini-batches of 50. I won't actually go through doing this, because it takes about five minutes or something to run through all of that, but you can see from the output that we get about 99.2% accuracy, which is a good deal better than 90%, right?
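The two building blocks of the convolutional example, sliding a kernel over the image and then max pooling the result, can be sketched in plain Python on tiny grids. This is an illustration only and not the notebook's code; the function names, the all-ones kernel, and the 2 by 2 pooling window with no overlap are assumptions made for the example. As in most deep learning libraries, the "convolution" here slides the kernel without flipping it.

```python
def convolve_valid(image, kernel):
    """Slide `kernel` over `image` and sum the element-wise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [sum(image[r + i][c + j] * kernel[i][j]
             for i in range(kh) for j in range(kw))
         for c in range(len(image[0]) - kw + 1)]
        for r in range(len(image) - kh + 1)
    ]

def max_pool_2x2(grid):
    """Keep the maximum of each non-overlapping 2x2 block (assumes even height and width)."""
    return [
        [max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
         for c in range(0, len(grid[0]), 2)]
        for r in range(0, len(grid), 2)
    ]

# A 3x3 image and a 2x2 kernel of ones: each output is the sum of a 2x2 patch.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 1],
          [1, 1]]
features = convolve_valid(image, kernel)   # [[12, 16], [24, 28]]

# Max pooling shrinks a 4x4 feature map to 2x2 by keeping each block's maximum.
feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]]
pooled = max_pool_2x2(feature_map)         # [[4, 2], [2, 8]]
```

Because the same kernel is applied at every position, a digit shifted by a few pixels produces the same feature values in shifted positions, which is why this approach copes better with translation than the single-layer model.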
44:48
So instead of 1 in 10, it's around 1 in a hundred that is classified incorrectly. So you can do things from very simple networks to much more complex networks. So let me go back
45:08
to my slides. One of the other things that you
45:17
can do, because TensorFlow has this internal representation, this knowledge about all the graphs
45:22
and how everything is working together,
45:24
is essentially write log output files as you're training, and these can then be read by an application called TensorBoard. We were obviously very original with the names, you know, TensorFlow, TensorBoard. But what's really cool about this is that you can look at things like the accuracy and the values of the loss functions, and
46:04
look at these kinds of graphs as you're training and going over the data, to see how your
46:10
neural network is performing. In this case we're seeing the actual accuracy as we're training; I think this is the simple version. We get up to about 90% pretty quickly, but we don't really get much better as the training continues. You can look at things like the accuracy, but you can also look at the actual loss function: this is the cross-entropy value, and it goes down and down and down. It should be close to the inverse of the accuracy. You can also look at many of the other values, and these basically correspond to the variables or the
46:54
individual parts of your network, basically the values that you have. So here cross entropy was an actual Python object that was defined, and
47:09
then you can get this kind of log output data. Other things like the max and min and stuff like that are also available, as are these
47:24
input images that you can look at. One of the other cool things is that you can actually look at the graph itself, the model itself. Here we have a two-layer network, and we can zoom in and look at the individual pieces of the network, like the weights and biases for individual parts of the network, and look at things like the dropout values and the loss function as well. So from the Python code that we wrote, this gives you a full graph representation of the network. In the case of something very complicated, like the image model I was showing you earlier, you would see this huge graph that was generated by
48:21
that. This is really cool because it helps you visualize your neural network as you define it. Let's go back here. We have about ten minutes or so left, something like that, so we'll get there. So the main difference, when it comes to distributed training, between
48:54
TensorFlow and many of the other libraries that are out there is that TensorFlow was built from day one with distributed training in mind. Essentially, TensorFlow is built in such a way that we wanted to productionize it and actually do practical work
49:13
with the library and with our networks. So we want to be able to train these faster, and based on the hardware breakthroughs and improvements that we've made in the past, we want to be able to use those to train models fast. TensorFlow supports multiple different types of parallelism. One is model
49:40
parallelism, which is essentially breaking up the model, so each one of these machines takes a different part of the model, and you basically feed the data through here and break up the work that way. But it
49:57
also supports what's called data parallelism which is
50:00
hopefully on one of the slides... it
50:03
disappeared. So data parallelism is the opposite, where you basically break up the data instead, and each
50:09
one of the machines has a full copy of the model. So you're basically splitting up the records, like sending records 1 through 100 to one machine and 101 through 200 to a different machine, and breaking up the work that way. There are a number of trade-offs between these, whether you
50:32
use, like, a full graph or a subgraph for model parallelism, or synchronous or asynchronous data parallelism. So,
50:43
yeah, there are pluses and minuses to each of these, and
50:50
so that's kind of there's no like
50:52
similar to this but
50:55
I do know that at Google we use data parallelism pretty much exclusively. So TensorFlow
51:05
basically supports, in a number of ways, these different types of model parallelism
51:11
and and so on they and
51:16
So data parallelism is this one. OK, so this is where you take the data, split it up, and each one of these replicas has the entire model. Once you've done some training, you pass the results to the parameter server, which is the thing that holds all the weights and biases. These are updated and then pushed back to the model replicas. Then there are asynchronous and synchronous versions of this, where you're updating the weights and biases in parallel, or you're updating them synchronously for
51:49
each iteration. Asynchronous is much
51:55
faster, but it can add some noise to the model, because the parameters can be changed midway through a run, whereas in
52:12
the synchronous version you run on the split-up data and then wait until all the model replicas have finished a particular step before going on to the next one, but that will make the training a
52:25
bit slower. This is an example of how that would run with TensorFlow: you have a bunch of workers doing the parallel work, and then you have some parameter servers, and in between the servers they use gRPC to communicate.
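The synchronous flavour of data parallelism described above can be sketched in a few lines of plain Python: the records are split into shards, every worker computes a gradient on its own shard against the same copy of the model, and a parameter-server step averages the results before a single update. This is a single-process simulation under stated assumptions, not TensorFlow's distributed runtime; the one-parameter model y = w * x, the learning rate, and the function names are all hypothetical.

```python
def shard(records, n_workers):
    """Split records into contiguous, nearly equal shards, one per worker."""
    size = -(-len(records) // n_workers)   # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]

def worker_gradient(w, records):
    """Gradient of mean squared error for the one-parameter model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in records) / len(records)

def synchronous_step(w, shards, lr):
    """Parameter-server step: wait for every worker, average, then update once."""
    grads = [worker_gradient(w, s) for s in shards]   # would run in parallel
    return w - lr * sum(grads) / len(grads)

# Hypothetical toy data generated from the true parameter w = 3.
records = [(x, 3.0 * x) for x in range(1, 9)]
shards = shard(records, 4)          # records 1-2, 3-4, 5-6, 7-8

w = 0.0
for _ in range(200):
    w = synchronous_step(w, shards, lr=0.01)
```

In an asynchronous variant each worker would apply its own gradient as soon as it finished, without waiting for the others, which is faster per update but means some gradients are computed against parameters that have already changed.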
52:44
so why is this kind of data
52:45
parallelism important? OK, so let's say that instead of getting a cat out of
52:51
our network we got a dog, and we're like, well, we want to improve our network, so what do we
52:59
tweak in order to make that actually better? I don't know, maybe this; this is probably a good idea, I don't know. So we
53:08
do that, we make the tweak, and we run this again, and we're like, OK, yeah, this is nice, and it's
53:14
like running on my GPU and on and it comes out it's like it doesn't make it better like track where right right now like ligand back and start over again so you normally run these kind of experiments like you want all the run these experiments over and over again like very quickly you don't wanna have to wait a week in order figure out that your Tweet went well enough and this is this is a problem with with people who are even experts in machine learning is like you basically you have your have experience and you have a literature that you can you can use to kind of figure out you can narrow down what things you might want to tweak but in the end you need to be able to run back and test to see if the data or if the actual tweet that you may improve the doesn't then improves event essentially have to test and this takes time and so that's why it's very important to be able to do this kind of distributed training but when
54:12
the problems is that as you scale the number of nodes, the number of connections between your parameter servers and your workers increases kind of exponentially, so this doesn't scale; you essentially bottleneck on the network,
54:26
because these machines are talking over TCP, and you get something on the order of milliseconds of latency between the machines. So you essentially need to have a dedicated hardware network; a lot of people use things like InfiniBand or whatever in order to make this go faster. This is really a problem at the moment. One of the things that we did at Google is that internally, when we do distributed training, we have a dedicated network that doesn't use TCP/IP. It basically skips the whole TCP/IP stack and is able to have the communication between the machines run on the order of nanoseconds or microseconds instead of milliseconds. So this is something that we are planning on making public as
55:36
a service that will allow you to basically run TensorFlow graphs inside of the Google data
55:44
centers. We're also planning on exposing the dedicated hardware that we use; these are called Tensor Processing Units, I think that's what they're calling them. Essentially they're dedicated hardware for doing tensor operations, and we're basically going to be able to expose those to other people so that they can use that kind of dedicated hardware in order to do more experiments and things like that.
56:19
So I think that's all I had. I want to thank you for coming and spending, you know, the last hour here. How many of you are still awake? Raise your hands. So
56:33
about 70% of you. So thanks a lot for coming out. Definitely check out the tensorflow.org website; there's tons of really good examples.
56:45
Like, if you go here, and then if you
56:48
look here, there are tutorials and documentation. This is really, really good and has a lot of good examples of how to use TensorFlow, and there are different ones for different levels of people, as well as how to use TensorFlow Serving to kind of
57:08
move towards actually productionizing your models. So thanks a lot for coming. Do we have questions? We have, like, two minutes. Yes, a question? Is there anything like profiling for this kind of model? Like, to have an overview of how many multiplications and how many parameters each block of the flow graph needs, or, as usual, the actual time it took to run it? I don't know if TensorFlow gives you that. I think it probably should. If it doesn't, I think that could be something that you could visualize in TensorBoard as part of the output: you basically log that as a value that you can view here in TensorBoard, and then see how each part of the graph performed, things like that. Other questions? One right behind you. The previous talks today mentioned that you typically have to do some feature extraction before you can actually apply neural networks. Will TensorFlow help me speed up my manually designed feature extraction, or is it designed only to do neural network stuff? At the moment it's mostly geared towards neural networks. I mean, obviously you could do feature extraction using a separate neural network, so you could have one neural network that does the extraction and another one that does the actual classification. There is some work going on; I forget what it's called, it's like TensorFlow Wide, something like that. The idea is that instead of having deep neural networks you have these more standard types of machine learning algorithms. So I think there is work going on there to incorporate more standard machine learning algorithms, so you can do that sort of feature extraction beforehand and stuff like that. But it's kind of 
ongoing work. You might try searching for TensorFlow Wide. I haven't played with it personally, so I can't really give you details. Yes, thanks a lot.