A Gentle Introduction to Neural Networks (with Python)
Formal Metadata
Title 
A Gentle Introduction to Neural Networks (with Python)

Title of Series  
Part Number 
165

Number of Parts 
169

Author 

License 
CC Attribution-NonCommercial-ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and noncommercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license. 
Identifiers 

Publisher 

Release Date 
2016

Language 
English

Content Metadata
Subject Area  
Abstract 
Tariq Rashid: A Gentle Introduction to Neural Networks (with Python). A gentle introduction to neural networks, and making your own with Python. This session is deliberately designed to be accessible to everyone, including anyone with no expertise in mathematics, computer science or Python. From this session you will have an intuitive understanding of what neural networks are and how they work. If you are more technically capable, you will see how you could make your own with Python and numpy.
Part 1, Ideas: the search for AI, hard problems for computers that are easy for humans; learning from examples (simple classifier); biologically inspired neurons and networks; training a neural network; the back propagation breakthrough; matrix ways of working (good for computers).
Part 2, Python: Python is easy, and everywhere; Python notebooks; the MNIST data set; a very simple neural network class; focus on concise and efficient matrix calculations with numpy; 97.5% accuracy recognising handwritten numbers, with just a few lines of code!

00:01
Good morning everyone. This is the last day of EuroPython, and in this session we
00:06
are actually dealing with some very hot topics that I'm really interested in: neural networks and deep learning. To begin with we have our first speaker, Tariq Rashid, who is giving a talk on a gentle introduction to neural networks, so please give him a round of applause.
00:30
Hello, can you hear me?
00:32
Thank you very much for coming to my talk. For me it's really great to be at an open, community, open-source conference. I always learn a lot, I always have great conversations, and there is always a very generous spirit, so I thank everyone, and also the organizers. I'm going to talk about neural networks, and I just want to be really clear: my talk is intended to be very introductory. It is for people who perhaps don't know what neural
01:06
networks are or how they work, or maybe you studied them a long time ago and have forgotten. If you already know what they are, or you're an expert, you might be bored, so I don't mind if
01:16
you want to go to another talk. My name is Tariq Rashid, and I am one of the co-organizers of London Python. If you want to come and do something with us, please come along and have a chat with me. We really want to do more, a broad range of things, everything from computer art to
01:36
teaching people to code. Right, cool. So, as I said, this is an
01:55
introductory talk. We'll talk a little bit about the background: what is artificial intelligence, and why is there a lot of interest in neural networks at the moment. Then we'll get into
02:04
the ideas, and that's the meat of this talk really: what are the concepts that are used in neural networks? What I'll do is use some very simple
02:16
examples, which may not seem that interesting, but they illustrate the key points that make neural networks work. So I hope that you stick with them,
02:26
and that will help us understand what's going on inside a neural network, and
02:31
we'll also apply them: I'll give you an example of applying neural networks to quite an interesting challenge, recognizing handwritten numbers, and I'll give some pointers on how you might code one. And, I might regret this, there is a kind of live demo at the end, and if it goes wrong
02:51
that's going to be really embarrassing, but I'll give it a go. I'm not going to talk about libraries. There's lots of cool stuff out there, there's Theano, there's TensorFlow, and there are a couple of talks today covering things like that. This one is really about the concepts: what's really going on, and how you might do it yourself. Right, so
03:13
just to get us into the right frame of mind, I'm going to start with two questions. I have a
03:19
seven-year-old daughter, and she likes challenges, so I set her this challenge. I said, can you look at this picture and point out where the people are? As a seven-year-old child she found that quite exciting and very easy, like most seven-year-old children, and she counted the people in the picture. That was fine. She can do numbers, she can actually subtract, so I said, can you add
03:43
those numbers? And then she
03:45
found that very difficult. But, you know, when coding with computers, with Python, doing calculations such as the one on the right is actually very easy, whereas instructing the computer to find people in the photo is not so easy. So that's
04:03
interesting. One task is easy for us, and the other is easy to code for computers, but the first one is hard to get computers to do, even though for us
04:12
that's easy. So there's something there, and we would like to be able to solve these kinds of interesting problems: find me a picture of a cat, or work out what the words are in this audio file. Those are really interesting problems, and we want to solve more of them. Terminology like artificial intelligence means different things to different people; for me it means being able to solve the kinds of problems that traditionally have not been found
04:42
that straightforward. So that's what this is about. And there is a lot of hype at the moment, because there's lots
04:49
of stuff going on that you wouldn't want to miss: there are autonomous cars, there is health data being used to improve outcomes, and Google has been very active recently, with their system being able to play
05:03
Go, which is amazing. We thought that would take another 20 years or so, and they used neural networks as part of their solution. So that has got people interested,
05:14
and that's what we'll talk about. So
05:18
let's go right back to the beginning, really assuming nothing. We want to ask the computer a question and we want an answer, and it has to do some kind
05:30
of thinking. But clearly it can't think, it's just metal and wires, and so it has to
05:36
calculate, it has to process, and those are words that I guess programmers like ourselves understand. We have inputs, we have some kind of calculation, and we have an output. Neural networks and artificial intelligence: that's all it is. There is nothing mysterious about it, it is just
05:53
calculations that can be done. So let's set ourselves
05:59
a very, very simple example just to get started. Imagine that the conversion from kilometers to miles is a difficult problem. Just imagine; I know it's not. Imagine we didn't know how to do it. So we invent a model in our mind: we say, maybe one is the other
06:19
one multiplied by a number. That's a model. We can come up with a model that we think might be right; it might be wrong, but we'll try it. Let's start with a number; we don't know what it
06:30
should be. Miles is kilometers times some number, and with 100 km as the input we'll start with the number 0.5. If we compare the result with a real example of the truth, we know it should be 62.137, but our model calculated 50. It's not that bad, and there is an error of about 12. This
06:56
time we try 0.6, which gives a better answer.
07:02
It's still not exactly right, but the error is much smaller now. Let's try again with 0.7: we've gone too far, and it's much worse now. So let's be less enthusiastic about jumping and try 0.61, and that's actually getting quite close.
07:22
This idea of using a model, tweaking a parameter inside it, and then comparing the output with what we know should be true is how neural networks work, and many other machine learning methods too. We take the error that pops out of the other end and use it to guide the refinement of the parameters of the model. Just to be clear, that's a super easy example, but that's what a neural network is doing. If you just replace the circle with a neural network, that's really what's happening when you're training it: you're looking at the error and tweaking parameters inside it to
08:01
try to get a better answer. And bingo, that's it.
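The trial-and-error refinement described above can be sketched in a few lines of Python. This is only an illustration of the idea, not code from the talk: the function names `predict_miles` and `error` are made up for this sketch, and the guesses are simply the ones tried in the talk for an input of 100 km.

```python
# A minimal sketch of "guess a parameter, look at the error, adjust".
# Names are hypothetical; the numbers are the ones tried in the talk.

KM = 100.0
TRUE_MILES = 62.137  # the known correct answer for 100 km


def predict_miles(km, c):
    """Model: miles = km * c, where c is the adjustable parameter."""
    return km * c


def error(km, c, truth):
    """Error = what we know to be true minus what the model produced."""
    return truth - predict_miles(km, c)


# The guesses for c, in the order they were tried.
guesses = [0.5, 0.6, 0.7, 0.61]
errors = [error(KM, c, TRUE_MILES) for c in guesses]

for c, e in zip(guesses, errors):
    print(f"c = {c:<4}  output = {predict_miles(KM, c):6.2f}  error = {e:+.3f}")
```

A positive error says the guess was too small, and a negative error (as with 0.7) says the guess overshot, which is the signal used to decide which way to move the parameter next.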
08:09
Again, the key points there: if we don't know how something really works, if we don't have an exact mathematical model, we can invent one. We can come up with a model that we think might be true, we can try it, and it can have parameters that we can adjust. The important point is that the error is used to refine the model. So let's take our
08:33
daughter to the garden, where she likes to
08:34
pick up bugs, and she's picked up some caterpillars and ladybirds. Imagine that we've plotted them on a graph of width and length. Caterpillars are thin and long, and ladybirds are short and wide, and if you plot them you can see there are two clusters, two groups, which is interesting. Some of you
08:57
will recognize this as clustering, and that's cool.
09:03
What we did in our first example was have a linear predictor: the relationship between kilometers and miles, we thought, was a straight line, and we changed the parameter, we changed the slope. What we're trying to do here is to see if
09:19
we can apply the same simple model and come up with a way of predicting, or classifying, what a bug should be. So that line, instead of
09:30
being a prediction line, could be a separating line: things on one side of the line are one type of object, caterpillars, and things on the other side of the line might be ladybirds. That's not what we want, because that line doesn't separate the two kinds of bug. This
09:48
one doesn't either. And this one does. So
09:52
you can see that learning to classify is not that different from the very first simple example we looked at. You can also see, through this kind of naive animation, that changing the slope is a way of learning to find a good separating line between the two clusters. So once we've learned a good separating line, if we then
10:20
find an unknown bug, we can say: well, that falls in that half of the space, so it must be a caterpillar. So
10:31
classifying things is a lot like predicting things. We apply these methods when we don't really know what the model should be, but what we do have is real data. So we learn from data: we invent a model, we think it's a good one, and we try to refine it to match the data that we've collected. It might be data from space, like the microwave background radiation which we heard about earlier in the week; it might be voice data; it might be sentiment. We're going to stick
11:06
with a super, super simple dataset consisting of two items: just the width and length of two bugs. Here we've
11:17
plotted them. We start again with a randomly chosen parameter for that line, a randomly chosen gradient, and we say: OK, that's not good, because it doesn't separate the two bugs. So let's look at the first example there: we need to shift the line up to that point. OK, we've improved, but that line still doesn't do a good job. What can we learn from the second example? We look at the second example and say: all right, the separator must keep that example on that side of the line. And that kind of works. If you're interested in the maths, it's really simple: we've got straight lines, and it's very simple linear algebra. You can rearrange the terms to work out what the change in gradient should be if you want to get the line to go through a certain point. But actually we've made a kind of mistake here. What we've done is look at an example and ignore all the previous ones. We don't want to do that; we want to learn from all the data, not just the last example we looked at. If we did this, we would work through all the examples and just end up with an answer which we could have got by looking only at the last example. So one way of fixing that is not to be so enthusiastic about the error and the amount that you jump by. What you can do is say: instead of
12:53
jumping all the way, changing the line by the full amount, we apply a factor, a learning rate, so we only jump a little bit. So if the first example wants me to go over there, I just move in that direction a little bit. If the next example wants me to go over there, I move a little bit that way. So with lots of data you can see
13:13
that you eventually get better and better, and you're not overly influenced by each individual data point. That's good, because data is noisy and there can be errors in the data; you don't want to give too much importance to any one individual data point. The learning rate is quite an important idea.
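The moderated update described above can be sketched as follows. This is a rough illustration under stated assumptions, not the speaker's code: the line is taken to be `length = slope * width`, the update rule (error divided by the input, scaled by a learning rate) is one simple way to do the rearrangement mentioned in the talk, and the bug measurements, target values and starting slope are hypothetical.

```python
# Sketch of refining a separating line y = slope * x from examples,
# with and without a learning rate. All numbers are hypothetical.

def train_slope(examples, slope, learning_rate):
    """Nudge the slope towards each (x, target) example in turn."""
    for x, target in examples:
        y = slope * x                       # what the current line gives
        err = target - y                    # how far off the line is
        slope += learning_rate * (err / x)  # move only part of the way
    return slope


# (width, desired height of the line at that width):
# just above a ladybird (width 3.0, length 1.0) and
# just below a caterpillar (width 1.0, length 3.0).
examples = [(3.0, 1.1), (1.0, 2.9)]
start = 0.25  # randomly chosen starting gradient

full_jump = train_slope(examples, start, learning_rate=1.0)
moderated = train_slope(examples, start, learning_rate=0.5)

print("full jump:", full_jump)
print("moderated:", moderated)
```

With a learning rate of 1.0 the final slope depends only on the last example, whatever the starting point, which is the mistake pointed out above. With a learning rate of 0.5 both examples leave their mark on the result.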
13:31
It matters in neural networks, and we'll understand why in a minute. So let's increase the complexity a bit. Imagine we have a dataset which really has something to do with the real world, and it's causal. Maybe I'm measuring the amount of smiling in this room, and the other factors I'm measuring are whether it's sunny and whether it's the weekend. You can see that if it's sunny and it's the weekend, there might be more smiles; or if the sun is shining and there are no clouds, the tension might go. In the real physical world, data can have causal links, and it would be a great thing to build a model of that, so we can predict or classify data that comes from the real world. Here are some simple examples: we have the Boolean relations. We have the AND relation between two conditions and a third one: the AND is only true if both inputs are true. Some data can be like this. Can we model that with the very simple classifier that we've got? Just as a reminder of the picture we had at the start, there is nothing wrong with having two inputs into a calculation.
14:57
So that's OK. You can visualize the data by
15:02
plotting the two inputs as coordinates and showing the output as a color. So if we have an AND relationship, we color a point green if both inputs are on. And yes, the dividing line
15:17
still works: we can still have a linear classifier to separate data which has an AND kind of causal link in it. So that's cool: we could use that very simple, very
15:33
naive classifier to learn data which has the AND relation in it, and OR works as well, which is cool. So that's looking hopeful.
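To make the dividing-line picture concrete, here is a small sketch of a linear classifier with two inputs. The weights and thresholds are hand-picked for illustration (they are not from the talk), and the function name `linear_classifier` is made up for this sketch.

```python
# A linear classifier on two inputs: a point is classed as "on" if it
# lies on the far side of the line w1*x1 + w2*x2 = threshold.
# Weights and thresholds are hand-picked, hypothetical values.

def linear_classifier(x1, x2, w1=1.0, w2=1.0, threshold=1.5):
    """Return 1 if (x1, x2) is above the dividing line, else 0."""
    return 1 if w1 * x1 + w2 * x2 > threshold else 0


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# A line at x1 + x2 = 1.5 leaves only (1, 1) on the "on" side: AND.
and_outputs = [linear_classifier(a, b, threshold=1.5) for a, b in inputs]

# Moving the same line down to x1 + x2 = 0.5 leaves only (0, 0) off: OR.
or_outputs = [linear_classifier(a, b, threshold=0.5) for a, b in inputs]

print("AND:", and_outputs)
print("OR: ", or_outputs)
```

In both cases a single straight line is enough to split the four points into the right two groups.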
15:47
But actually, in the history of this, I think it was probably in the seventies, people became sad because somebody wrote a paper that said these simple classifiers are very limited. That is because these simple linear classifiers can't learn data which has the XOR relationship. If I have two variables which are related to the answer with an XOR, which is only true if either of the inputs is true but not both, a linear classifier can't do that. And you can see why: visually, no single line correctly separates those two classes. That led to, I guess, a bit of a slowdown in research into neural networks. But if you look at this, you think: well, the answer is kind of there already, we have two lines. This is an important point. It is a very simple example, but what it suggests is that we need more than one of those classifiers to help us with data that's more complex, and that is actually why neural networks have many, many nodes, more than just one node. So some problems can't be solved with just a simple linear classifier, and that is the motivation for why you might want to explore using multiple nodes. Let's shift a little bit and look at nature again. We started right at the beginning with the example of my daughter's brain being able to find people in a photograph, but me not being able to code that very easily. So the human brain is doing something, and working in a way that's different from how my laptop here works. Right through history, people have tried to understand what it is
17:53
about the way biological brains work that makes them so good, and what we can learn from nature and replicate in new kinds of algorithms. Just have a look: this computer's got, what is it, 16 gigabytes of RAM and however many mega- or giga-instructions per second. It's quite chunky. And yet a pigeon, which has a brain of 0.4 grams, can fly, it can
18:22
learn to feed, it can communicate, and it can learn to do new tasks, which is really important. A snail has about 11 thousand neurons. That's not really a lot: with big data we can store huge amounts in data structures, and these things have just 11 thousand. This worm has 302 neurons. And this is interesting: there is a species of pilot whale which has 37 billion neurons, whereas we humans have 20 billion, and it is using them, because if it
19:06
wasn't using them they would have evolved away, because of the cost. It makes you think that maybe we're not the most superior things on the planet. But anyway, the point here is that nature is doing something with brains that we can learn from: with apparently such small resources they're able to do tasks which we think of as quite complicated. So those neurons that the
19:36
biologists know are inside our brains and nervous systems: if we look at them, what they do is transmit a signal along and on to another one. There are names for the various elements, but what they
19:52
don't do is pass a signal on without any kind of resistance. What they do is only pass a signal on once the signal has passed a threshold. It's like turning up a dial, and the light only goes on after it has reached a certain number. So maybe the computing neurons that we model should do the same. Some people think we could use a step function to do that: if the input is past a certain point, it switches on. And actually you could do that. But in nature we know that things aren't always black and white and hard-edged; things are softer. So we might try a softer kind of function, the sigmoid function, and there are others that you could use. We also know that in nature these things are connected like a network, a mesh, with signals going along, so maybe that's what we should try to model when we want to do some interesting tasks like recognizing pictures. Again, going back to the thing we saw right at the start, there's nothing wrong with having more than one input coming into a computing node. What we've just said here is that we collect the inputs, just as they do in nature, and we apply a threshold function, so that we only have an output if the combination is big enough. That becomes our node in a neural network. So after many minutes of talking, and me running out of time, that's all it is: a neural network, an artificial neural network, is our attempt to recreate what biological brains are doing. Each of those circles is doing what we saw here: collecting the signals, applying a threshold function, and passing the output on.
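A single node of this kind can be sketched in Python as below. The names `step`, `sigmoid` and `node_output` are chosen for this illustration only; the sigmoid is the standard logistic function 1 / (1 + e^-x), and the example inputs are arbitrary.

```python
import math

# Sketch of one computing node: collect the inputs, then apply a
# threshold-like function. Function names are hypothetical.


def step(x, threshold=0.0):
    """Hard-edged threshold: off below the threshold, fully on above it."""
    return 1.0 if x > threshold else 0.0


def sigmoid(x):
    """Softer threshold: rises smoothly from 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def node_output(inputs, activation=sigmoid):
    """Sum the incoming signals and pass the total through the activation."""
    return activation(sum(inputs))


print(node_output([0.2, -0.2]))        # combined input of zero
print(node_output([2.0, 3.0]))         # large combined input, close to 1
print(node_output([2.0, 3.0], step))   # the hard-edged version
```

The sigmoid gives a graded response instead of the all-or-nothing jump of the step function, which is the softer behaviour described above.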
22:01
It is a convention that we call these layers: we have the middle layer, the input layer, and the output layer,
22:06
and then there are the connections. So let's
22:13
pause a little bit and think back to our very first example, where we wanted to convert between kilometers and miles. We had a straight line with an adjustable slope, a parameter. That's what did the learning: the learning was the changing of that slope, that multiplication factor. What does the learning in a neural network? What do we change, what do we need to tweak, so that the outputs get better? There are probably lots of answers to that. You might say: that threshold function, maybe we need to change its slope in each of those nodes. That's probably not a bad idea. But where history has taken us, what people actually do, is adjust the strength of the links between those nodes. If a link is strong, a signal is amplified; if a link is weak, it is reduced; and if the weight, the strength, is zero, it is in effect a broken link. So that's one approach, and that's the one that has become popular, partly because
23:21
it's easier as well. So when we
23:28
feed signals forward, let's just imagine we have a signal of 1 at the top there, and a link there, call it the link between node 1 and node 1, and you can see it has a weight, a strength, of 0.9. What we would do is take the signal 1 and multiply it by 0.9, and that's what feeds into the next node. In the same way, 0.5 times 0.3 is what would go there. That's really easy, that's not complicated at all, and that's what is happening inside a neural network: just multiplying signals through connections, feeding them on to the next node, and collecting them. That's
24:08
just a reminder of what we're doing: recall that with the signals coming in, we add them up, but this time
24:13
you can see that we're weighting them: we're using the weights of those links to either boost or reduce the signals. I've colored them so you can see them. You can see the sigmoid function applied after the calculation here, and if you want to, you can verify that it is that times that, plus that times that, and that would give you that answer. That really is as simple as it gets, and that's all a neural network is doing.
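The weighted sum just described can be checked with a few lines of Python. The signals (1.0 and 0.5) and weights (0.9 and 0.3) are the ones mentioned in the talk; the variable names, and the choice of the sigmoid as the function applied afterwards, are assumptions of this sketch.

```python
import math

# Verifying the weighted sum for one node by hand.


def sigmoid(x):
    """Logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


signals = [1.0, 0.5]   # outputs of the two input nodes
weights = [0.9, 0.3]   # strengths of the links into the receiving node

# Each signal is boosted or reduced by its link weight, then summed.
combined = sum(s * w for s, w in zip(signals, weights))

# The node then applies its threshold-like function to the sum.
output = sigmoid(combined)

print("combined input:", combined)   # 1.0*0.9 + 0.5*0.3
print("node output:   ", output)
```

The combined input is 0.9 + 0.15 = 1.05, and the node's output is the sigmoid of that value.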
24:41
Nothing very complicated at all. So we had a very
24:48
simple network here, with just four nodes. Suppose we write out with pen and paper what is happening at each node. So at node number 1 in layer 2, if we write out what's actually
25:02
happening, we'd say it's input 1 times that weight plus input 2 times that weight. If we write it out for this one, and write it out again for all the nodes, you start
25:12
to see a pattern, and that pattern is really helpful because it allows us to write the calculation as a matrix multiplication: the weights matrix times the input signals becomes the new signals.
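Written with numpy, the same feed-forward step for a whole layer might look like the sketch below. Only the 0.9 and 0.3 weights and the input signals 1.0 and 0.5 come from the talk; the second row of the weights matrix is a hypothetical value added so the example has two receiving nodes, and the names `W`, `I`, `X` and `O` are this sketch's own.

```python
import numpy as np

# Feed-forward for one layer as a matrix multiplication.
# Row j of W holds the weights of the links arriving at node j of layer 2.
# The first row uses the talk's numbers; the second row is hypothetical.
W = np.array([[0.9, 0.3],
              [0.2, 0.8]])

I = np.array([1.0, 0.5])          # signals leaving layer 1

X = W @ I                         # combined signals arriving at layer 2
O = 1.0 / (1.0 + np.exp(-X))      # sigmoid applied to every node at once

# The same calculation written out node by node, for comparison.
X_by_hand = [sum(W[j, i] * I[i] for i in range(2)) for j in range(2)]

print("X =", X)
print("O =", O)
```

The one line `W @ I` replaces the node-by-node sums, and it stays one line however many nodes the layers have.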
25:29
going into the next layer, and that's really, really valuable to us for two reasons. It allows us to
25:40
write that calculation in a much more concise way, so we don't have to write pages and pages for big networks; we just write: weights times inputs is the signal into the next layer. The other reason that's really important is that computers can accelerate matrix multiplications, and we want to take advantage of that, whether it's numpy, whether it's the Fortran libraries we heard about earlier, or whether it's hardware acceleration, using your graphics card to multiply matrices. And if we can formulate our calculations in terms of matrices,
26:16
then we can take advantage of whatever acceleration is possible. So you might say, oh my, matrices are so boring, why do we have to do this again? But this is the reason, I would say. So that's cool: we're feeding a signal forward through each layer of the network and we get an answer at the other end. We know that, actually, we're likely to be wrong, just like we were at the start, so we have an error. And going back to the very first example again, we used that error to
26:52
refine and improve the parameters inside the model. How do we do that here? So let's break this down
27:00
and let's have a simple network, a simple picture, just to see what might happen. Well, we know
27:06
that we need to change the weights; we've already agreed that we can change the strength of those links in order to try and improve the answer. That's what we're playing with, those are the parameters we're tuning. And we know what the error is right at the end of the network: if the answer should be 5 and we get 3, the error is 2. But what's the error inside the network? We need to know the error in order to change the weights. So that's quite an interesting question, and actually lots of the guides and the books gloss over that a little bit. The first thing to say is that there's probably no mathematically perfect answer, so what we do is we can
27:54
use, I think the word is, heuristics: we think, well, what would be an intuitive way of working out the error inside the network? One intuitive idea is to say: let's say this error is 5, maybe I push two and a half this way and two and a half that way. That's an idea. Another idea is to say that the top link, with a weight of 3, contributed more to the error, because it is a bigger, stronger link and it magnified the signal, so maybe I should put more error in that direction. So you split the error in proportion to the link weights. So if we've got
28:37
weights of 3 and 1, so the links have strengths 3 and 1, you can see three quarters of the error would go to the top node and a quarter would go this way. That kind of makes sense, and I'm sure there are more sophisticated things you can do, but we want to keep it simple, especially if it works. So you can see there that the error from that node is being split and pushed back, and the same here, and at the internal nodes you collect the several fractions of error from the links connected to them. It sounds complicated, but when you see it as a picture you can see the errors flowing backwards: back propagation of error. That's where the term comes from, error backpropagation. So it's feeding signals forward and backpropagating errors. You can look at the calculations afterwards; the point here is that you're summing up the error. So if the error is 0.6 from that top right link and this one is contributing 0.1, the error there is 0.7; that's 0.6 plus 0.1, just collecting the errors. And again it's really nice, really fortunate, that if we write out what is really happening in terms of the variables, it becomes a matrix multiplication again, which is really nice because we can accelerate that and we can write it in a very concise way without worrying about the actual size of the network. The only thing that's slightly different is that the weights matrix
30:11
is then transposed, flipped along its diagonal. Again, super, super
30:16
simple. OK, so we've got the
30:22
errors now at each node. How do we change the weights? OK, so that's the output at one of the output nodes; those w's you see are the weights of all those links inside, and I'm not able to untangle that. If you can, well done; that's horrible. So what we need to do here is say: we're not going to be able to untangle that in any kind of nice, mathematically clean way, so let's find other mathematical methods which are perhaps approximate but good enough. So let's go on a bit of a journey.
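To make "the errors at each node" concrete before moving on, here is a small sketch of the error-splitting idea described above: an error of 5 shared across links of weight 3 and 1, and the same idea for a whole layer using the transposed weights matrix. The 2 by 2 `weights` values and all variable names are illustrative assumptions, and the un-normalised variant at the end is one common simplification, not necessarily what the slides show.

```python
import numpy as np

# One output node with error 5, fed by two links of weight 3 and 1.
output_error = 5.0
link_weights = np.array([3.0, 1.0])
# Each link takes a share of the error in proportion to its weight.
shares = link_weights / link_weights.sum()
split_errors = output_error * shares  # three quarters and one quarter

# Whole layer at once. weights[j, i] is the link from hidden node i to
# output node j (illustrative values).
weights = np.array([[3.0, 1.0],
                    [2.0, 4.0]])
output_errors = np.array([5.0, 1.0])

# Normalise each output node's incoming weights so they sum to 1, then
# transpose so every hidden node collects its fractions of error.
fractions = weights / weights.sum(axis=1, keepdims=True)
hidden_errors = fractions.T @ output_errors

# Simplified variant: skip the normalisation and use the transposed
# weights directly. Each output error is still shared in proportion to
# the link weights; only the scaling differs.
hidden_errors_simple = weights.T @ output_errors

print(split_errors)
print(hidden_errors, hidden_errors_simple)
```

In the normalised version the hidden errors add up to the same total as the output errors, which is a handy sanity check that nothing was lost in the split.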
31:04
Imagine this landscape is a complicated function like the one we saw, and imagine it's an error function, because what we had here is the output, and the error function is
31:18
simply that minus what the
31:21
actual target should be. And if this horrible, complicated, lumpy landscape is a very deep, complicated function which we can't
31:32
work out analytically with nice clean algebra, another way to work with it, and maybe work out the minimum, the minimum error, is to say: well, if this was a landscape and I didn't
31:47
have a map of everything, and it was dark, so I couldn't understand the whole function, but I did have a torch, what I could do is point the torch down at my feet and say: well, the slope is going down in this direction, take a few
32:02
steps; now it's going in that direction, take a few steps; and eventually you would work your way down to a minimum. Some of you will put your hands up and say it might not be the best minimum; we'll come to that. So this approach is not mathematically clean, it's an approximate method, but it works really well. And you can see it working really well with a little example. Let's pretend the x squared function is really
32:30
difficult; let's just pretend. You can see that we start at a point, we see where the gradient is locally, and we move in that direction, and if we keep doing that we get to the minimum, which works out quite nicely. You might even be more sophisticated and say: as the slope gets smaller, you might take smaller steps, because you're getting closer and closer to the real minimum and you don't want to overstep it. That's an idea that's actually used in neural networks as well. So if that complicated function was the error function, then we
33:09
have a way of finding the minimum. I have a picture to show you; there it is.
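The torch-and-slope idea can be tried on the x squared function in a few lines. This is only a sketch: the starting point, the `learning_rate` value and the number of steps are arbitrary assumptions chosen for illustration.

```python
def gradient(x):
    """Local slope of y = x**2 at the point x."""
    return 2.0 * x

x = 4.0              # arbitrary starting position on the curve
learning_rate = 0.1  # assumed factor controlling how big each step is

for step in range(50):
    # Move against the slope. Because the step is proportional to the
    # gradient, steps shrink automatically as the slope flattens out.
    x -= learning_rate * gradient(x)

print(x)  # very close to the minimum at x = 0
```

Tying the step size to the slope is one simple way to avoid overstepping the minimum as you get close to it.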
33:16
So if we have the weights, which is what we want to improve, and we have an error function which is complicated, we want to use this gradient descent method to find the minimum of that error, and we'll then know what the right weight should be. So if we're over here with the wrong weight, the error is higher, and we can try and say: OK, I want to improve my position and move down the error function to somewhere where the error is smaller, and then the position will tell you what the right weights are. This is called gradient descent, and it's a way of working with that horrible kind of expression that we couldn't
33:58
do analytically before. If you did write this out again with
34:04
pen and paper and worked out the gradient locally, it's not that hard. It's very simple calculus, the calculus you do at school, just using the chain rule and nothing more complicated than that. So if you are interested in having a look, this is what I hope is a very clear blog post on that. So what we've done now is we've worked out a way of improving the weights based on the gradient of the error function, and here you see this kind of expression many times, where we iterate and keep improving. OK, I've only
34:46
got a bit of time left, so I'm going to zoom through and show how you might do this yourself. I'm not an expert Python coder, and there are people here who know a lot more, but broadly speaking, if you wanted to do this
35:00
yourself, you might think: what would a Python program or class look like? Well, we know we've got to initialize this data structure, this network. That is really simple; all we really need to do is set the size
35:14
and initialize the weights to random initial values. We know that we're going to need some way of training the network, so doing the learning, and we've got to have a method of querying the network, so we ask it questions and get an answer back. If you want to make your own neural network library or class, there's nothing more complicated than this at all.
35:42
So in Python land there are some very useful libraries: numpy is great for matrix multiplications, and I also reached for scipy because
35:54
it's got some nice functions in there for doing that curve; the logistic function, as it's called, is built in, so we can use that ourselves. And I started programming a long time ago and then I stopped, so I was
36:13
coding Python in 1999, I think, or '98: Python 1.5 or 1.6, with the Numeric library rather than numpy, which didn't exist then. So I came back to Python and numpy is fantastic, excellent. So this is an example of a
36:31
function which initializes the network. It looks complicated, but don't be put off: all it's really doing is setting the size in terms of input nodes, hidden nodes and output nodes, as you can see here, and using a numpy function to randomize the weights, which are a matrix. That's it, nothing more complicated than that. The scipy function there, expit, is the logistic function, the curve. We'll do the training later; this is the querying, again really easy. We take the inputs and turn them into an array (it was a list here), and you can see that there is a matrix multiplication, numpy dot, to do the calculation, and that's it. We apply the activation function to the output of that multiplication, simple as that. You then have the signal at the next layer, and you do it again.
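The slide code itself is not in the transcript, so the following is only a sketch of what an initializer and a query method along these lines might look like. The class name `NeuralNetwork`, the attribute names, the uniform range for the random weights and the optional `seed` argument are all assumptions for illustration, and the logistic function is written out by hand here so the sketch depends only on numpy.

```python
import numpy as np

def sigmoid(x):
    """Logistic activation function (written by hand for this sketch)."""
    return 1.0 / (1.0 + np.exp(-x))

class NeuralNetwork:
    """Hypothetical three-layer network: input, hidden and output nodes."""

    def __init__(self, input_nodes, hidden_nodes, output_nodes, seed=None):
        rng = np.random.default_rng(seed)
        # One weights matrix per pair of adjacent layers, filled with small
        # random values (the -0.5 to 0.5 range is an assumption).
        self.w_input_hidden = rng.uniform(-0.5, 0.5, (hidden_nodes, input_nodes))
        self.w_hidden_output = rng.uniform(-0.5, 0.5, (output_nodes, hidden_nodes))

    def query(self, inputs_list):
        # Turn the plain list into a column vector.
        inputs = np.array(inputs_list, dtype=float).reshape(-1, 1)
        # Weights times inputs, then the activation function.
        hidden_outputs = sigmoid(self.w_input_hidden @ inputs)
        # Same again for the next layer.
        final_outputs = sigmoid(self.w_hidden_output @ hidden_outputs)
        return final_outputs

net = NeuralNetwork(input_nodes=3, hidden_nodes=4, output_nodes=2, seed=0)
outputs = net.query([1.0, 0.5, -1.5])
print(outputs)
```

Because the layer sizes only appear as matrix shapes, the same `query` code works unchanged for a tiny network like this one or for one with hundreds of nodes per layer.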
37:29
That's it. It's the simplest form of propagating the signal through a neural
37:34
network, as simple as that. I'm sure I could make it even more concise, but I just want to let you know by doing this that it's not mysterious or scary or complicated; it really is as simple as just that. And the training, again, might look scary but it isn't really. The top half is exactly the same as what we just had, feeding signals forward, with exactly the same code as before. Then what we're doing is saying the output error is the target, which comes from the training data, minus what we've worked out, and then we use another set of matrix multiplications to work out the errors internal to the network, and then we change the weights
38:21
using that expression that we worked out with calculus. That's it, that's how you train a network. I'm sure you could make it even more beautiful and clean, but I just wanted you to get a feel for what's really going on in the network, and it's not that complicated in itself. OK, for the few minutes left I'm
38:38
just going to show you that with just the very simple ideas that we looked at (I mean, people have many more sophisticated methods and optimizations, and you can read quite a lot about neural networks), with just the very, very simple ideas that we've looked at, you can do some powerful things. So we can train a network to learn to recognize human handwritten numbers. There is a famous challenge dataset called the MNIST dataset, and it's got 60,000 training examples, and it's all free open data; you can get it yourself, and the blog can point you to it. There's a test set as well, so you can compare your results with others. If you look at the data you'll see it's numbers; they're actually the pixel values of the image. That's a 5, 28 by 28 pixels. So we feed that data into a network and train it. Actually, I missed something: we have to choose what the output looks like, and what I've chosen is to say we have 10 nodes at the output, and if the answer should be, say, 9, the ninth one has the biggest value. So you can see, if it's 5, it's the fifth node that should have a high value, and everything else should be low.
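Putting the 10-node output choice together with the training procedure described earlier, here is a hedged sketch of one training update. The helper names `make_target` and `train_step`, the 0.01 and 0.99 target values, the weight initialization, the learning rate and the random stand-in "image" are all assumptions for illustration; the output node is indexed by digit, counting from 0. The hidden-layer errors use the transposed-weights heuristic from the talk, which is not the exact chain-rule gradient for that layer.

```python
import numpy as np

def sigmoid(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + np.exp(-x))

def make_target(label, n_outputs=10, low=0.01, high=0.99):
    """Column vector that is high at the labelled digit and low elsewhere.

    The 0.01 / 0.99 values are assumptions; the talk only says high vs low.
    """
    target = np.full((n_outputs, 1), low)
    target[label, 0] = high
    return target

def train_step(w_ih, w_ho, inputs, target, learning_rate=0.01):
    """One training update; modifies both weight matrices in place."""
    # Top half: feed the signal forward, exactly as when querying.
    hidden = sigmoid(w_ih @ inputs)
    outputs = sigmoid(w_ho @ hidden)

    # Output error is the target minus what the network worked out.
    output_errors = target - outputs
    # Errors inside the network: push the output errors back through the
    # transposed weights (the weight-proportional heuristic).
    hidden_errors = w_ho.T @ output_errors

    # Gradient-descent style update for layers with a sigmoid activation.
    w_ho += learning_rate * (output_errors * outputs * (1.0 - outputs)) @ hidden.T
    w_ih += learning_rate * (hidden_errors * hidden * (1.0 - hidden)) @ inputs.T
    return float(np.sum(output_errors ** 2))

# Usage with made-up data: a random stand-in "image" labelled as a 5.
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 784, 20, 10
w_ih = rng.normal(0.0, n_in ** -0.5, (n_hidden, n_in))
w_ho = rng.normal(0.0, n_hidden ** -0.5, (n_out, n_hidden))
pixels = rng.uniform(0.01, 1.0, (n_in, 1))
target = make_target(5)

errors = [train_step(w_ih, w_ho, pixels, target) for _ in range(200)]
print(errors[0], errors[-1])
```

Repeating the update on the same example should drive the squared error down; real training would instead loop over many different labelled images.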
40:06
That's what I'm using to train the network. That last example is interesting because in this one the network thinks the answer is probably 9 but it might also be 4, and
40:19
then you can get some really good results just with those simple ideas. 96% accuracy is what I got on the first go; that's not bad. So 20 lines of code, and it's reading human handwritten numbers with over 90% accuracy; that's not bad at all, I think. You can begin to tweak things like the learning rate or the number of hidden nodes, and you can see that you can get improved performance, though it might not be by much. I've deliberately put in the middle graph there just to remind us that neural network training is a random process: we're starting off with random initial weights, and sometimes it can go wrong. And that, for the scientists among us
41:08
here, reminds us that we should do this many, many times and take the best of all, or at least make sure that we've not ended up with an anomalous kind of answer. Remember that gradient descent from before: you might end up in the wrong minimum, not the best minimum, so it's best to do this many times. If you rotate and augment the original training set, you
41:30
can do better still, and I think this is actually pretty good, because if you look at the academic papers they get sort of 99%, 99.5%, using really advanced techniques, and I think, with just 20 or so lines of code, that's not bad. I'll just
41:48
skip this. Just to say, you can do all of this on a Raspberry Pi, everything I've done, and you can look at the blog. You can even do it with a Raspberry Pi Zero, which costs about 4 or 5 euros, and you don't need an NVIDIA graphics card to do any of this. So for the last few minutes I'm going to try and do a live
42:06
demo, and I might say that that
42:09
might be a regression. So let's think of a number to classify. Somebody say a number. 7? Again, one digit: 7. Let's write it here. Here's one, actually,
42:24
from a newspaper. That's a number 2, and it got that right. This is a network I trained last night, so that's
42:29
a good thing. OK, another one: 3.
42:37
If it doesn't work, it's a... OK, let's resize that to 28 by 28 and save that as a PNG. All right, if I press
43:03
go, that's 3. Yes, the answer's 3. I tried a few of
43:16
these last night and it didn't always work, so phew. I'll stop there, and I'd love to chat about this afterwards. Is there time for questions, or did I use it all
43:27
up? Thank you. So, we have time for a few
43:40
questions. Hi,
43:57
so we have seen that the output nodes are the numbers 0 to 9. What are the input nodes? Are those the individual pixels? Yeah, those are the individual pixels.
44:07
So if we have an image of 28 by 28, that's 784 pixels, I think, so you have an input layer of 784. You can choose other ideas: you can say I want to rescale everything, or I might want to have different features as inputs; you can do things like that. This is a very simple example, a very naive example, which takes the raw pixels, and it works. But people will do other things, perhaps if they know something about the data they're working with; they might say, I think another feature is more likely to be a factor in the answer, and I want to use that to train the network instead. So they might use color, they might use alpha values, they might use something else.
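As a sketch of the "raw pixels as inputs" answer, the snippet below flattens a 28 by 28 image into 784 input values and rescales them. The function name `prepare_image`, the assumption that pixels are greyscale values from 0 to 255, and the 0.01 to 1.0 output range are illustrative choices, not details given in the talk.

```python
import numpy as np

def prepare_image(pixels_28x28):
    """Flatten a 28x28 greyscale image into 784 rescaled input values."""
    pixels = np.asarray(pixels_28x28, dtype=float)
    flat = pixels.reshape(-1)  # 28 * 28 = 784 values, one per input node
    # Rescale 0..255 into 0.01..1.0 (assumed range, keeps inputs small
    # but avoids exact zeros).
    return flat / 255.0 * 0.99 + 0.01

# Made-up image: black background with a bright rectangle in the middle.
image = np.zeros((28, 28))
image[10:18, 12:16] = 255

inputs = prepare_image(image)
print(inputs.shape, inputs.min(), inputs.max())
```

Swapping in different features, as suggested in the answer, would only change this preparation step and the number of input nodes; the rest of the network stays the same.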
44:55
So, any other questions? OK, thanks a lot, that's time.