A Gentle Introduction to Neural Networks (with Python)

Video in TIB AV-Portal: A Gentle Introduction to Neural Networks (with Python)

Formal Metadata

A Gentle Introduction to Neural Networks (with Python)
Title of Series
Part Number
Number of Parts
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date

Content Metadata

Subject Area
Tariq Rashid - A Gentle Introduction to Neural Networks (with Python)

A gentle introduction to neural networks, and making your own with Python. This session is deliberately designed to be accessible to everyone, including anyone with no expertise in mathematics, computer science or Python. From this session you will have an intuitive understanding of what neural networks are and how they work. If you are more technically capable, you will see how you could make your own with Python and numpy.

-----

Part 1 - Ideas:
- the search for AI, hard problems for computers easy for humans
- learning from examples (simple classifier)
- biologically inspired neurons and networks
- training a neural network
- the back propagation breakthrough
- matrix ways of working (good for computers)

Part 2 - Python:
- Python is easy, and everywhere
- Python notebooks
- the MNIST data set
- a very simple neural network class
- focus on concise and efficient matrix calculations with numpy
- 97.5% accuracy recognising handwritten numbers - with just a few lines of code!
Good morning everyone. This is the last day of EuroPython, and in this session we
are dealing with some very hot topics that I'm really interested in: neural networks and deep learning. To begin with we have our first speaker, Tariq Rashid, who is giving a talk on a gentle introduction to neural networks, so please give him a round of applause.
Can you hear me? Is this working?
OK. Thank you very much for coming to my talk. For me it's really great to be at an open, community, open-source conference. I always learn a lot, I always have great conversations, and there is always a very generous spirit, so thank you everyone, and also the organizers: nice one. I'm going to talk about neural networks, and I just want to be really clear that my talk is intended to be very introductory. It's for people who perhaps don't know what neural
networks are or how they work, or maybe you studied them a long time ago and have forgotten. If you already know what they are, or you're already an expert, you might be bored, so I don't mind if
you want to go to another talk. My name is Tariq Rashid and I am one of the co-organizers of London Python. If you want to come and do something with us, please come along and have a chat with me; we really want to do more. We cover broad things, everything from computer art to
teaching people to code. Cool. So, as I said, this is an
introductory talk. We'll talk a little bit about the background: what artificial intelligence is and why there is a lot of interest in neural networks at the moment. Then we'll get into
the ideas, and that's the meat of this talk really: what are the concepts that are used in neural networks? What I'll do is show you some very simple, common
examples, which may not seem that interesting, but they illustrate the key points that make neural networks work, so I hope that you'll stick with them,
and that will help us understand what's going on inside a neural network. We
will also apply them: I'll give you an example of applying neural networks to quite an interesting challenge, recognizing handwritten numbers, and I'll give some pointers on how you might code it. I might regret this, but I'll try a live demo at the end, and if it goes wrong
that's going to be really embarrassing, but I'll give it a go. I'm not going to talk about libraries. There's lots of cool stuff out there, and there are a couple of talks today covering things like that. This one is really about the concepts: what's really going on, and how you might do it yourself.
Just to get us into the right frame of mind, I'm going to start with two questions. I have a
seven-year-old daughter and she likes challenges, so I set her this challenge. I said: can you look at this picture and point out where the people are? As a seven-year-old child she found that quite exciting and very easy, like most seven-year-old children. She can point at the people in the picture, and that's fine. She can do numbers, she can add and subtract, so I said: can you add
those numbers? And she
found that very difficult. But with coding, with computers, with Python, doing a calculation such as the one on the right is actually very easy, whereas instructing the computer to find people in the photo is not so easy. So that's
interesting: that's easy for us, and that's easy to code for computers, but that's hard to get computers to do and
that's easy for us. So there's something there, and we would like to be able to solve these kinds of interesting problems: find me a picture of a cat, or work out what the words are in this audio file. Those are really interesting problems and we want to solve more of them. Terminology like artificial intelligence means different things to different people. For me it means being able to solve the kinds of problems that traditionally have not been found
that straightforward. So that's what this is about. There is a lot of hype at the moment because there's lots
of stuff going on that you won't have missed: there are autonomous cars, there's health data being used to improve outcomes, and Google has been very active recently with being able to play
Go, which is amazing. We thought that it would take another 20 years or so, and they used neural networks as part of their solution. So that is what interests people,
and that's what we'll talk about.
Let's go right back to the beginning, in a really, really simple way. We want to ask the computer a question and we want an answer, so it has to do some kind
of thinking. But clearly it can't think, it's just metal and wires, so it has to
calculate, it has to process. Those are words that programmers like ourselves understand: we have inputs, we have some kind of calculation, and we have an output. Neural networks and artificial intelligence, that's all it is. There is nothing mysterious about it, it is just
calculations. So let's start with
a very, very simple example just to get started. Imagine that the conversion from kilometers to miles is a difficult problem. I know it's not, but imagine we didn't know how to do it. So we invent a model in our mind: we say maybe one is the other
one multiplied by a number. That's a model we can come up with. It might be right, it might be wrong, but let's try it. Let's start with a number; we don't know what it
should be. Miles is kilometers times some number, and we'll start with 0.5. For 100 kilometers, if we compare with a real example, the truth as we know it should be 62.137 miles, but our model calculated 50. It's not that bad; there is an error of about 12. OK, so let's
try 0.6, which gives a better answer,
still not exactly right, but the error is much smaller now. Let's try again with 0.7: we've gone too far, and it's much worse now. We were too enthusiastic about jumping. Try 0.61, and that's actually getting quite close.
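The trial-and-error steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not code from the talk: the names `predict_miles` and `error` are assumptions, while the figures (100 km, a true answer of 62.137 miles, and the guesses 0.5, 0.6, 0.7 and 0.61) are the ones used in the example.

```python
# Illustrative sketch; function and variable names are assumptions, not from the talk.
KM = 100.0
TRUE_MILES = 62.137  # the known correct answer for 100 km


def predict_miles(km, c):
    """The model: miles = kilometers * c, where c is the adjustable parameter."""
    return km * c


def error(c):
    """Error = what we know to be true minus what the model says."""
    return TRUE_MILES - predict_miles(KM, c)


# Try the same sequence of guesses for c as in the talk.
for c in (0.5, 0.6, 0.7, 0.61):
    print(f"c = {c:<4}  prediction = {predict_miles(KM, c):6.2f}  error = {error(c):7.3f}")
```

A positive error means the guess was too small; a negative one, as with 0.7, means the guess overshot.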
This idea of using a model, tweaking a parameter inside it, and then comparing the output with what we know should be true is how neural networks work, and many other machine learning methods too. We use the error that pops out the other end to guide the refinement of the parameters of the model. Just to be clear, that's a super easy example, but that's what a neural network is doing. If you replace that circle with a neural network, that's really what's happening when you're training it: it's looking at the error, and you're tweaking parameters inside it to
try and get a better answer. And that's it.
So there are some key points there. If we don't know how something really works, if we don't have an exact mathematical model, we can invent one: we can come up with a model that we think might be true, we can try it, and we can have parameters that we can adjust. The important point is that the error is used to refine the model. So let's take our
daughter to the garden, where she likes to
pick up bugs. She's picked up some caterpillars and ladybirds, and imagine that we've plotted them on a graph by width and length. Caterpillars are thin and long, and ladybirds are short and wide, and if you plot them we can see there are two clusters, two groups, which is interesting. Some of you
will recognize this as clustering, and that's cool.
What we did with our first example was to have a linear predictor: the relationship between kilometers and miles, we thought, was a straight line, and we changed the parameter, we changed the slope. What we're trying to do here is to see if
we can apply the same simple model and come up with a way of predicting, or classifying, what a bug should be. So that line, instead of
being a prediction line, could be a separating line: things on one side of the line are one type of object, caterpillars, and things on the other side of the line might be ladybirds. That's not what we want, because that line doesn't separate the two kinds. This
one doesn't either, and this one does. So
you can see that learning to classify is not that different from the very first simple example we looked at. You can also see through this naive animation that we're changing the slope as a way of learning to find a good separating line between the two clusters. Once we've learned a good separating line, if we then
find an unknown bug, we can say: it falls in that half of the space, so it must be a caterpillar.
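As a rough sketch of what using an already-learned separating line looks like in code, the snippet below treats the line as `length = slope * width` through the origin and labels a bug by which side of the line it falls on. The slope value, the sample measurements and the function name `classify` are made up for illustration; the talk only describes caterpillars as thin and long and ladybirds as short and wide.

```python
# Hypothetical values for illustration only; none of these numbers come from the talk.
SLOPE = 1.0  # assumed gradient of an already-learned separating line: length = SLOPE * width


def classify(width, length, slope=SLOPE):
    """Label a bug by which side of the line length = slope * width it lies on."""
    # Above the line: longer than it is wide, so caterpillar-like.
    if length > slope * width:
        return "caterpillar"
    # On or below the line: short and wide, so ladybird-like.
    return "ladybird"


print(classify(width=1.0, length=3.0))  # a thin, long bug
print(classify(width=3.0, length=1.0))  # a short, wide bug
```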
Classifying things is like predicting things. We apply these methods when we don't really know what the model should be, but what we do have is real data. So we learn from data: we invent a model we think is a good one, and we try to refine it to match the data that we've collected. It might be data from space, the microwave background radiation which we heard about earlier in the week; it might be voice data; it might be sentiment. We're going to stick
with a super simple data set consisting of two items: just the width and length of two bugs. Here we've
plotted them. We start again with a randomly chosen parameter for that line, a randomly chosen gradient, and we say: OK, that's not good, because it doesn't separate the two bugs. So let's look at the first example: we need to shift the line up to that point. OK, we've improved, but it still doesn't do a good job. What can we learn from the second example? Well, we look at the second example and say the separator must keep that example on that side of the line, and that kind of works. If you're interested in the maths, it's really simple: we've got straight lines, it's very simple linear algebra, and you can rearrange the terms to work out what the change in gradient should be if you want to get the line to go through a certain point. But actually we've made a kind of mistake here. What we've done is we've looked at an example and ignored all the previous ones. We don't want to do that; we want to learn from all the data, not just the last example we looked at. If we did this, we would work through all the examples and just have an answer which we could have got by looking at the last example alone. One way of avoiding that is not to be so enthusiastic about the error and the amount that you jump by. What you can do is say that instead of
jumping all the way, we apply a factor, a learning rate, so we only jump a little bit. If the first example wants me to go over there, I just move in that direction a little bit; if the next example says go over there, I move a little bit that way. So with lots of data you can see
that you eventually get better and better. We're not overly influenced by each individual data point, and that's good because the data is noisy and there can be errors in the data; you don't want to give too much importance to any one individual data point. A learning rate is quite
an important idea in neural networks; we'll understand why in a minute. So let's increase the complexity a bit. Imagine we have a data set which really has something to do with the real world, and it's causal. Maybe I'm measuring the amount of smiling in this room, and the other factors I'm measuring are whether it's sunny and whether it's the weekend. You can see that if it's sunny and it's the weekend there might be more smiles, or if the sun is shining and there are no clouds the temperature might go up. In the real physical world data can have causal links, and we want to build a model of that; it would be a great thing to be able to model, and to predict or classify data that comes from the real world. So here are some simple examples: we have the Boolean relations. We have the AND relation, where the output is only true if both inputs are true. Some data can be like this. Can we model that with the very simple classifier that we've got? Just as with the thing we did at the start, there is nothing wrong with having two inputs into a calculation.
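To make the last two ideas concrete, here is a minimal sketch that feeds two inputs into one calculation and nudges the parameters by a small learning rate after each example, using the AND relation as data. It is an assumption-laden illustration rather than the speaker's code: the update rule is the classic perceptron rule, and the bias (offset) term, the learning rate of 0.1, the 50 passes over the data and all names are choices made here.

```python
# Sketch only: perceptron-style rule with assumed settings, not code from the talk.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
LEARNING_RATE = 0.1  # assumed value; moderates how far each example moves the line


def predict(weights, bias, inputs):
    """Two inputs go into one calculation; output 1 if the result is above zero."""
    total = weights[0] * inputs[0] + weights[1] * inputs[1] + bias
    return 1 if total > 0 else 0


def train(data, epochs=50):
    """Repeatedly show every example, moving the parameters a little each time."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in data:
            err = target - predict(weights, bias, inputs)
            # Move only a fraction of the way suggested by this one example.
            weights[0] += LEARNING_RATE * err * inputs[0]
            weights[1] += LEARNING_RATE * err * inputs[1]
            bias += LEARNING_RATE * err
    return weights, bias


weights, bias = train(AND_DATA)
for inputs, target in AND_DATA:
    print(f"{inputs} -> {predict(weights, bias, inputs)} (expected {target})")
```

Because every update is scaled by the learning rate, no single example decides on its own where the line ends up.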
And that's OK: we can visualize the data by
plotting the two inputs as coordinates and showing the output as a color, so for an AND relationship we color a point green if both inputs are on. And yes, the dividing line
still works: we can still have a linear classifier to separate data which has an AND kind of causal link in it. So we could use that very simple,
naive classifier to learn data which has the AND relation, and of course OR as well, which is cool. So that's looking hopeful. But actually, in
history, I think it was probably in the seventies, people became sad because somebody wrote a paper that said these simple classifiers are very limited. That is because simple linear classifiers can't learn data which has the XOR relationship. If I have two variables which are related to the answer by an XOR, which is only true if one of the inputs is true but not both, a linear classifier can't do that, and you can see why: you can see visually that no single line correctly separates those two classes. That led to, I guess, a bit of a slowdown in research in neural networks. But if you look at this you think: well, the answer is kind of there already, we just need two lines. This is an important point. It's a very simple example, but what it suggests is that we need more than one of those classifiers to help us with data that's more complex, and that is why neural networks have many, many nodes, more than just one node. So some problems can't be solved with just a simple linear classifier, and that's the motivation for exploring the use of multiple nodes. Let's shift a little bit and look at nature again. We started right at the beginning with the example of my daughter's brain being able to find people in a photograph, but me not being able to code that very easily. So the human brain is doing something, and working in a way, that's different from the way my laptop here works. Throughout history people have tried to understand what it is
about the way biological brains work that makes them so good: what can we learn from nature and replicate in new kinds of algorithms? Just have a look: this computer has got, what is it, 16 gigabytes of RAM and however many giga-instructions per second; it's quite chunky. And yet a pigeon, which has a brain of 0.4 grams, can fly, it can
learn to eat, it can communicate, and it can learn to do new tasks, which is really important. A snail has about 11,000 neurons. That's not really many: we're used to big data and huge data structures, and these things have just 11,000. And this worm has 302 neurons. And this is interesting: there is a species of pilot whale which has 37 billion neurons, but we humans have 20 billion, and it's using them, because if it
wasn't using them it would have evolved them away because of the cost. That makes you think that maybe we're not the most superior things on the planet. But anyway, the point here is that nature is doing something with brains that we can learn from, and with apparently such small resources they're able to do tasks which we think of as quite complicated. So those neurons that the
biologists know are inside our brains and nervous systems: if we look at them, what they do is transmit a signal along and on to another one. There are some complicated names for the various elements, but what they
don't do is pass a signal on without any kind of resistance. What they do is only pass a signal on once the signal has passed a threshold. It's like turning up a dial, and the light goes on only after it has reached a certain number. So maybe the computing neurons that we model should do the same. Some people thought we could use a step function to do that: if the input is past a certain point, it switches on. You could do that, but in nature we know that things aren't always black and white and hard-edged; things are softer, so we might try a softer kind of function, which is the sigmoid function, and there are others that you could use. We know that in nature these things are connected like a network, a mesh, with signals going along, so maybe that's what we should try and model when we want to do some interesting tasks like recognizing pictures. Again, going back to the thing we saw right at the start, there's nothing wrong with having more than one input coming into a computing node. What we've just said here is that we're collecting the inputs, just as they do in nature, and then applying a threshold function, so that we only have an output if that combination is big enough. That becomes our node in a neural network. So after all these minutes of talking, and maybe running out of time, that's all it is: an artificial neural network is our attempt to recreate what biological brains are doing, and each of those circles is doing what we saw here: collecting the signals, applying a threshold function, and passing the output on.
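A single node of this kind can be sketched as follows. The sigmoid used here is the standard logistic function 1 / (1 + e^-x); the step threshold of 1.0, the example input signals and the function names are assumptions chosen only to show the difference between a hard step and the softer sigmoid.

```python
import math


def step(x, threshold=1.0):
    """Hard-edged activation: fires fully once the input passes the threshold (assumed 1.0)."""
    return 1.0 if x > threshold else 0.0


def sigmoid(x):
    """Softer activation: rises smoothly from 0 towards 1 as the input grows."""
    return 1.0 / (1.0 + math.exp(-x))


def node(inputs, activation=sigmoid):
    """Collect the incoming signals by adding them up, then apply the activation function."""
    return activation(sum(inputs))


signals = [0.4, 0.3, 0.5]  # made-up incoming signals
print("step   :", node(signals, step))
print("sigmoid:", node(signals))
```

Swapping `step` for `sigmoid` changes only how sharply the node switches on; the collecting of inputs stays the same.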
By convention we call these layers: we have the input layer, the middle layer, and the output layer,
and then there are some connections. So let's
pause a little bit and think about our very first example, where we wanted to convert miles to kilometers. We had a straight line with an adjustable slope parameter, and that's what did the learning: the learning was the changing of that slope, that multiplication factor. What does the learning in a neural network? What do we change, what do we need to tweak, so that the outputs are better? There are probably lots of answers to that. You might say that threshold function: maybe we need to change the slope of the sigmoid in each of those nodes. That's probably not a bad idea. But where history has taken us, what people actually do, is adjust the strength of the links between those nodes. If a link is strong, a signal is amplified; if a link is weak, it is reduced; and if the weight, the strength, is zero, it is effectively a broken link. So that's one approach, and that's what has become popular, because
it's easier as well. So when we
feed signals forward, let's imagine we have a signal of 1 at the top there, and we have a link there, call it the link between node 1 and node 1, and you can see it has a weight, a strength, of 0.9. What we would do is take the signal 1, multiply it by 0.9, and that's what we're feeding into the next node. In the same way, 0.5 times 0.3 is what would go there. That's really easy, not complicated at all, and that's what is happening inside a neural network: just multiplying signals through connections, feeding them on to the next node, and collecting them there. This is
just a reminder of what we're doing: we collect the signals coming in and we add them up, but this time
you can see that we're weighting them: we're using the weights of those links to either boost or reduce the signals, and I've colored them so you can see the size of the signals after the calculation. If you want to, you can verify that this times that plus this times that gives you that answer. That really is as simple as it gets. That's all a neural network is doing.
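The multiply-and-add described above can be checked in a few lines. The input signals 1.0 and 0.5 and the weights 0.9 and 0.3 are the ones read out in the talk; the other two weights (0.2 and 0.8), the sigmoid applied at the end, and all names are assumptions added so that the sketch covers a complete two-node layer. The last part repeats the calculation as a weights-matrix-times-inputs product in plain Python; a library routine such as `numpy.dot` performs the same kind of product.

```python
import math


def sigmoid(x):
    """Standard logistic function, used here as the node's threshold function."""
    return 1.0 / (1.0 + math.exp(-x))


inputs = [1.0, 0.5]  # signals leaving the two nodes of layer 1

# weights[j][i] is the strength of the link from input node i to node j of layer 2.
weights = [
    [0.9, 0.3],  # links into node 1: values from the talk
    [0.2, 0.8],  # links into node 2: hypothetical values
]

# Node 1 written out by hand: each signal times its link weight, then added up.
node1_sum = 1.0 * 0.9 + 0.5 * 0.3

# The same calculation for every node at once: weights matrix times input vector.
sums = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
outputs = [sigmoid(s) for s in sums]

print("combined input to node 1:", node1_sum)
print("combined inputs (matrix form):", sums)
print("outputs after sigmoid:", outputs)
```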
Nothing very complicated at all. So we had a very simple network here, with just four nodes. If we write out with pen and paper what is happening at each node, so at node number 1 in layer 2, what's actually happening is input 1 times that weight plus input 2 times that weight. If you write it out for this one, and write it out again for all the nodes, you start to see a pattern, and that pattern is really helpful, because it allows us to write that calculation as a matrix multiplication: the weights matrix times the input signals becomes the signals going into the next layer. That's really valuable to us for two reasons. First, it allows us to write that calculation in a much more concise way, so we don't have to write pages and pages for big networks; we just write weights times input is the signal into the next layer. The other reason is that computers can accelerate matrix multiplications, and we want to take advantage of that, whether it's NumPy, whether it's the Fortran libraries we heard about earlier, or whether it's hardware acceleration, using your graphics card to multiply matrices. If we can formulate our calculations in terms of matrices, then we can take advantage of whatever acceleration is possible. So you might say, oh my, matrices are so boring, why would we have to do this again? But this is the reason, I would say. So that's cool: we're feeding a signal forward through each layer of the network and we get an answer at the other end. We know that answer is likely to be wrong, just like we were at the start, so we have an error.
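The matrix form of the feed-forward step can be sketched with NumPy as follows. The weights 0.9 and 0.3 and the inputs 1.0 and 0.5 echo the numbers mentioned in the talk; the second row of weights (0.2 and 0.8) and the layout convention `weights[j, i]` are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    """Logistic activation, applied element by element."""
    return 1.0 / (1.0 + np.exp(-x))

# weights[j, i] is the link weight from input node i to layer-2 node j.
weights = np.array([[0.9, 0.3],
                    [0.2, 0.8]])
inputs = np.array([1.0, 0.5])

# Written out node by node, as you would with pen and paper.
by_hand = np.array([
    inputs[0] * weights[0, 0] + inputs[1] * weights[0, 1],
    inputs[0] * weights[1, 0] + inputs[1] * weights[1, 1],
])

# The same calculation as one matrix multiplication.
layer2_in = weights @ inputs
layer2_out = sigmoid(layer2_in)

print(by_hand, layer2_in, layer2_out)
```

The single line `weights @ inputs` stays the same however many nodes each layer has, which is the conciseness argument made above.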
Going back to the very first example again, we used the error to refine and improve the parameters inside the model. How do we do that here? Let's break it down and have a simple network, a simple picture, just to see what might happen. Well, we know that we need to change the weights; we've already agreed that we can change the strength of those links in order to try and improve the answer. That's what we're playing with, those are the parameters we're tuning. And we know what the error is right at the end of the network: if the answer should be 5 and we get 3, the error is 2. But what's the error inside the network? We need to know the error there in order to change the weights. That's quite an interesting question, and actually lots of the guides and the books skim over that a little bit. The first thing to say is that there's probably no mathematically perfect answer, so what we do is use, I think the word is, heuristics. We think: what would be an intuitive way of splitting the error inside the network? One intuitive idea is to split it equally: if the error is 5, maybe I push 2 and a half this way and 2 and a half that way. That's one idea. Another idea is to say that the top link, with weight 3, contributed more to the error, because it is a bigger, stronger link and it magnified the signal more, so maybe I should put more of the error in that direction. So you split the error in proportion to the link weights. So if we have
weights of 3 and 1, links of strength 3 and 1, you can see that three quarters of the error would go to the top node and a quarter would go the other way. That kind of makes sense, and I'm sure there are more sophisticated things you can do, but we want to keep it simple, especially if it works. So you can see there that the error from that output node is being split and pushed back, and the same here, and the internal nodes collect the several fractions of error from the links connected to them. It sounds complicated, but when you see it as a picture you can see the errors flowing backwards. Back-propagation of error: that's where the term comes from. So we're feeding signals forward and back-propagating errors. You can go through the calculations afterwards; the point here is that you're summing up the errors. So if the error is 0.6 from that top right node and this one is contributing 0.1, the error there is 0.7: 0.6 plus 0.1, just collecting the errors. And again, it's really fortunate that if we write out what is really happening in terms of the variables, it becomes a matrix multiplication again, which is really nice, because we can accelerate that and we can write it in a very concise way without worrying about the actual size of the network. It's only slightly different, because the weights matrix is transposed, flipped along the diagonal. Again, super, super simple.
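The error-splitting idea can be sketched with NumPy as below. The link weights 3 and 1 and the collected error of 0.6 + 0.1 = 0.7 come from the talk; the second row of weights and the output errors 0.8 and 0.2 are assumptions chosen so that those numbers fall out. The talk says only that the weights matrix is transposed, so the last line, which drops the normalisation, is shown as one possible simplification rather than as the speaker's exact method.

```python
import numpy as np

# weights[j, i] is the link weight from hidden node i to output node j.
weights = np.array([[3.0, 1.0],
                    [1.0, 1.0]])
output_errors = np.array([0.8, 0.2])

# Share each output error among its incoming links in proportion to weight:
# divide every row by its sum, so the first row becomes [3/4, 1/4].
shares = weights / weights.sum(axis=1, keepdims=True)

# Each hidden node collects the fractions sent back to it; in matrix form
# that is the transposed share matrix times the output errors.
hidden_errors = shares.T @ output_errors  # [0.6 + 0.1, 0.2 + 0.1]

# Possible simplification: skip the normalisation, use the transposed weights.
hidden_errors_simple = weights.T @ output_errors

print(hidden_errors, hidden_errors_simple)
```

With the normalisation kept, the hidden errors add up to the same total as the output errors; without it, the relative sizes still favour the stronger links but the totals differ.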
OK, so we've got the errors now at each node. How do we actually update, how do we change, the weights? So that's the output at one of the output nodes, and those w's are the weights of all the links inside. I'm not going to be able to untangle that; if you can, well done, but that's horrible. So what we need to do here is to say: we're not going to be able to untangle that in any kind of nice, mathematically clean way, so let's find other mathematical methods which are perhaps approximate but good enough. So let's go on a bit of a journey.
Imagine this landscape is a complicated function, like the one we saw, and say it's an error function, because that's what we have here: this is the output, and the error function is simply that minus what the actual target should be. If this horrible, lumpy landscape is a very complicated function which we can't work out analytically with nice clean algebra, another way to work with it, and maybe work out where the minimum error is, is to say: if this was a landscape and I didn't have a map of everything, and it was dark, so I couldn't understand the whole function, but I did have a torch, what I could do is point the torch down at my feet and say, well, the slope is going down in this direction, take a few steps; now it's going down in that direction, take a few steps; and eventually you would work your way down to a minimum. Some of you will put your hands up and say it might not be the best minimum; we'll come to that. So this approach is not mathematically clean, it's an approximate method, but it works really well. You can see it working with a simple example: let's pretend this x squared function is really difficult maths. You can see that we start at a point, we see where the gradient is locally, we move in that direction, and we keep doing that until we get to the minimum, which works quite nicely. You might even be more sophisticated and say, as the slope gets smaller, take smaller steps, because you're getting closer and closer to the real minimum and you don't want to overstep it. That's an idea that's actually used in neural networks as well. So if that complicated function was the error function, then we have a way of finding its minimum. I have a picture to show you that.
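The torch-and-slope idea on the x squared example can be sketched in plain Python. The starting point, the number of steps and the `step_size` factor are hypothetical values chosen for illustration; making each step proportional to the local slope is one simple way to get the smaller steps near the minimum that the talk mentions.

```python
def slope(x):
    """Gradient of y = x**2 at x."""
    return 2.0 * x

x = 4.0          # arbitrary starting point (assumption)
step_size = 0.2  # hypothetical step-size factor
path = [x]

for _ in range(30):
    # Step downhill; the step shrinks as the slope gets smaller.
    x = x - step_size * slope(x)
    path.append(x)

print(path[:4], x)
```

Nothing in the loop needs to know the shape of the whole function, only the slope at the current position, which is why the same approach carries over to an error function that cannot be solved with clean algebra.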
So if we have the weights, which are what we want to improve, and we have an error function which is complicated, we want to use this gradient descent method to find the minimum of that error, and we'll then know what the right weights should be. Say we're over here with the wrong weights, where the error is higher: we begin to say, OK, I want to improve my position and move down the error function to somewhere where the error is smaller, and the weights there will tell you what the right weights are. This is called gradient descent, and it's a way of working with that horrible kind of expression that we couldn't untangle analytically before. If you do write this out again with pen and paper and work out the gradient locally, it's not that hard. I mean, it uses very simple calculus, the kind of calculus you do at school, just the chain rule and nothing more complicated than that. If you are interested in having a look, there is what I hope is a very clear blog post on that. So what we've done now is work out a way of improving the weights based on the gradient of the error function, and here you see this happening many times, where we iterate and keep improving. Yeah. OK, I've only
got a bit of time left, so I'm going to zoom through how you might do this yourself. I'm not an expert Python coder like the people here, but broadly speaking, if you wanted to do this yourself, you might think: what would a Python class look like? Well, we know we've got to initialize this data structure, this network. That is really simple: all we really need to do is set the size and initialize the weights to random initial values. We know that we're going to have some way of training the network, so doing the learning, and we've got to have a method of querying the network, so we ask it questions and get an answer back. If you want to make your own neural network library or class, there's nothing more complicated than this at all.
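As a rough sketch of that shape, a class with the three parts described (initialise, train, query) might look like the following. The class name, attribute names, layer sizes and the range of the random starting weights are all hypothetical; only the overall structure comes from the talk, and the two methods are left as stubs here.

```python
import numpy as np

class NeuralNetwork:
    """Three-layer network skeleton: initialise, train, query."""

    def __init__(self, n_input, n_hidden, n_output, seed=None):
        # Set the size of each layer.
        self.n_input = n_input
        self.n_hidden = n_hidden
        self.n_output = n_output
        # Link weights start as small random values (the range is an assumption).
        rng = np.random.default_rng(seed)
        self.w_input_hidden = rng.uniform(-0.5, 0.5, (n_hidden, n_input))
        self.w_hidden_output = rng.uniform(-0.5, 0.5, (n_output, n_hidden))

    def train(self, inputs, targets):
        """Adjust the weights from one training example (stub)."""
        raise NotImplementedError

    def query(self, inputs):
        """Feed inputs forward and return the outputs (stub)."""
        raise NotImplementedError

net = NeuralNetwork(n_input=2, n_hidden=3, n_output=2, seed=0)
print(net.w_input_hidden.shape, net.w_hidden_output.shape)
```

Each weight matrix has one row per node in the receiving layer and one column per node in the sending layer, so it can multiply a column of incoming signals directly.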
And there are some very useful Python libraries. NumPy is great for matrix multiplications, and you also reach for SciPy, because it has some nice functions in there: that S-shaped curve, the logistic function, is built in, so you can use that rather than coding things yourself. I started programming a long time ago and then I stopped, so I was coding Python in 1999, I think, or 98, Python 1.5 or 1.6, with the Numeric library rather than NumPy; anyway, none of this existed then. So I came back to Python, and NumPy is fantastic, excellent. So this is an example of
a function which initializes the network. It looks complicated, but don't be put off: all it's really doing is setting the size in terms of input nodes, hidden nodes and output nodes, and using a NumPy function to randomize the weights, which are matrices. That's it, nothing more complicated than that. The SciPy function there, expit, that's the logistic function, the S-shaped curve. We'll come to the training; this is the querying, again really easy. We take the inputs, turn them into an array, since it was a list here, and you can see that there is a matrix multiplication, NumPy's dot, to do the calculation, and that's it. We apply the activation function to the outputs of that multiplication, simple as that. You've then got the signal at the next layer, and you do it again. That's it: it's the simplest form of propagating the signal through a neural network, as simple as that. I'm sure I could make it even more concise, but I just want to let you know, really, by doing this, that it is not mysterious or scary or complicated; it really is as simple as just that. And the training, again, looks scary, but it isn't really. The top half is exactly the same as what we just had: we're feeding signals forward, exactly the same code as before. Then what we're doing is saying the output error is the target, which comes from the training data, minus what we've worked out, and then we use another set of matrix multiplications to work out the errors at the internal layers of the network, and then we change the weights using that expression that we worked out with calculus. That's it, that's how you train a network. I'm sure you could make it even more beautiful and clean, but I just wanted you to get a feel for what's really going on in the network, and it's not that complicated in itself.
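The querying and training steps described above can be sketched as two small functions. This is not the code shown on the slides: the function names, layer sizes, learning rate and training example are hypothetical, and the weight-update expression is the usual gradient for a sigmoid network with a squared-error measure, written in here as an assumption because the talk does not spell it out. The logistic function is coded directly to keep the sketch dependent on NumPy only; `scipy.special.expit` computes the same curve.

```python
import numpy as np

def sigmoid(x):
    # Logistic function; scipy.special.expit computes the same curve.
    return 1.0 / (1.0 + np.exp(-x))

def query(w_ih, w_ho, inputs):
    """Feed a list of inputs forward; return inputs, hidden and final outputs as columns."""
    inputs = np.array(inputs, ndmin=2).T
    hidden_outputs = sigmoid(w_ih @ inputs)
    final_outputs = sigmoid(w_ho @ hidden_outputs)
    return inputs, hidden_outputs, final_outputs

def train(w_ih, w_ho, inputs, targets, learning_rate):
    """One training step; updates both weight matrices in place."""
    inputs, hidden_outputs, final_outputs = query(w_ih, w_ho, inputs)
    targets = np.array(targets, ndmin=2).T
    # Output error is target minus actual; hidden errors use the transposed weights.
    output_errors = targets - final_outputs
    hidden_errors = w_ho.T @ output_errors
    # Assumed update rule: error times sigmoid slope times the incoming signals.
    w_ho += learning_rate * (output_errors * final_outputs * (1.0 - final_outputs)) @ hidden_outputs.T
    w_ih += learning_rate * (hidden_errors * hidden_outputs * (1.0 - hidden_outputs)) @ inputs.T

rng = np.random.default_rng(0)
w_ih = rng.uniform(-0.5, 0.5, (3, 2))  # 2 inputs -> 3 hidden nodes
w_ho = rng.uniform(-0.5, 0.5, (2, 3))  # 3 hidden -> 2 output nodes
example, target = [1.0, 0.5], [0.9, 0.1]

def squared_error():
    outputs = query(w_ih, w_ho, example)[2]
    return float(np.sum((np.array(target, ndmin=2).T - outputs) ** 2))

error_before = squared_error()
for _ in range(200):
    train(w_ih, w_ho, example, target, learning_rate=0.5)
error_after = squared_error()
print(error_before, error_after)
```

The first half of `train` is the same forward pass as `query`, which mirrors the remark that the top half of the training code is the same as before.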
OK, for the few minutes left, I'm just going to show you what you can do with the very simple ideas that we've looked at. People do have many more sophisticated methods and optimizations, and you can read quite a lot about neural networks, but with just the very simple ideas that we've looked at you can do some powerful things. So we can train a network to learn to recognize human handwritten numbers. There's a famous challenge dataset called MNIST, and it's got 60 thousand training examples, and it's all free open data; you can get it yourself, and I can point you to it. There's a test set as well, so you can compare your results with others. If you look at the data, you'll see the numbers are actually images: that's a 5, at 28 by 28 pixels. So we feed that data into a network and train it. Actually, I missed something: we have to choose what the output looks like, and what I've chosen is to say we have 10 nodes at the output, and if the answer should be, say, 9, the ninth one has the biggest value. So you can see that if it's 5, it's the fifth node that should have a high value and the others should be low. That's what I'm using to train the network. That last example is interesting, because in this one the network thinks the answer is probably 9, but it might also be 4.
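The output encoding can be sketched in plain Python. The function names, the low and high values 0.01 and 0.99, the use of the digit itself as a zero-based node index, and the example outputs are all assumptions for illustration; the talk states only that there are 10 output nodes and that the node for the correct digit should carry the biggest value.

```python
def make_target(label, n_outputs=10, low=0.01, high=0.99):
    """Target vector with a high value at the label's node and low values elsewhere."""
    target = [low] * n_outputs
    target[label] = high
    return target

def ranked_guesses(outputs):
    """Output node indices ordered from most to least strongly activated."""
    return sorted(range(len(outputs)), key=lambda i: outputs[i], reverse=True)

print(make_target(5))

# Made-up network outputs where node 9 is strongest and node 4 is the runner-up.
outputs = [0.02, 0.01, 0.03, 0.05, 0.41, 0.02, 0.01, 0.04, 0.03, 0.78]
best, second = ranked_guesses(outputs)[:2]
print(best, second)
```

Looking at the runner-up as well as the winner is how you can read off a case like the one just described, where the network favours 9 but also gives some weight to 4.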
And then, you know, you can get some really good results with just those simple ideas. 96% accuracy is what that gets, and that's not bad: about 20 lines of code, and it's recognizing human handwritten numbers with over 90% accuracy. That's not bad at all, I think. You can begin to tune things like the learning rate or the number of hidden nodes, and you can see that you can get improved performance, though it might not be by much. I have deliberately put in the middle one of those graphs just to remind us that neural network training is a random process: we're starting off with random initial weights, and sometimes it can go wrong. For the scientists among us, that reminds us that we should do this many, many times and take the best, or at least make sure that we haven't ended up with an anomalous answer. Remember gradient descent from before: you might end up in the wrong minimum rather than the best minimum, so you should do this many times. If you rotate the original training images a little, you can get even better results. I think this is actually pretty good, because if you look at the academic papers they get, you know, 99%, 99.5%, using really advanced techniques, and I think with just those 20 or so lines of code, that's not bad. I'll just
skip this. One more thing I wanted to say: you can do this on a Raspberry Pi. Everything I've done, and you can look at the blog, you can do with a Raspberry Pi Zero, which costs about 4 or 5 euros, and you don't need an NVIDIA graphics card to do any of this. So for the last few minutes I'm going to try and do a live
demo, and I might regret that. So let's think of a number to classify. This one is a number 7, and it gets it: one digit, 7, that's right. Here is one I didn't write; it's actually from a newspaper. That's a number 2, and it got that right. This is a network I trained last night, so that's a good thing. OK, for another one, a 3; let's see if it works. OK, let's resize that to 28 by 28 and save it as a PNG. Alright, fingers crossed, here we go: that's a 3, and the answer is 3. I tried this last night and it didn't work every time. I'll stop there, and I'd love to chat about this afterwards. I don't know if there's time for questions, or whether I've used it all up. Thank you.
So, we have time for a few questions. Hi.
So we have seen that the output nodes are the numbers 0 to 9. What are the input nodes? Are those the individual pixels?
Yeah, those are the individual pixels. If we have an image of 28 by 28, that's 784 pixels, I think, so you have an input layer of 784 nodes. You can choose other ideas: you can say I want to rescale everything, or I might want to have different features as inputs; you can do things like that. This is a very simple, very naive example which takes the raw pixels, and it works, but people will do other things: if they know something about the data they're working with, they might say, I think another feature is more likely to be a factor in the answer, and use that to train the network instead. So they might use color, they might use alpha values, they might use something else.
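Turning a 28 by 28 image into 784 input values can be sketched in plain Python. The stand-in pixel data and the optional division by 255 are assumptions for illustration; the talk says only that the raw pixels are fed in and that rescaling is one of the things you could choose to do.

```python
# A stand-in 28 x 28 greyscale image: 28 rows of 28 pixel values (0 to 255).
image = [[(row * 28 + col) % 256 for col in range(28)] for row in range(28)]

# One input node per pixel: lay the rows end to end.
inputs = [pixel for row in image for pixel in row]

# Optional rescaling of the raw values into the range 0 to 1 (an assumption).
scaled_inputs = [pixel / 255.0 for pixel in inputs]

print(len(inputs))
```

Any other feature you prefer to use simply changes the length of this list, and with it the number of nodes in the input layer.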
So, any other questions? OK, thanks a lot, that's the time.