Introduction in Deep Learning
Formal Metadata
Title 
Introduction in Deep Learning

Author 

License 
No Open Access License:
German copyright law applies. This film may be used for your own use but it may not be distributed via the internet or passed on to external parties. 
Identifiers 

Publisher 

Release Date 
2019

Language 
English

Production Year 
2019

Production Place 
Dubrovnik, Croatia

Content Metadata
Subject Area  
Keywords 
deep learning, machine learning
00:00
Good, yeah. I will give the introduction to deep learning. You have probably heard about it many times before, but I'm not sure that everybody knows its foundations, so I will quickly go over them. I don't want to go too deep into the mathematics, because I think it's actually very boring, but I want to point out some of the obstacles you can face during training and how you [inaudible]. [inaudible] already showed this graphic here, and he also mentioned that neural networks were introduced in the 2012 competition. This was the ImageNet challenge: you are given an image and you should distinguish between one thousand object classes, so there are a thousand classes, like cats, dogs, and everything
00:40
you can think about, and you should distinguish between them. Before that, engineers mainly used handcrafted features [inaudible], and then deep learning hit. Since then there have been many improvements to get even lower error rates and improve the performance on this dataset, and in 2015 networks were even able to outperform human capabilities on this dataset.
01:10
So why is deep learning so popular compared to classical computer vision approaches? I also worked with them before: you have to think about which features you need, and you have to design them manually. This is very time-consuming, of course, and often not transferable to other areas. Some of them are mentioned here, which were very popular, like [inaudible] and local binary patterns. What deep learning does now is completely take over the feature extraction part, so you don't take care of it: you just put data in and you get data out, and the feature extraction is done entirely by the deep learning approach. In the early years, deep learning mostly meant convolutional neural networks, and my focus is computer vision; that's why I'm always talking about images. A lot of the learned features are basic edge features, which were already used, for example, [inaudible] when you go for handcrafted features, but the network automatically learns the mid-level and high-level features as well, and if you wanted to design those manually, I think you would not end up with anything equivalent. So it completely takes over this part.
02:18
The question is also why it has only become so important now, because it actually dates back to the 1950s, when the idea and the perceptron model were already introduced. The main reason is that there was not enough data available to train these networks: when you have millions of parameters, you probably need millions of training examples. Also, GPUs are now available at quite affordable prices, and you can train on them because they are highly parallel, which is very important for training these networks. You see here in the timeline that there actually was a neural network for character recognition already in the 1990s, but then it basically stopped, and it only took off again in 2012. OK, so potential applications, to name the more classical ones: here in the upper left you have basic image classification, like on ImageNet; you can also do localization, where you try to find bounding boxes for the objects; you can also count instances, and also [inaudible],
03:21
and segmentation, which is very important, for example, in autonomous driving, and also [inaudible] and in some other areas. You can also generate captions for images automatically, or, yeah, AlphaGo, which competed against champions in Go and was able to outperform them. And also translation: [inaudible] for translating texts, I use it quite often, so if you don't know it, you should check it out. So deep learning is very popular everywhere now. And yeah, let's come to neural networks. They aim to imitate
04:02
humans, or rather the human brain. The smallest element is the perceptron, also called a neuron. Its input is usually a stimulation, which could be a visual stimulation or even the output of another neuron, or something else, and the inputs are connected by synapses, which are called weights. At the end, the neuron makes a binary decision: it either doesn't fire or it fires. That is the model; a later extension added a bias term. In the equation, you basically take all the inputs, multiply each by its weight, sum them all up, add the bias, and then apply the activation function, which we will come to, and you get the output: y = f(w1·x1 + w2·x2 + ... + wn·xn + b).
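As a minimal sketch of the perceptron just described (weighted sum of the inputs, plus a bias, passed through an activation function), the following Python is illustrative only; the weights, the bias and the `step` activation are hypothetical values picked so that the neuron behaves like a logical AND.

```python
from typing import Callable, Sequence


def step(z: float) -> float:
    """Binary activation: the neuron 'fires' (1.0) if z >= 0, otherwise it does not (0.0)."""
    return 1.0 if z >= 0.0 else 0.0


def perceptron(x: Sequence[float], w: Sequence[float], b: float,
               activation: Callable[[float], float] = step) -> float:
    """Multiply each input by its weight, sum everything up, add the bias, apply the activation."""
    if len(x) != len(w):
        raise ValueError("need exactly one weight per input")
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return activation(z)


# Hypothetical weights and bias that make this perceptron compute a logical AND.
weights = [1.0, 1.0]
bias = -1.5
for inputs in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(inputs, perceptron(inputs, weights, bias))  # prints 0.0, 0.0, 0.0, 1.0 in this order
```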
04:43
Does anyone have an idea what the problem with this activation function would be? You have a step function like this because you want to make binary decisions. No one? OK. So actually you can't use it. You would like to use it in neural networks, but you can't, because you use backpropagation for training, and backpropagation relies on gradients. A step function is not differentiable at the step, and everywhere else its gradient is zero, so no useful gradient can flow back through it.
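To make the gradient problem concrete, this sketch estimates derivatives numerically with a central finite difference. The sigmoid is used only as an example of a smooth alternative; the function names and the step size `h` are assumptions of this illustration.

```python
import math
from typing import Callable


def step(z: float) -> float:
    return 1.0 if z >= 0.0 else 0.0


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def numerical_derivative(f: Callable[[float], float], z: float, h: float = 1e-4) -> float:
    """Central finite-difference approximation of f'(z)."""
    return (f(z + h) - f(z - h)) / (2.0 * h)


# Away from z = 0 the step function is flat, so its estimated derivative is exactly zero;
# the sigmoid always has a non-zero slope that backpropagation can use.
for z in (-2.0, -0.5, 0.5, 2.0):
    print(f"z={z:+.1f}  step'={numerical_derivative(step, z):.4f}  "
          f"sigmoid'={numerical_derivative(sigmoid, z):.4f}")
```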
05:16
So researchers came up with other functions that basically simulate the same behavior, like the sigmoid, the hyperbolic tangent and, very usefully, the rectified linear unit (ReLU), since it's so easy to calculate. And why do we need them at all? If you just use a linear function and you have data like this (I'll show you some examples), the network will only learn a linear classification: no matter how many layers you add, it will not be able to solve these nonlinear problems for you.
05:50
So basically it comes down to something like that. Yeah, you can try it for yourself; I think you have access to the slides. It's actually a nice demo where you can try out different settings, and if you use linear activation functions, it will not work for data points like these.
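The claim that stacking linear layers gains nothing can be checked directly. This sketch (pure Python, hypothetical weights) defines the three activations named above and shows that two layers without an activation collapse into a single linear layer with W = W2·W1 and b = W2·b1 + b2, while a ReLU in between breaks that equivalence.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


# The three activation functions named in the talk.
def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z: float) -> float:
    return math.tanh(z)


def relu(z: float) -> float:
    return max(0.0, z)


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix-matrix product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add(u: Vector, v: Vector) -> Vector:
    return [a + b for a, b in zip(u, v)]


# Two layers with hypothetical weights: 2 inputs -> 2 hidden units -> 1 output.
W1: Matrix = [[1.0, -2.0], [0.5, 3.0]]
b1: Vector = [0.1, -0.2]
W2: Matrix = [[2.0, 1.0]]
b2: Vector = [0.3]


def two_linear_layers(x: Vector) -> Vector:
    """No activation function between the layers."""
    return add(matvec(W2, add(matvec(W1, x), b1)), b2)


# The same mapping written as ONE linear layer: W = W2 W1 and b = W2 b1 + b2.
W_combined = matmul(W2, W1)
b_combined = add(matvec(W2, b1), b2)


def one_linear_layer(x: Vector) -> Vector:
    return add(matvec(W_combined, x), b_combined)


def two_layers_with_relu(x: Vector) -> Vector:
    """ReLU on the hidden layer: the result is no longer a single linear map."""
    hidden = [relu(h) for h in add(matvec(W1, x), b1)]
    return add(matvec(W2, hidden), b2)


x = [1.0, 1.0]
print([f(-1.0) for f in (sigmoid, tanh, relu)])
print(two_linear_layers(x), one_linear_layer(x))  # the same value up to rounding
print(two_layers_with_relu(x))                    # a different value
```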
06:05
But if you use a nonlinear activation, you are able to approximate these nonlinearities and can model very complex functions. So what is a neural network? Until now we have only been talking about a single neuron; a neural network is nothing more than a composition of more than one neuron.
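A classic illustration is the XOR problem, which comes up again with the interactive demo later. Assuming hand-picked (not trained) weights, two hidden ReLU neurons are enough to compute XOR, whereas a brute-force search over a grid of single linear threshold units finds none that does. The grid search is an illustration, not a proof.

```python
from itertools import product

# Truth table of XOR: the output is 1 exactly when one of the two inputs is on.
XOR_DATA = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def relu(z: float) -> float:
    return max(0.0, z)


def tiny_network(x1: float, x2: float) -> float:
    """Two hidden ReLU neurons and one linear output neuron, with hand-picked weights."""
    h1 = relu(x1 + x2)        # number of active inputs
    h2 = relu(x1 + x2 - 1.0)  # only positive when both inputs are active
    return h1 - 2.0 * h2


def single_unit_solves_xor(w1: float, w2: float, b: float) -> bool:
    """Does one linear threshold unit step(w1*x1 + w2*x2 + b) reproduce XOR?"""
    return all((1 if w1 * x1 + w2 * x2 + b >= 0 else 0) == y
               for (x1, x2), y in XOR_DATA.items())


grid = [i / 2 for i in range(-8, 9)]  # -4.0, -3.5, ..., 4.0
linear_solutions = [p for p in product(grid, repeat=3) if single_unit_solves_xor(*p)]

print([tiny_network(*x) for x in XOR_DATA])  # -> [0.0, 1.0, 1.0, 0.0]
print(len(linear_solutions))                 # -> 0
```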
06:25
What you know are the inputs and the outputs, and everything in between is called the hidden layers, for a reason: they are often a kind of black box, and you don't know what happens there; it's magic. This is sometimes a problem, and researchers are trying to solve it now, but that's something I could talk about for another hour, so I won't go into detail; the point is that you don't know what happens there. A deep neural network is nothing more than a network with more hidden layers. For example, the first winning network on ImageNet had, I think, eight layers,
07:05
or even more, and later ones already had about one hundred fifty layers [inaudible]. So yeah, that's why they are called deep, and deep neural networks can contain billions of weights.
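A deep network is just more of the same layers stacked on top of each other. The sketch below runs a forward pass through a small fully connected network and counts its parameters; the layer sizes and the random initialization scale are assumptions chosen for illustration.

```python
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[n_out][n_in], biases[n_out])


def relu(z: float) -> float:
    return max(0.0, z)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Small random weights and zero biases for one fully connected layer."""
    scale = 1.0 / n_in ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x: List[float], layers: List[Layer]) -> List[float]:
    """Apply the layers in order: ReLU after every hidden layer, none after the output layer."""
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        x = z if i == len(layers) - 1 else [relu(v) for v in z]
    return x


def count_parameters(sizes: List[int]) -> int:
    """Every layer has n_in * n_out weights plus n_out biases."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


sizes = [2, 8, 8, 8, 1]  # hypothetical: 2 inputs, three hidden layers of 8 neurons, 1 output
rng = random.Random(0)
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
print(forward([0.5, -1.0], layers))
print(count_parameters(sizes))  # -> 177
```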
07:20
Now let's go through a very small example first. Say you have this kind of problem: will you pass the class? You have your features, x1, the number of lectures you attended, and x2, the [inaudible] on the project, and you have this training data. To train a network, you need training data, and now you want to predict whether you will pass [inaudible].
07:44
So what you do is give the network these numbers; the model gives you some estimate, and you also have the real output. If you want to train this as a classification approach, what you usually do is calculate, for example, the binary cross-entropy, whose formula is L = -[y·log(ŷ) + (1 - y)·log(1 - ŷ)], averaged over the training examples. You try to optimize this and come up with the best weights to solve the problem. You could even formulate the problem differently: you can also say, OK, I want to find out which grade I will get, and then you have a regression problem and would use another loss function. So basically the loss function defines what you want to do with the data.
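The two loss functions mentioned, binary cross-entropy for the pass/fail classification and a regression loss for predicting a grade, might look like this. Mean squared error is assumed here as the regression loss, and all labels and predictions are made-up numbers.

```python
import math
from typing import Sequence


def binary_cross_entropy(y_true: Sequence[int], p_pred: Sequence[float], eps: float = 1e-12) -> float:
    """Mean of -[y*log(p) + (1-y)*log(1-p)], where p is the predicted probability of 'pass'."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of (y - y_hat)^2, a common loss for regression such as predicting a grade."""
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels (1 = passed) and outputs of two models for four students.
passed = [1, 0, 1, 1]
good_model = [0.9, 0.1, 0.8, 0.7]
bad_model = [0.2, 0.9, 0.4, 0.3]
print(binary_cross_entropy(passed, good_model), binary_cross_entropy(passed, bad_model))

# Hypothetical grades and predictions for the same students (regression view).
print(mean_squared_error([1.7, 4.0, 2.3, 1.3], [2.0, 3.7, 2.3, 1.0]))
```

The better model gets the lower cross-entropy, which is exactly what the optimizer will try to achieve by adjusting the weights.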
08:26
So, how are they trained? This is actually quite involved, and I don't want to go into detail, but the aim is to find the network weights that are most suitable for solving the problem. To do that you need a loss function (I showed you some for classification and regression), and then you need an optimizer, for example gradient descent, here on the left. What it does: it first picks a random point. Here, these axes are the weights and this is the corresponding loss, and you want to find the global minimum. So you randomly pick w1 and w0, and then you calculate the gradient. This is the gradient now, and from it you find out how a change in w1 would affect the loss, and also a change in w0. Computing this is called backpropagation. Typically you have millions of weights, and you want to find out which weights have the highest impact on your loss, and then you change each weight accordingly. This is the update formula, w ← w - η·∂L/∂w, and the learning rate η in it is actually very important for training neural networks. Then you just repeat this step until you reach some kind of minimum, either the global minimum or a local minimum [inaudible]. So what does this learning rate do, and how should you set it?
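The update rule can be tried on a toy loss surface. This sketch assumes a hypothetical quadratic loss over two weights w0 and w1 whose gradient is written out by hand (in a real network, backpropagation computes it) and runs plain gradient descent with three learning rates: too small, reasonable, and too large.

```python
import random
from typing import Tuple

Weights = Tuple[float, float]


def loss(w: Weights) -> float:
    """Hypothetical loss surface; its global minimum is at (w0, w1) = (3, -1)."""
    w0, w1 = w
    return (w0 - 3.0) ** 2 + 2.0 * (w1 + 1.0) ** 2


def gradient(w: Weights) -> Weights:
    """Partial derivatives dL/dw0 and dL/dw1, written out by hand for this toy loss."""
    w0, w1 = w
    return 2.0 * (w0 - 3.0), 4.0 * (w1 + 1.0)


def gradient_descent(learning_rate: float, steps: int = 100, seed: int = 0) -> Weights:
    rng = random.Random(seed)
    w = (rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0))  # random starting point
    for _ in range(steps):
        g0, g1 = gradient(w)
        # Update rule: move every weight a step of size learning_rate against its gradient.
        w = (w[0] - learning_rate * g0, w[1] - learning_rate * g1)
    return w


for lr in (0.001, 0.1, 0.6):
    w = gradient_descent(lr)
    print(f"learning rate {lr}: w = ({w[0]:.4g}, {w[1]:.4g}), loss = {loss(w):.3g}")
```

With these settings the smallest rate is still far from the minimum after 100 steps, 0.1 converges, and 0.6 diverges because every step along w1 overshoots the minimum by more than it corrects.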
09:53
So if you set the learning rate too small, training will take a very long time, and if you end up in a local minimum, it can get stuck there and not get out; in this case it could even lead to overfitting. If you pick it right, training is fast and will probably find the global minimum. If you pick it too high because you want the solution very fast, I would advise against that, because training just jumps around and becomes unstable. There's also a very nice page where you can try for yourself how the learning rate affects the optimization. Here I was just showing you two weights, and here researchers tried to visualize the landscape of the loss function over different weights, and it could look like this, for example. As you can see, it's very difficult to find the global minimum, and gradient descent is a greedy approach,
10:57
so it's not guaranteed that you find the global solution. There are some remedies; I'll just name them now. For example, adaptive learning rates; researchers also introduced momentum, which basically means also using the gradient from the previous iteration and keeping this information to get some flow into the optimization, instead of calculating each step by itself; and also Adam, which is a very popular approach right now. What you should also do is not use just one training point and calculate the loss for that; you should use mini-batches, because you get a better estimate of where you actually are, and they are highly parallelizable, so training is much faster and much more stable. Basically, you use the largest batch size that fits on whatever GPU you have. There are many more techniques, but these may be the most important ones.
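Momentum and mini-batches can be combined in a few lines. The sketch below fits a straight line to synthetic data with mini-batch stochastic gradient descent plus momentum, using one common formulation (velocity v ← βv + g, then w ← w - ηv). The data, learning rate, β and batch size are assumptions for illustration, and Adam is not shown.

```python
import random
from typing import List, Tuple

Sample = Tuple[float, float]


def make_data(n: int, rng: random.Random) -> List[Sample]:
    """Hypothetical data: y = 2x + 1 plus a little Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)))
    return data


def batch_gradient(w: float, b: float, batch: List[Sample]) -> Tuple[float, float]:
    """Gradient of the mean squared error over ONE mini-batch, not the whole dataset."""
    gw = gb = 0.0
    for x, y in batch:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw / len(batch), gb / len(batch)


def sgd_momentum(data: List[Sample], lr: float = 0.1, beta: float = 0.9,
                 batch_size: int = 16, epochs: int = 50, seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    w = b = 0.0
    vw = vb = 0.0      # velocity: a running memory of previous gradients
    data = list(data)  # copy so shuffling does not modify the caller's list
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            gw, gb = batch_gradient(w, b, data[start:start + batch_size])
            # Momentum: keep a fraction beta of the previous step and add the new gradient.
            vw = beta * vw + gw
            vb = beta * vb + gb
            w -= lr * vw
            b -= lr * vb
    return w, b


data = make_data(512, random.Random(42))
print(sgd_momentum(data))  # should end up close to (2.0, 1.0)
```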
11:54
And yes, since we are limited on time, I put a link there so you can try it yourself. This is a kind of XOR problem: you can play around in a graphical interface, add some hidden layers, add some neurons, and find out what happens. You can even change the activation function to linear, and you will see that it will not work, even if you add hidden layers. OK, so, overfitting. I think this should be quite clear to everyone: if you have training data and you train for too long, or your model is too complex for the problem, it can easily learn everything in the training data by heart, and if you then want to apply it to testing data,
12:35
it would not generalize: it would overfit. Or, if your model was too coarse, for example a linear classifier for this data, it could not estimate the correct decision boundaries, and it would underfit. Overfitting is a very big problem, and therefore you have to improve generalization. This is
12:55
usually done by regularization. I picked two solutions here, but there are many more. For example, dropout: it basically just sets some neurons and their respective weights to zero. The idea is that you want the network not to rely on specific paths to produce a specific output. So you set some to zero and train, then in the next iteration you set others to zero, and this helps generalization. And this may be the most important part of deep learning: you have to have a validation set. So you should have three sets, training, validation and testing, and during training you should observe how the network behaves on this unseen data. If you train, the loss on the training data will of course go lower and lower, but what can actually happen is that the loss on the validation set increases again, or stops decreasing. When it stops here while the training loss still decreases, it means the network is only learning the training data and no longer generalizing to the testing data. So early stopping is very important, and this is where you should stop the training. So, what have we seen? The perceptron as the most basic building block of deep learning; that activation functions are very important because they introduce nonlinearity; that neural networks are nothing more than compositions of neurons; and we have learned some techniques for regularization and optimization.
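Both regularization ideas can be sketched briefly. The `dropout` function uses the common "inverted dropout" variant, which rescales the surviving activations during training so that nothing has to change at test time; the `early_stopping_epoch` helper and its `patience` parameter are a hypothetical illustration of stopping once the validation loss stops improving, and the loss values are invented.

```python
import random
from typing import List, Optional, Sequence


def dropout(activations: Sequence[float], p: float, rng: random.Random,
            training: bool = True) -> List[float]:
    """Inverted dropout: during training, zero each activation with probability p and
    scale the survivors by 1/(1-p); at test time, return the activations unchanged."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def early_stopping_epoch(val_losses: Sequence[float], patience: int = 3) -> Optional[int]:
    """Return the epoch whose weights to keep (lowest validation loss) once the validation
    loss has not improved for `patience` epochs; return None if training should go on."""
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < val_losses[best_epoch]:
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return best_epoch
    return None


rng = random.Random(0)
print(dropout([1.0, 2.0, 3.0, 4.0], p=0.5, rng=rng))                  # some zeroed, the rest doubled
print(dropout([1.0, 2.0, 3.0, 4.0], p=0.5, rng=rng, training=False))  # unchanged at test time

# Hypothetical validation losses: they fall, then rise again while training continues.
val = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.67]
print(early_stopping_epoch(val, patience=3))  # -> 3
```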
14:22
So, I was just talking about x1 and x2, but usually your data has far more than two features. If you have an image like this, what could we do? How do we train on images? The first solution, which was also used [inaudible] in the nineties, just used all the pixels
14:42
directly as input. But maybe you see where the problems are: you have so many weights, the number of neurons here times the number of pixels here, so you introduce a huge number of weights, and you are also completely neglecting the spatial information.
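The weight-count argument is simple arithmetic. The sketch assumes a hypothetical fully connected layer of 1,000 neurons on a flattened 224 × 224 RGB image and compares it with 96 shared 11 × 11 filters (the first AlexNet layer described later in the talk); biases are left out in both cases.

```python
def dense_layer_weights(height: int, width: int, channels: int, neurons: int) -> int:
    """Weights when every pixel value is connected to every neuron (biases not counted)."""
    return height * width * channels * neurons


def conv_layer_weights(kernel: int, in_channels: int, filters: int) -> int:
    """Weights of a convolutional layer: one kernel x kernel x in_channels matrix per filter,
    shared across all image positions, so the image size does not appear (biases not counted)."""
    return kernel * kernel * in_channels * filters


# A flattened 224 x 224 RGB image feeding a hypothetical layer of 1,000 neurons.
print(dense_layer_weights(224, 224, 3, 1000))  # -> 150528000
# 96 filters of size 11 x 11 over 3 input channels.
print(conv_layer_weights(11, 3, 96))           # -> 34848
```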
15:00
So the drawbacks are that there is no preservation of locality, so you basically don't use any spatial connections, and you have too many weights. So how can we maintain the spatial structure? That's where convolutional layers come in, which took off in 2012.
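The convolution idea, one small weight matrix shared across all positions, can be sketched in pure Python. The filters and the toy image below are hypothetical; like most deep learning frameworks, the code computes a cross-correlation (the kernel is not flipped). The output-size formula with an assumed padding of 2 pixels (the talk does not mention padding) reproduces the 55 × 55 resolution quoted for the first AlexNet layer, and the 2 × 2 max pooling halves height and width.

```python
from typing import List

Grid = List[List[float]]


def conv2d(image: Grid, kernel: Grid, stride: int = 1) -> Grid:
    """Slide one shared k x k kernel over the image (no padding) and return its activation map.
    Like most deep learning frameworks, this is a cross-correlation: the kernel is not flipped."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r, c = i * stride, j * stride
            # The SAME kernel weights are used at every position (weight sharing).
            row.append(sum(kernel[a][b] * image[r + a][c + b]
                           for a in range(k) for b in range(k)))
        out.append(row)
    return out


def max_pool2x2(fmap: Grid) -> Grid:
    """Keep the maximum of each non-overlapping 2 x 2 block, halving height and width."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


def conv_output_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# Toy 6 x 6 image: dark left half, bright right half, i.e. one vertical edge.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]

# Two hypothetical hand-made filters; each one yields its own activation map (output depth 2).
filters = {
    "vertical_edge": [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]],
    "horizontal_edge": [[-1.0, -1.0, -1.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
}
maps = {name: conv2d(image, kernel) for name, kernel in filters.items()}

print(maps["vertical_edge"][0])            # -> [0.0, 3.0, 3.0, 0.0]: responds only at the edge
print(maps["horizontal_edge"][0])          # -> [0.0, 0.0, 0.0, 0.0]: no horizontal edge
print(max_pool2x2(maps["vertical_edge"]))  # the 4 x 4 map becomes 2 x 2
print(conv_output_size(224, 11, 4, padding=2))  # -> 55
```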
15:20
Basically, instead of learning a single weight per input, they learn a matrix of weights, and this matrix is shared over the whole image, because if you have an edge in the upper left, you want to detect it the same way as one in the bottom right, for example. So it's not just for one region; it's applied to the whole image, at every position, as you can see here, and what it actually does is produce an activation for a specific feature. So let's say in this example it is an edge filter [inaudible]; it will detect all the edges of this kind. And you don't want only one filter, you want multiple of them, and for each filter you get one activation map in the output. So if you use around sixty-four filters, you get a depth of sixty-four there. Now we come to the winner of ImageNet 2012, AlexNet, and it looks like this. First there is the input image, which in this case is 224 × 224 pixels and, obviously, has three channels. Then you apply an 11 × 11 convolution; this is the size of your convolution
16:32
filter, and you learn the weights for it. They even used a stride of four, and what the stride does you can see here: basically it skips positions, and an important consequence is that it reduces the resolution of the output. That's why the next layer has, I think, a 55 × 55 resolution, and it has a depth of ninety-six, so ninety-six different filters. It goes on like that, but they don't apply striding anymore; they use max pooling. Max pooling also decreases the spatial resolution, by a factor of two, because it looks at, for example, four pixels, keeps the maximum, and the rest is lost along the way. Yeah, and they did this over several convolutional layers, and you can say this is the feature extraction part, because this is where there is still some spatial information in the feature maps. Then this is flattened, and we are back to the dense layers we have seen before: you have one vector and you train a classifier. This is also very important, because you can train a network, for example, for classification, and then use the features it has learned for other things, for retrieval tasks for instance: you just extract the features and do retrieval, clustering, or whatever you want. You don't need to rely on those thousand classes: if, for example, you just want [inaudible] two hundred classes, there's no need to train it all again; you can just use the features and train a classifier on top, or something like that. Yeah, and now I'm ending where I began: you can see the results here again, and there is a clear development in the number of layers. At first there were eight layers in 2012, then the next network came and added more filters, and then
18:27
in 2014 there were already nineteen layers, and also twenty-two layers, and then in 2015, when networks exceeded human performance, even one hundred fifty-two layers. So it may seem that you just need to go deeper to improve the results, but I would strongly advise against that. You should always try to find
18:48
the best model, the best architecture, for your problem. Also, if you use very complex models, you may again introduce overfitting into your networks, so always think about which architecture is best.

[Audience question inaudible.]

Yes, yes, they used the same set of images. For the humans, they took the test set; I don't know exactly how many humans participated, but the error was around 5.2 percent. I also had to annotate at some point myself, and it was even multi-class: you have one image and you have to find all the classes in it, and as a human you see so many things, while networks basically see things in the background that you don't pay attention to as a human.

[Audience question inaudible.]

Oh yeah, but I think this is pretty OK on average with one thousand classes. I don't know exactly, but they used the top-five classes for the human evaluation study, because, keeping in mind that there are one thousand classes, annotating this is of course difficult. But anyway, there are also other areas where networks outperform humans by a lot. I even wrote a paper myself on location estimation, and humans stand little chance there. Networks are also very good at face recognition. So in many areas they outperform humans, but this is also a matter of how you define "better": having knowledge in many areas, or in one particular area. That's where networks are lacking: they are very good at one task but [inaudible] at others. And sometimes additional information can be very useful for solving a task.

[Further exchange largely inaudible.]
21:19
And yeah, if you want to read more, I put some references here.