AI vs Information theory and learnability
Video in TIB AVPortal:
AI vs Information theory and learnability
Formal Metadata
Title 
AI vs Information theory and learnability

Title of Series  
Author 

License 
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 
Identifiers 

Publisher 

Release Date 
2019

Language 
English

Content Metadata
Subject Area  
Abstract 
We will first give a quick review of how information theory impacts AI, in particular how a complex system can evolve into a more complex system while satisfying the laws of information theory. Second, we will investigate the problem of learnability. Deep neural networks are sometimes incapable of learning surprisingly simple problems; we will try to hint at a characterization of those problems.

00:01
[Music] I'm the last talk of the day, so don't expect too many deep things, although I'm going to talk about deep learning. [unclear] I have put together two topics. The first topic is the relation between artificial intelligence and information theory: what information theory can tell us about artificial intelligence. Then I will go into a more involved part about learnability: [unclear] are all problems learnable? That is a question we are asking ourselves.

First, to continue the discussion we had in the earlier talk: one question that Stephen Hawking asked himself, together with other people, is what will happen if artificial intelligence, which is based not on biology but on electronic technology and is supposed to be faster, supersedes human intelligence. Will it be the end of civilization? Even worse, maybe the end of our jobs, our research jobs. This is a very important topic, more for young people than for people my age [unclear].

The first thing that information theory tells us is that a system cannot evolve by itself into a more sophisticated, more complex system. Since the future system is a function of the system itself, the entropy of the future system will be smaller: entropy can only decrease, it cannot increase. You cannot expect that a program in a computer which is just working by itself, evolving from itself, is going to do something better than what it was doing before. The only way to make a system, an automaton, create an automaton more complex than the old one is to introduce entropy from outside. This entropy could be randomness, or it could be data, a data set. Basically, what artificial intelligence is doing is collecting huge amounts of data and trying to convert them into a system that is better aligned with what it is expected to do on this data. So from the point of view of information theory, it is as if you add randomness to your original system.

There is an equivalent in life: evolution. Life evolution has worked from simple systems to more complex systems, like ourselves, but [unclear] are more complex than ourselves. In fact, very surprisingly, birds are more complex than mammals. They are not doing math, but they are more complex; in fact we don't know whether they are doing math. The mutations are mostly random: in the beginning they came from cosmic radiation, or from viruses (viruses are not random, but they cause mutations). And then, of course, evolution selects the best species. Here is the main problem: what does "best species" mean? The best is the one that survives the competition. It is a little tautological, but [unclear] we see the result every day. We walk in the forest [unclear] and see that the system where we have arrived is very complex. I think the human is somewhere in there [unclear]. We always put the human at the top of evolution, but there is no reason to put the human at the top of evolution.

So is it possible to apply this model, the model of life evolution, to the evolution of code, basically to artificial intelligence: how code can evolve into something more complex, more useful? Is it possible to adapt this evolution of life to code generation? First, as we said, the selection of species is [unclear], because it is very tautological. Every time, we learn that a species evolved for a reason completely impossible to imagine: the nose of a butterfly became longer because the flowers [unclear] became deeper, so the butterflies with a short nose cannot survive and the butterflies with a longer nose survive; but in that case the flowers become [unclear]. At the end we don't understand very well the usefulness of the process.

But in code, can we create a digital ecosystem where we have specified rules for the codes to fight each other? Another question: is it possible to have a large enough ecosystem? If you have only two or three codes competing, it will not be very useful; you need something large enough. Another problem: how can we verify and certify? Imagine that all these randomly generated programs are used to control your car, or the airplane in which you are traveling. You would like, at the minimum, that they be certified. [unclear] the car, the Google car, just ran over a woman because the case of a woman not being at the right place was not in the program. [unclear]

Now I would like to compare the power of life with the power of our computers. The rise of artificial intelligence happened because we have had the application of Moore's law for three decades, let's say four decades, so we have arrived at tremendous computational power. All the ideas about deep learning and things like that, neural networks, are from the 60s. But in the 60s there was basically less computational power in a computer than in this [unclear], or than in this key without the electronic part. So it was meaningless to imagine that we could do deep learning with basically a screwdriver. But now it is possible.

But let's compare with life. Everybody has tried this; I tried it when I was young: take a liter of sea water, and with a microscope you look at small shrimps and things like that. It is like an ecosystem. This is one liter, and you can imagine that in one day all the shrimps together, the
09:22
bacteria (the main source of mutation is the bacteria) generate 1 kilobyte of new mutated code per liter per day. That is an assumption. Why this assumption? [unclear] when I gave the first version of this talk it was just a guess [unclear]. So, 1 kilobyte of code every day per liter. Life appeared 1000 billion days ago. The ocean volume is one billion cubic kilometers, and every cubic kilometer is 1000 billion liters; I checked it again, the math is correct. Therefore the space of life on Earth is a space of ten to the thirty-six kilobytes. And this is a lot, I can tell you.

Imagine that you have a magic computer in which you can store exactly one byte per atom. Ten to the thirty-six kilobytes would need a mountain the size of Everest, at one kilobyte per atom. And if you look at the quantity of information created by mankind (this talk is two years old, so you have to multiply by a hundred, of course), it is orders of magnitude less. [unclear] there it was ten to the thirty-three; here it is ten to the eighteen. In our model this is equivalent to storing all this information in a sand grain with a mass of ten to the minus eight [unclear].

Most of the information created by mankind is streaming, videos, and of course books and PDFs. But if you look at code, working code, executable code, it is much less. The amount of executable code [unclear], the number of original executable lines of code created by computer scientists, is 100 gigabytes, which compared to that is less than a nanodrop [unclear].

This part is here because I was expecting to speak for one hour and I did not have enough material, so it has nothing to do with the main topic, but anyhow it sheds light on the complexity of life compared to the human being. A human being has a genome equivalent to six gigabits of information. When there is a new baby [unclear], the baby basically has twenty thousand coding genes, each coming either from the mother or from the father. Since each gene has only two possibilities, from the father or from the mother, the complexity added to mankind by a baby is 20 kilobits. That is less than the information contained in the SMS or the tweet you used to announce the birth of the kid. Don't tell that to your children [unclear].

If you take plants, there is something strange: plants have a DNA which is larger than the DNA of mammals, around twenty gigabits. Nobody knows the reason, of course, but a potential reason is the fact that there are changes of climate [unclear] on Earth: animals can move, and plants cannot move, by definition. Therefore they have to keep genes [unclear] against warming and genes against freezing, and activate them [unclear]. Therefore every time there is a species extinction you remove 20 gigabits of complexity, and every time there is a birth you add only 20 kilobits. So be careful when you walk: don't walk on grass before checking that it is not something which is under extinction.

The temporary conclusion is good news for us: we cannot expect, even in 2030, that artificial intelligence will be able to [unclear] with mankind, [unclear] to evolve independently. Every time you see an advertisement about the miracle of artificial intelligence, you have to give more thought to all the engineers, the mathematicians, the computer scientists who worked to make this happen. It is not only the training of a neural network; you have a lot of things to do in order to get results. The bad news is that if we want to get rid of our civilization, there are, of course, less complex ways to succeed. [unclear]

This is what Shannon would say to Turing, because Turing invented artificial intelligence: "Oh yes, your thing is very great, but there are limitations, be careful." And in fact it is true, because Shannon also worked on artificial intelligence and designed one of the first machines for maze escape, with a mechanical mouse [unclear]. He also designed a mind-reading machine. The mind-reading machine basically takes advantage of the fact that the human brain is not a good source of randomness, so as to be able to predict what the next move is. It works very well.

Here is an exercise; if you think the talk is too boring, you can think about this small exercise. It is what Turing might have replied to Shannon [unclear], a problem you can think about by yourself during the rest of the talk. You have a monastery on the top of Everest, and there are monks dwelling there forever. Their everyday job is to predict the weather of tomorrow via artificial intelligence. The question: can they make only a finite
18:31
number of mistakes, given that you have an infinite number of days before doomsday? [unclear] The weather is binary: zero for bad weather, one for nice weather. The only data they have each day is the finite sequence of past weather and past predictions. The hint: the monks have to make a choice. You can think about it during the talk.

Now I'm going to move to more involved research, less pseudo-philosophical, with fewer stories about life evolution and things like that. Anyhow, I'm still going to talk about cats and dogs, and the question is: is there a limit to learnability? Cats and dogs, because they are basically the symbol of artificial intelligence: the most successful achievement of artificial intelligence is how to detect that there is a cat or a dog in a picture. [unclear]

What I would like to do is to ask my neural network, or a machine learning system, to discover an algorithm, a simple one of course, because I have no algorithm to detect whether there is a cat or a dog in a picture. In fact what I say is wrong, because a neural network is an algorithm [unclear]. But my question is: what will be the consequence if machine learning is incapable of discovering simple algorithms?

First, it is not a very interesting result, because, as we said, it is completely pointless to use machine learning to mimic an algorithm. If you have an algorithm, don't try to build machine learning or deep learning stuff; just use it. In fact, if it fails [unclear] P equal NP, because if you ask the machine to discover a polynomial algorithm, it will find a non-polynomial solution, something that converges very slowly. But in fact it is not that useless to ask the question, because if the machine is not able to detect which algorithm to apply, it means that on some problems it will not be able to converge, because some problems need the data to be processed by a specific algorithm. If the machine is not able to detect that you have to process the data, you have poor convergence. If the machine were able to detect which algorithm needs to be applied, and to mimic that algorithm to get good convergence, we would have a universal solution, and of course we would be back to the beginning of the fable: we would have a universal machine that solves all problems.

So the question is: are machine learning and artificial intelligence able to detect simple algorithms? In fact the answer is no. There are problems where deep learning cannot converge to a solution. The candidates are sorting, the Fourier transform [unclear]. Then the classic example is convolution: you basically need to implement a convolutional system; without the convolution, it is not able to detect [unclear] and you do not converge properly. Pattern matching is something that it cannot do very well [unclear]. And the parity function: it has been proven that deep learning cannot converge to a solution. The parity function is this: you have a sequence of n bits, and the ground truth is one bit, for example the parity of the number of nonzero bits. If you take a random parity function, then your system, however long you train it, will give an average error of 1/2, meaning that it won't converge and will not even give a clue about the correct answer.

A first important point is the fact that neural networks are Turing machines [unclear]; we have to be careful because there is a question about the memory, so you need to consider a recurrent neural network. But basically this means that if you have an algorithm, just by adjusting the weights in your neural network you can implement the algorithm. Of course, the issue is that machine learning is not programming, it is training, and training via stochastic gradient descent. [unclear] but for deep learning it is training. We have already had a few talks about this, but I want to show you [unclear], and this is a nice part of my talk.

Basically you can see a neural network as a box containing the neural network: its weights, its matrices. I will not go into a more sophisticated description, but viewed from outside it looks like a coffee machine [unclear]. All the weights of the neural network are in the machine. You show it a cat, and it answers that what you showed is probably a cat, with [unclear] 90 percent. Show it another, even this cat [unclear]. To challenge the system, show this dog here: I don't know what the result will be, but I assume it will say 50 percent cat, 50 percent dog. There is no limit to human imagination for fooling the machine.

But how does it work? Basically you enter an image. An image is a sequence of numbers; it can be one million numbers, here I just put 10 numbers. The machine produces a prediction with the neural network that is inside. Knowing that the true result is 56, it computes the difference between the ground truth and the prediction, and during the training phase it adjusts the coefficients inside the machine so that the prediction gets closer and closer to the ground truth. But every time, we change the training image. That is gradient descent, and it is called stochastic because the image we select each time is random [unclear]. You consider that you know the result, and you compute the gradient; this is a system with matrices and some activation functions, something mathematically completely trivial, so you can compute the gradient and then adjust the weights according to the
27:45
gradient of the loss function: you move the weights in the direction of the negative gradient, so that you expect to reduce the error, the average error. Here are two examples [unclear] when you adjust the gradient vector [unclear] norm, but it is not a new thing, it is not interesting.

The big question I am going to ask myself is: can we train a machine, a neural network, to extract the maximum of two numbers? So this is not a question of cats, not a question of dogs, not a question of suppressing humanity or mankind: I have two numbers and I want to be able to extract the maximum of the two numbers, something which is very simple.

It turns out that you have this expression for the maximum of two numbers, and this expression is in fact a neural network. If you express it as a neural network, you use as the [Music] activation function [unclear] used in practice, the ReLU, which has this magic notation: it means that I always take the positive part of the vector. Then you apply another matrix, and this is equal to the maximum, to twice the maximum, of the two numbers. Therefore it satisfies the property that a neural network is a Turing machine, and it can extract the maximum of two numbers without too much effort.

Even at this time of the afternoon, you can see that you can extract the maximum of more than two numbers: four numbers, and here is the case of eight numbers. Once you have a specific block to extract the maximum of two numbers, you have a recursive way to extract the maximum of n numbers. Nothing special: you take log n layers, it is very simple. There are several ways to arrange this recursion in your neural network [unclear], because instead of taking the maximum of the numbers [unclear] you separate what is above and below, but you can mix them. In fact this is not innocent: you have many combinations of the neural network that all give you the correct answer, so maybe it is a reason why we see some problems.

First, how good is gradient descent? It is actually impossible, even for a simple problem, to get the optimal neural network. You will always end in a local minimum, which is the condition where, going down, at some moment you stop because you are in a local minimum that is not a global minimum. There are a lot of stories [unclear] but it is always the same story: you have a lot of chances to end in a local minimum. Even worse, since we are in huge dimension, you can have vicious saddle points, where it looks very stable but in fact you go one step here and then one step there and you just cycle around the saddle point. That is in theory; in practice [unclear]. If the local minima are close to each other, for some definition of closeness, it is good. If the local minima are far from each other, it is bad. So the question is how training can reach a good weight vector, a good local minimum.

Now let me explain the game. You take a neural network [unclear]. You know, if you have a small kitten, the first time you put the kitten in front of a mirror and show it its image, it panics. I do the same with a neural network: I put a neural network against another neural network. [unclear] This case is simple: I assume that the ground truth is given by a neural network, another neural network, and I select the weights of this ground-truth neural network at random and never touch them. As I said, for the roots of the loss function you have many, many competitors, because you have many permutations of rows and columns that are possible candidates for being the optimal neural network. The loss function has a number of roots of [unclear] factorial, which is huge because n is 1 million.

So, the exercise: you take an aquarium of large dimension, and this is the place where you select the roots of your loss function. In fact the roots are correlated, but let me assume they are not correlated at all, which is sufficient for the result. You define a simple loss function, which has nothing to do with [unclear], which is this product. We could take something more complicated, but then it is not very easy to do the math. And you prove that basically there is a quasi black-hole effect: the local minimum is not a global minimum but is the centroid of the global minima. Each of these points is a global minimum, but the local minimum is the centroid, and gradient descent always converges to this local minimum, which is not a good local minimum.

There are some experiments, and it is working [unclear]. In dimension ten, when you have fewer than four global minima, it works very well: gradient descent always converges to a global minimum. But if you take ten, for example, then it converges to the centroid [unclear]. It is more complicated, but asymptotically, when the dimension increases, it tends to the centroid. Note that this is not specific to the maximum: it is a general proof for any training of a neural network whose ground truth is given by another neural network with coefficients selected randomly. [unclear] it shows that gradient descent, with high probability, [unclear] in fact converges to the average, the centroid, of the roots.
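The centroid effect described in this part of the talk can be probed numerically on a toy case. The sketch below is an illustration under stated assumptions, not the speaker's actual model, since the slide with the loss is not in the transcript. It assumes the roots are independent Gaussian points in dimension n and that the product loss is L(w) = prod_i ||w - r_i||^2. It works with log L, which has the same critical points as L away from the roots, and whose positive definiteness at a point implies that L is locally convex there. The function names are hypothetical.

```python
import numpy as np

def log_product_loss_hessian(w, roots):
    """Hessian at w of sum_i log ||w - r_i||^2 (assumed toy loss, in log form)."""
    n = w.shape[0]
    hess = np.zeros((n, n))
    for r in roots:
        d = w - r
        sq = d @ d  # squared distance to this root
        # Hessian of log ||d||^2 is 2 I / ||d||^2 - 4 d d^T / ||d||^4
        hess += 2.0 * np.eye(n) / sq - 4.0 * np.outer(d, d) / sq**2
    return hess

def min_hessian_eigenvalue_at_centroid(num_roots, dim, seed=0):
    """Smallest Hessian eigenvalue of the log-loss at the centroid of random roots."""
    rng = np.random.default_rng(seed)
    roots = rng.standard_normal((num_roots, dim))  # assumed: uncorrelated roots
    centroid = roots.mean(axis=0)
    hess = log_product_loss_hessian(centroid, roots)
    return np.linalg.eigvalsh(hess).min()

few = min_hessian_eigenvalue_at_centroid(num_roots=2, dim=200)
many = min_hessian_eigenvalue_at_centroid(num_roots=10, dim=200)
print("smallest eigenvalue at centroid, 2 roots :", few)
print("smallest eigenvalue at centroid, 10 roots:", many)
```

With two roots the centroid is the midpoint, and the Hessian has a negative eigenvalue along the line joining the roots, so the midpoint is a saddle that descent can leave. With ten roots in dimension 200 the Hessian at the centroid is positive definite, which is the kind of local basin the talk describes. The sketch does not run gradient descent itself, so it says nothing about how large the basin is.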
37:00
and that saturate a look at that how long can look at that without incorrect justice thing to prove that the function is convex with high probability around this country therefore if you enter some trade you will never escape the aunt of the vicinity of the whole universe escape in fact this proof works for every even the root are correlated with the case work for many metrics but you have to be careful in these cases and worse loss function is smooth and in fact the loss function is not smoothly something very complicated their forties you cannot say that it is true for the real gradient descent in for neural network but assume that it is true this is it tell you that if you convert to the so trade if the average if the software is not zero then by the law of large number you will give an error that error you will do will be basically we said a well compared to the tutututututu the length of the lens's the modules of the saw trade the normal substrate I mean it and if I take a take a test vector and I apply the text vector X to the well neural network and will be close to if I compare with what I will get with a so trade neural network I will have something which will be very close via square root of n in fact is a since this order and those are always on the other one over square root of n and in the dimension will be very close vertically and the parent happen if if you are so trade is zero of X close with you this will bad news in this case therefore the error the receiver all you will do so this is a receiver the receiver all you will do compare with the exact prediction will be to important in the vault Aveiro in fact would be infinite and if you took the maximum the maximum the coefficient the average of the you have to exaggerate repose the coefficient of the matter if you see that have a prescription is zero so you expect that the max finding will be a special case where basically the saturated will be zero and therefore you will expect not good result and 
it turns out that many algorithms on signed numbers have this property, which we call the zero-mean-weight property. Here is the training of the max. I trained nine neural networks, randomly initialized; the position in the plane is the vector of their predictions on two test vectors [unclear], and the red arrow is the ground truth. It works well for two numbers: with two numbers it converges well. [Laughter] With four numbers it converges less well, and then eight, sixteen, thirty-two: there is an effect that we assume to be the effect of the zero-mean-weight property, which we take to be the problem. To compare, we tried to learn a random-weight neural network instead of the maximum-finding network, and it turns out that the maximum does very badly in comparison. If you take a non-zero-mean-weight function, here it is: it is basically the law of large numbers, and it works very well. So basically, if this is true (because, as I said, I used a toy model to show it quickly), it means that there is a swamp area in learning. If you have a problem, and your gradient descent goes into the area that I call a swamp area, where the coefficients have zero weight (it can be the mean weight, it can be the third-moment weight, it can be anything like that), then it does not converge. It can be blocked, and the convergence will be very poor. And if the optimal solution, meaning the optimal maximum-finding network, is in this area, then you can expect that it will not converge. It means that your system will be unable to find the maximum of several numbers. Now, there is a kind of equivalence between programming and learning. We know from Turing that termination is undecidable in general; we can expect that learning convergence may also be undecidable in general. There are some propositions: if this area is [unclear], it is easy to state that the
problem does not terminate. But there are difficulties: how do you define learning convergence? That does not come naturally. [unclear] But we know that we can prove that some programs terminate. Fortunately: the program that [controls] your plane or your car is proven to terminate, sometimes. Therefore, can we hope, in some special cases, to detect that a problem will not converge well? And is it possible to train a neural network to detect which algorithm you should use to escape the swamp area? So the question that I was asking, and that Stefan started to look at, is: can we train a neural network to detect which physical laws to apply in order to obtain some physical measure? There are some [formulas] in physics, but you do not know the sequence of physical laws to apply, which physical laws. Is there a new physical law? This is a very interesting question, because if you postulate that a physical law is like a basic algorithm, it may also be difficult to find the basic physical laws. But this is a question. So that is just what I wanted to show you. Now, if you want to know the solution of the problem of the monks, it is one slide after, but I think you know the answer already, so I am not going to go into these slides. You know, a comparison between
46:24
life evolution and AI and machine learning: as you say, life has no goal, no purpose in evolution, [unclear], but AI actually has a goal; you tell the machine what you want it to achieve, right? So would that make a huge difference? Because you say that life, or evolution, has done a lot of computation, basically. So could that help AI get there faster?

I agree with that. The question was: can artificial intelligence get rid of human intervention? That was the question. If a human is able to give the rules, and it works, then of course it is working. But if you give vague rules, you will need a lot of computation. In particular, you know the famous expression, the command "do what I mean" on the computer: when you want to debug a program, at the end you type "do what I mean", and it does not work. So in this case, what is the minimal quantity of information needed to get something working? That I do not know. It is not sufficient to say "make me happy".

I think one could have predicted that these problems where you ask for exact precision, say the maximum of these two numbers, even a simple maximum program, I mean, there is no hope in any case that they converge: they require infinite numerical precision. How does this relate to what you actually say?

What I said is: if I need an approximation of the algorithm, does there exist an approximation (not an approximation of sorting, for example, but an approximation of finding the maximum) that will help to find the correct neural network? Then we are happy. If I have something that gives me the maximum with an error of, I don't know, 5 percent, that is beyond what can be expected with present deep learning systems. I mean, I do not know whether I am right or not. So take an example: assume that it is impossible to find the convolution algorithm [unclear]; you
do not have the convolution, but I know that I need it to be able to recognize a cat, because [the network] is designed with a convolution algorithm. If I train my neural network on pictures of cats without this convolution algorithm inside, it won't work, it won't converge. This is a bet, because I never tried. If you have the convolution algorithm, but not exactly the convolution, something which is very close, it will find the cat in the end, because you just add the error of your imitation of the convolution algorithm to the error you have when the convolution algorithm is already built into the weights of the neural network. A convolution is also a neural network, but one in which you force some coefficients to be 0 and some other coefficients to be identical under translation. That is the convolution algorithm: you force it, and it works well. If your system is able to discover that some coefficients need to be zero, and that the nonzero coefficients need to be identical by translation, then it will be able to find that there is a cat in the picture, and you just add the error of your fake convolution algorithm. This is just to say that you need to prepare your data or your neural network differently in order to have good convergence, because you do not know this at the very beginning and you have to [discover it]. So the guy who discovered the convolution is a very smart guy, and many, many people [use] it, but that is something that the machine cannot discover, if this is true.

What happens if you try to learn [unclear]?

As I said, I tried and it did not work, but maybe my neural network was not a good one. It does not work. What will work is to have a recurrent neural network, which is something [unclear]; it means that you basically inject time.

It is surprising: people usually say that if the number of units is less than or comparable to [unclear], then there can be lots of [local minima], but as you take many more units you do not have these. If
you add many layers, [unclear].

I'm sorry, I did not understand the question.
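The remark above, that a convolution is a neural network in which some coefficients are forced to 0 and the others are identical under translation, can be illustrated with a minimal sketch in plain Python. The function names (`conv_as_dense_matrix`, `dense_layer`, `sliding_window`), the kernel values and the input vector are illustrative assumptions, not taken from the talk. The sketch only checks that a dense layer with this constrained weight matrix gives the same output as a one-dimensional sliding-window filter.

```python
def conv_as_dense_matrix(kernel, n_inputs):
    """Dense weight matrix equivalent to a 1-D sliding-window filter.

    Row i contains the kernel shifted by i positions (same coefficients,
    translated); every other entry is forced to 0.
    """
    k = len(kernel)
    n_outputs = n_inputs - k + 1
    weights = [[0.0] * n_inputs for _ in range(n_outputs)]
    for i in range(n_outputs):
        for j in range(k):
            weights[i][i + j] = kernel[j]
    return weights


def dense_layer(weights, x):
    """Plain fully connected layer without bias or activation."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def sliding_window(kernel, x):
    """Direct sliding-window filter (cross-correlation, no padding)."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]


# Illustrative values only.
kernel = [1.0, -2.0, 0.5]
x = [0.3, -0.7, 0.1, 0.9, -0.4, 0.2, 0.8, -0.6]

W = conv_as_dense_matrix(kernel, len(x))
dense_out = dense_layer(W, x)
conv_out = sliding_window(kernel, x)

# The two outputs agree: the convolution is the dense layer with tied weights.
print(all(abs(a - b) < 1e-12 for a, b in zip(dense_out, conv_out)))
```

In this sketch the dense matrix has 6 x 8 = 48 entries but only 3 free parameters. That constraint is what, according to the argument above, the learning system would have to discover by itself if the convolution were not built in.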
53:13
You mean you increase the dimension of the [layer]? Ah, of the layer. I do not have a definite answer for that. In fact I have no definite answer; do not ask any further questions, by the way, I do not have the answers. But I would say no: you just increase the number of possibilities, and in this case you will again go to the centroid, but faster. And if the centroid for the maximum is not good, [it does not help]. Be careful: if you try to find the maximum of non-negative numbers, then it converges, but if you take signed numbers, then it has [this problem]. That is my guess as an answer to your question about increasing the size, the dimension of the system.

No, you did it for just two, but I feel it will not converge if you add more. In order to converge you require infinite numerical precision. I could always give you two numbers for which the function never gets this exactly; for any implementation, any fixed number of units, I can always find two numbers which are so close together [unclear].

No, no, it is not a question of numerical precision; my problem is not a question of accuracy. [unclear] No, I'm sorry: if I ask for the maximum of 32 numbers and this gives me this error, I will say it is not numerical instability; it is a problem that does not converge.

Integers? [unclear] There is only this finite set of numbers.

No, no, it is in R, and I am training on random numbers. It is something very simple, but it has nothing to do with numerical instability. [unclear] That it converges does not mean that it finds the maximum; it does not mean you are close to the max.

[unclear: the size of the data]

Yes. You do not look at the accuracy of the weights; you look at the accuracy of the answer of your neural networks when you test them with test vectors, because you may have neural networks that are very far apart, but when you test them they give something very similar. No, no. The loss function for the max is the
square of the difference between the two maxima, [the true one and the one] it gives.

That is why I said: if [the difference between] the two numbers is super small, then you get a small error, but you could have two numbers which are [unclear] still close together.

No, no. The numbers are between minus 1 and plus 1. I did not do something complicated. No, I was not [cheating] like that. I could, but no: I took random numbers, 32 numbers between minus 1 and 1, real numbers, uniformly. No trick here: they are taken uniformly. If I look for the maximum of two numbers, it converges to something that gives a correct answer, and if I were rich enough to let it converge for a hundred thousand years, it would go to arbitrary precision. But I am not asking for that here. I am not looking for trickiness, by the way. But [with more numbers] I get wrong convergence, and these points are stable, completely stable. You can wait, you can multiply the number of iterations by tens [or by thousands]: it is just stable. This is the final result; they do not move any more. Well, they move, because the learning rate is not zero, but they do the classical zigzag around the value, nothing special. But maybe there is a bug; I mean, maybe what I [say] is completely wrong. That is a good subject of research.

[unclear]

I am not trying to convince you, because I am not convinced myself, by the way. No, it is not the mathematical question of saying whether two numbers are equal. I have a function; I enter two numbers and it gives me an answer. If I enter 0.45 and 0.319 and it gives me 0.1, I consider that the answer is not acceptable. If it were giving me 0.42 or 0.44, I would be very happy. The error is very big, and it does not diminish if you select another initialization. It is not like in your case, where you take any initialization and it goes to the good local minimum; here you take a new initialization and it goes to
another minimum, which is not good either. [unclear] Well, sorry, I think you are tired. We are continuing with the slides, [unclear] just to close. It is a shame, [unclear], but I gave a hint.
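The evaluation setup described in this exchange (inputs drawn uniformly from [-1, 1], a squared-error loss on the predicted maximum, and judging a network by its answers rather than by its weights) can be sketched in plain Python as follows. This is not the speaker's code, and it trains nothing. All names (`exact_max2`, `exact_max`, `make_dataset`, `mean_squared_error`, `squared_error`) are hypothetical. The hand-built ReLU network is only one possible exact solution, used here as a reference point. Its first-layer weights happen to sum to zero, but whether that is the exact sense of the "zero-mean-weight property" in the talk is an assumption.

```python
import random


def relu(v):
    return max(v, 0.0)


def exact_max2(a, b):
    """Hand-built ReLU network computing max(a, b) = relu(a - b) + b.

    First-layer weights on (a, b) are (+1, -1), (0, +1), (0, -1); they sum to 0.
    The term relu(b) - relu(-b) passes b through unchanged.
    """
    return relu(a - b) + relu(b) - relu(-b)


def exact_max(values):
    """Maximum of a list by chaining the two-input network."""
    result = values[0]
    for v in values[1:]:
        result = exact_max2(result, v)
    return result


def make_dataset(n_numbers, n_samples, seed=0):
    """Inputs uniform in [-1, 1]; the target is the true maximum of each row."""
    rng = random.Random(seed)
    inputs = [[rng.uniform(-1.0, 1.0) for _ in range(n_numbers)]
              for _ in range(n_samples)]
    targets = [max(row) for row in inputs]
    return inputs, targets


def squared_error(prediction, target):
    return (prediction - target) ** 2


def mean_squared_error(predict, inputs, targets):
    """Judge a predictor by its answers on test vectors, not by its weights."""
    return sum(squared_error(predict(x), t)
               for x, t in zip(inputs, targets)) / len(inputs)


inputs, targets = make_dataset(n_numbers=32, n_samples=1000)
reference_loss = mean_squared_error(exact_max, inputs, targets)  # rounding error only

# The example from the discussion: for inputs 0.45 and 0.319, an answer of 0.1
# is far from the true maximum, while 0.44 is close to it.
bad = squared_error(0.1, max(0.45, 0.319))
good = squared_error(0.44, max(0.45, 0.319))
print(reference_loss, bad, good)
```

A trained network would be passed to `mean_squared_error` in place of `exact_max`. Comparing its loss with `reference_loss` on fresh test vectors is one way to check whether the training found the maximum or settled at a stable but wrong point.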
1:00:03
The hint, by the way, was very simple: the monks are just using the axiom of choice. And I gave a wrong hint, because I assumed that you had all the sequence of weather, [past] and predicted. In fact you need all the sequence of weather. With the axiom of choice, the monks can just [fill out] the sequence: they can fill the sequence up to doomsday. Assume that doomsday is tomorrow, and just consider that all the weather up to doomsday will be bad weather, coded 0. [unclear] So they use the axiom of choice. Basically, for all these sequences there is one [representative per class], given by the axiom of choice, and the monks are given this representative. The relation between two sequences is an equivalence relation: they are related if they differ by a finite number of points. Therefore the monks all choose the same representative, and that representative is in the equivalence class of the sequence of weather. Therefore the prediction will fail only a finite number of times, and God will be happy. Yes, but the trick is that we assume that the monks are able to manipulate infinite sequences.
1:01:56
It was a toy example to prove that information theory is not a complete theory, because you need to add the computational [complexity] of the data. [unclear] It was with Fabian. Exactly; do not ask me questions about it. So long. [Applause] [Music]