6th HLF – Laureate Lectures: An Introduction to AI and Deep Learning
Formal Metadata
Title 
6th HLF – Laureate Lectures: An Introduction to AI and Deep Learning

Title of Series  
Author 

License 
No Open Access License:
German copyright law applies. This film may be used for your own use but it may not be distributed via the internet or passed on to external parties. 
Identifiers 

Publisher 
Heidelberg Laureate Forum Foundation

Release Date 
2018

Language 
English

Content Metadata
Subject Area  
Abstract 
John E. Hopcroft: "An Introduction to AI and Deep Learning". A major advance in AI occurred in 2012 when AlexNet won the ImageNet competition with a deep network. The success was sufficiently better than previous years that deep networks were applied in many applications with great success. However, there is little understanding of why deep learning works. This talk will give an introduction to machine learning and then focus on current research directions in deep learning. The opinions expressed in this video do not necessarily reflect the views of the Heidelberg Laureate Forum Foundation or any other person or associated institution involved in the making and distribution of the video.

00:01
[Music]
00:22
It's my pleasure to announce the first
00:24
speaker, John Hopcroft from Cornell University. John is well known in the computer science community, and I think beyond, for his work on algorithms and data structures, on which he published a book, together with Jeff Ullman and Al Aho, which has become a classic over many, many years. He was awarded the Turing Award in 1986 together with Bob Tarjan, who is also here, for fundamental achievements in the design and analysis of algorithms and data structures. Recently he has become a consultant to the Chinese Prime Minister on the subject of reforming the Chinese educational system in universities, and he's also a member of the Chinese Academy of Sciences, which is something very rare for foreigners. John will talk about deep learning and AI, a very hot
01:13
subject these days, and the floor is yours.

[Applause]

It's a pleasure for me to be here today and to give this talk. One thing I will mention, though, is that I'm not going to talk to the laureates who are up front; I'm going to talk to the young researchers here. I'm going to give first a brief introduction to machine learning, in case you know nothing about machine learning, and then I will talk a little bit about some of the research issues in the area. But first let me tell you that I think we're undergoing an information revolution, and the impact on society will be as great as the impact of the Agricultural Revolution or the Industrial Revolution. One of the main drivers of that is learning theory.

To start off, the basis of learning theory is a threshold logic unit. This device has a number of inputs and a weight on each input. It calculates a weighted sum of the inputs; if the sum is less than the threshold it outputs a zero, and if it's greater than the threshold it outputs a one. I'll give you a simple algorithm for training this unit. You set the weight vector equal to one of the patterns, and then you repeatedly cycle through the patterns. If a pattern is misclassified, you either add it to or subtract it from the weight vector: if you want an output of one you add it, and if you want an output of zero you subtract it. This algorithm will quickly converge to a solution if the data is linearly separable. The reason I put the algorithm up is that there's one thing I want you to remember about it: the weight vector will be a linear combination of the patterns.

So if the data is linearly separable, the algorithm will converge and find a hyperplane which separates it. But what if the data is not linearly separable? Then you might want to map the data to a higher dimensional space. Just as an example, in this set of data I might add a third coordinate and move 
the data out from the plane by a distance equal to the square of the distance from the origin. That will pull the zeros out more than the x's, and you can find a hyperplane that will separate the zeros and the x's. This mapping to the higher dimensional space might be to an infinite dimensional space, and you want to run the algorithm that I showed you in this higher dimensional space. But what is interesting is that you do not need to know the function f. You don't have to calculate the mapping of the images to run that algorithm; all you need to know is the product of images.

To see that, a_i is one of the images and f(a_i) is its mapping to the higher dimensional space. The weight vector w, remember, is going to be a linear sum of the images in the higher dimensional space. If I want to test another image a_j, I'm going to multiply the weight vector times f(a_j), and you'll notice that I only need products of the mappings of images, not the images themselves. And if I'm going to add an image to the weight vector, I don't have to know the image, because all I'm going to do is increase or decrease the coefficient of that image. This allows me to run that algorithm without knowing the function f; all I need to know is products.

This brings in the notion of a kernel. A kernel you can think of either as a function or as a matrix which gives you the values of products: the ij entry in the matrix will be the product of the i-th image times the j-th image. An example of a kernel is the Gaussian kernel, and you notice that the function f doesn't appear in it. If I want to know the product of f(a_i) times f(a_j), I simply subtract a_i from a_j, square it, multiply it by a constant, and raise e to that power. Now you might ask the question: can I select any kernel I want? Well, not quite. You had better select a kernel for which there exists a function giving rise to that kernel, and it turns out there's a simple test as to whether or not there exists a function: if the 
matrix is positive semidefinite, then there will exist a function, and you don't need to know that function. This is the notion of what's called a support vector machine. Up until 2012, if you bought a product which had a machine learning algorithm in it, it used the support vector machine technology that we just talked about, and there exist many kernels other than the Gaussian kernel.

But something changed in 2012, and that was the advance of deep learning, and I'll say a few words about that. It came about due to the ImageNet competition. There is a collection of 1.2 million images; they come from a thousand categories, and each image is labeled by the category it belongs to. One of the things that's very important about the world today is the amount of data that we have; this is one of the drivers of the information revolution. The way the ImageNet competition worked, if you wanted to enter you were given fifty thousand images along with their labels. Up until 2012 the winner of the competition had an error rate of about 25 percent, and the improvements were only a fraction, about one tenth of a percent, until 2012, when a particular network called AlexNet came along and the error rate dropped to fifteen percent. This was such a major improvement that people took the technology and applied it in many applications, in economics and biology and manufacturing, and it seemed to work wherever they applied it, although people don't know why it works, and that's a fundamental research problem today.

People then quickly extended it. I will show you what AlexNet is in a minute, but it was about eight levels deep. People have now extended the depth up to a thousand levels, and they've reduced the error rate to under four percent. The best that trained humans can do is about five percent, so computers can now classify images better than 
humans. AlexNet had some levels which were called convolution levels. In a convolution level you had a little three by three window which you slid across the screen one column at a time, and then you dropped down one row at a time, and you made another image, essentially the same size, by having a threshold logic unit getting nine inputs from that window. What that window supposedly did was find features: if there was a corner it might detect it, or it might detect an edge or something of that type. What you would like to do is detect many different features, so you'll notice that
10:54
up here these gates build another image of this size with one feature, but you have many of these images, in fact about 62 different features, so you can see the size that this network is going to get to be. What you have then, to reduce the size of the network, is another level which is called pooling. Here there's a two by two window, and that two by two window is slid over two cells at a time, and what you do is simply take the maximum value. Now you might say you're losing a little bit about position, but it turns out position is not that important. If you're trying to recognize a human face, you don't need to know the exact distance between the two eyes, or how much the eyes are above the nose or the nose above the mouth; all you need to know is roughly the relationship. So the pooling doesn't hurt you.

AlexNet actually had five levels of convolution, then three fully connected levels, and then something called softmax. The way this worked is you put an image in and then you adjusted these weights to get the correct classification. A network has millions of weights, so you can understand that you're taking a million derivatives to do gradient descent.

Now what another researcher did is change this slightly: instead of classifying images, he built a network to recreate the image. At first you might ask how you can do that if you have fewer gates in between than the number of cells in the image. But realize, let's say there were a hundred gates, and the value of each was, let's say, either a 0 or a 1; there would be two to the hundred possible combinations, but you have only a few hundred thousand images, so there's enough storage inside to have a unique code for every image. So you can train the network to reproduce the image. The reason I put this up is that he started to look at what these gates learned, and he discovered that there was one gate that only had a positive output if the 
image was that of a cat, and he never said which images have cats in them and which don't. This told us that we could do unsupervised learning. The reason unsupervised learning is important: if this is going to be the technology that drives your car as you're driving around Heidelberg, you would like your car to be learning. It can't be trained for every possible situation that it is going to encounter, but you train it so it'll do a good job, and then it will become better and better as it drives.

OK, so I'm going to talk a little bit about something that I'm going to call activation space. What I can do is create a vector where each coordinate of the vector corresponds to the output of a gate, and I will create a vector for each image; for an image, that activation vector is going to represent it. Now it turns out I'm going to use two types of activation vectors. What I really have is a matrix where the rows of the matrix correspond to gates and the columns to images. You can have an activation vector which is a column, which is the one I described, and I'll call it an image activation vector, or one which is a row. A row corresponds to a single neuron, and you'd be interested in that if you want to know what that neuron learned.

If I have an image and I want to find the corresponding activation vector, I simply feed the image into my network and I have the activation vector. But what's interesting is: if you give me an activation vector, what image produced it? There are many ways to solve this problem; I'm just going to give you a simple one. Pick a random image, see what activation vector it has, and then do gradient descent on the pixels of the random image to cause its activation vector to move to the activation vector that you want to recreate. If you do that, you will have the image that produced the activation vector. I'm now going to show you some experiments that people do. What I can do is I can take 
the activation vector at the beginning of the network and call that the content of the image, and I can take the activation vector at the far end, or actually several of them, and take the product with itself and call that the style. Then I can take an image, say of one of our former presidents, and I can ask what he would look like if he was 20 years older. What I do is take his content, but for the style I take 200 images of older people and use the average of the style of these 200 older people, and I create an image of Bush if he was 20 years older.

It turns out each year I bring 30 students from China over to Cornell University who have just completed their junior year. These students had not heard of learning theory when they came, and each of them has to do a brief research project. One of them took a picture of Cornell and said, what would Cornell look like if it was in Asia? So they took a piece of Asian artwork, and that's what Cornell presumably might look like if it was in Asia.

I'm going to show you some other experiments that we did. Here, in the middle of the top row, is another picture of Cornell, and on either side are some pictures that we want to use for style. What we did is recreate the picture of Cornell in these various styles. In the bottom two rows, in one case we used a network which had been trained, but in the middle case we used just random weights; we did no training. You'll notice that with random weights we actually, I think, did better than with a trained network. This raises an interesting research question: which of the things we are doing require training, which takes maybe a month, and for which could we use random weights? The reason that's important is that if we could test how good networks are by using random weights, we could test thousands of structures in an hour rather than one structure in a month. It turns out quite a few things can be done with random weights. I thought I 
would now talk about some research questions. One of them is what individual gates learn, and there are a number of questions. One of these students from China who came over took a very simple network. He took just ten by ten images which were black and white, and the images were letters which could be made up with rectangles, and he asked this question: how does what a gate learns evolve over time? He looked at the gates as a function of training time, and he noticed that three gates started to learn the size of the letter, how many cells were black and how many were white. But after a little while two of the gates decided it didn't make sense to have three gates learn the same thing, and they switched to something else. I think this was a fundamental observation, and it raises the question: why did they change, and how did they decide what they should learn?

Another thing I'll just point out is that if you train the network to look at photographs, let's say outdoor photographs, and then you want to
20:38
train it again for indoor photographs, which are fundamentally different, it turns out you don't have to retrain the first level, because what the first level actually learns is features of photographs, independent of what the photographs are of. So there are a lot of interesting research questions, and they're relatively simple ones.

I'm going to talk a little bit about the training. It turns out that there are many local minima, some better than others, and we'll explore that. And training takes a long time; can we speed it up? Let me first talk about local minima. What I have plotted here is the error function on the training data, and you'll notice that there are two local minima: one is very broad and one is very sharp. If I have a choice, which one should I take? What I haven't told you about is generalization, which is one of the most important things in learning theory: why do you believe that if you train a network on some training data it's going to work well on real data? It turns out the theory of that is quite well known, and you can look that up. But I'm going to suggest that you take the broad minimum. The reason is that if our training data is a good statistical sample of the full data, then if I plot the error function for the full data it should be very close to this one, and the dotted line would be an example. You notice that if that curve just shifts a little or gets modified a little, then for the broad minimum the error is not going to change much, but for the sharp minimum it's going to change a lot. So there are questions like which minimum we should take, important questions.

But something else: when you're doing gradient descent, remember that if you have 50,000 images the error function has 50,000 terms in it, and you have a million weights, so you're going to take a million derivatives of 50,000 terms. That's what gradient descent would do. But researchers have said, how about 
stochastic gradient descent: let's change the error function and randomly select one term, do one iteration on minimizing that, then randomly select another. This could speed up the process by a factor of 50,000; that would change months to minutes, something like that. It doesn't surprise me that it speeds things up, but it also finds a better local minimum. Why? Here's possibly the answer. Suppose I'm doing full gradient descent and I start over here. I will come down and I will find this local minimum, whereas this one is much better. But notice that if I pick a random image, it's likely to shift me over into this region. I'll pick another random image, and most of the time it'll shift me towards the center; sometimes it'll shift me the other way, but the vast majority of the time it will go towards the center. So what will happen is I will move in here, and when I start to oscillate I'll increase the number of images to maybe 50; I'll randomly select 50, and that might reduce the variance. Then finally, when I get down in here, I'll do full gradient descent and get to a much better local minimum.

OK, I'm going to talk about things you can learn. Suppose you have two tasks and you learn them separately, and you ask the question: what's common to these two tasks? You could modify this network like this and train a network for both tasks, and what is likely to happen is that these gates will learn what is common. I'm just going to go through a number of examples like this quickly to show you things that you can explore. One of the things we know is that if you learn two languages when you're five years old, there's one place in the brain that will process both languages. If you learn one when you're five and the other when you're twenty, it's two different places in the brain, and you're actually translating back and forth. So this gives you a way to study that.

Another thing is that at one 
time people were trying to do image generation. They wanted a computer program where they could type in the word cat and the computer would generate an image of a cat. At first you might ask, why do that? Why not just go on the network and find a cat? Well, if you want a composite image, an image of a cat sitting on the beach watching the sunset, you might not be able to find that image on the network, and you'd like to generate it. Initially people weren't able to solve this problem, until something called an adversarial network came along. What they did is they had two networks: one was their image generator, and the other was a network that they trained to distinguish between a real image and a synthetic image, a discriminator. They trained their image generator until it could fool the discriminator, then they trained the discriminator to do a better job, and then they went back to the image generator, and by working back and forth they got reasonable images.

The reason I mention this is that it is a very important concept and is used in hundreds of applications. One of them is language translation. The way people used to do language translation is they would find pairs of texts in the two languages and use them to learn how to translate. But what if you have two languages where you don't have any such texts? What you could do is the following. You could just train a discriminator to determine whether a sentence is a real sentence or a synthetically generated one. What they did is they first took a sentence, say in English, and just produced a sequence of German words. Then they fed it into the discriminator and tried to readjust things so it would be a sentence. Then they took the German sentence and translated it back into English words, and used the discriminator to make that a sentence. Finally they trained the whole thing to make sure the sentence you got back was 
the sentence you put in, and then the German sentence in the middle would be a good translation of the English sentence. This is just to show you how this technology can be used in many problems.

I want to just quickly talk now about some simple things. When we train a network for a given category, we train it with a thousand images. But when my daughter was maybe three years old or so, I used to sit on the couch with her with a book called the Best Word Book Ever. It has many images, and initially I would go through and point at a house and say house, and then dog, and so forth. A little later I'd go through and point at the images and she would tell me what they were. But there's just one image of a fire engine, and one day we went out for a walk and there was a fire engine parked on the street. It didn't look like the image very much at all, but she pointed to it and said, dad, fire engine. She learned from one image. The question is, is there some way? It's possible that in the first three years of her life she learned how to learn images, and then she learned how to learn from just a single image. That's something we have to figure out: how to train these networks on a single image.

Something you've probably heard about is
30:23
fooling. This is an image of a cat, and what I did is I went in and changed a few pixels and got another image. You probably can't see the difference between those two images, and you would say cat, right? But when we fed it into the deep network, it said that's an automobile. Now this might make you a little bit nervous, because this is the technology which is going to drive you around Heidelberg in a few years, and if it sees a stop sign and someone has put a little piece of tape on it, you don't want it to say green light. It turns out that a minor change can change any image from the proper classification to an arbitrary classification. You might wonder how that can possibly be. You might think, how can you carve up space into ten categories so that any point in one category is arbitrarily close to a point in any other category? Now mathematicians will say that's trivial. Let's talk about two categories, one of them the rational numbers, the other the irrational: every rational is arbitrarily close to an irrational and vice versa. But I won't allow that, because the function, or whatever defines the category, has got to be polynomial. So how can you do this? That's an interesting research problem. I should repeat: we don't know why this deep learning works. We know it works, and we have to develop the theory as to why.

Every time I give this talk I'm asked, is artificial intelligence real? The answer in my view is no. It turns out that deep learning is simply classifying objects by doing pattern recognition in high dimensional space. The networks do not learn the purpose of an object or something like that, and I'll come back to that in just a minute.
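
The fooling effect described above can be reproduced in miniature on the simplest model from this talk, a single threshold logic unit. The following is only a sketch under stated assumptions: the weights are random stand-ins rather than trained ones, the 10,000-pixel image is synthetic, `smallest_sign_perturbation` is a hypothetical helper, and the sign-of-the-weights step is a common illustration for linear units, not the procedure that was used on the cat image in the talk. It shows that moving every pixel by the same tiny amount changes the weighted sum by that amount times the sum of the absolute weights, which grows with the number of pixels.

```python
import numpy as np


def classify(w, threshold, x):
    """Threshold logic unit: 1 if the weighted sum exceeds the threshold, else 0."""
    return int(np.dot(w, x) > threshold)


def smallest_sign_perturbation(w, threshold, x, margin=1e-6):
    """Return (x_adv, eps): a copy of x with every coordinate moved by eps.

    Each coordinate moves by the same amount eps, in the direction sign(w)
    or its opposite, so the weighted sum changes by eps * sum(|w|).
    eps is chosen just large enough to push the sum across the threshold.
    Hypothetical helper for illustration; pixel-range clipping is ignored.
    """
    score = np.dot(w, x) - threshold
    l1 = np.sum(np.abs(w))
    eps = (abs(score) + margin) / l1
    # Push the sum down if the unit currently fires, up otherwise.
    direction = -np.sign(w) if score > 0 else np.sign(w)
    return x + eps * direction, eps


rng = np.random.default_rng(0)
dim = 10_000                          # number of "pixels" (assumed)
w = rng.normal(size=dim)              # stand-in for trained weights (assumed)
x = rng.uniform(0.0, 1.0, size=dim)   # synthetic image, pixels in [0, 1]
threshold = 0.0

x_adv, eps = smallest_sign_perturbation(w, threshold, x)

print("label before:", classify(w, threshold, x))
print("label after: ", classify(w, threshold, x_adv))
print("per-pixel change:", eps)
```

With these assumed sizes the per-pixel change needed to flip the output is a small fraction of the pixel range. That is one intuition, not a full explanation, for why classifiers in high dimensional space can be fooled by changes too small to see.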
32:58
If you trained a deep network to classify railroad cars as box cars, flat cars, tank cars, engine, caboose, passenger car, and you showed it this image, it might very well say that's either a box car or a flat car with something sitting on it. But if you look at it carefully you'll notice that there are motors on the wheels, and you'll say it's an engine, even though there's no cab for someone to sit in. It turns out that in switch yards they now use these engines to arrange the cars for a train; they're remote-controlled and there's nobody in them. So I'll just repeat: deep learning is pattern recognition in high dimensional space. It deals with the shape of the image, and it does not abstract the function or another property of the image.

So when we talk about AI, a lot of things that we think of as artificial intelligence really just come about because of computing power. For example, playing chess: when I was a child I thought that to play chess required intellectual ability. But the way computer programs play chess, they have a game tree. The root of the tree is the initial position of the board, and for every possible move there's a node underneath showing what the board would be. When you play chess you probably go down three or four levels; you don't search the whole breadth of the tree, that just wouldn't be possible. But the computer is faster than you are, so it can go a few more levels, and therefore it can probably beat you.

So I used to think I had a good definition of intelligence: I thought intelligence was the ability to solve problems. Now I realize that's not a good definition; it depends on how you solve those problems. If you're interested more in philosophy, it would be good to come up with a definition of what intelligence is, because I can no longer answer questions about when we are going to have real intelligence, since I no longer really know what it is.

But I would like to just close by mentioning 
something that happened to me when I gave a talk on AI, on deep learning. I was out in California, lecturing at a company called Applied Materials, and I had talked about deep learning. Afterwards somebody asked me a question: is there any relationship between deep learning and how the brain learns? That gave me an opportunity to mention something else which is very related, if you're interested in the notion of learning theory. I have talked to many people who talked about early childhood education, and they told me the earlier you do something, the bigger the payoff. I asked them, well, how early? Their answer was the first two years of a child's life. That surprised me, and so I said, where is the research that supports this? Now the people I was talking to were in child learning; they weren't researchers. They said, well, we don't know, but surely there must be research that supports it. So I went and tracked it down, and it turns out that in the last 25 years there has been significant research in the area of how the brain develops, and there has also been research on the value of early childhood education. It is apparently true that investing in the first two years is where you'll get the biggest return, because when a child is born the neurons are present in the brain but the wiring is very fluid, and the first two years is when the child learns how to learn. If that's done well, then the child will do very well in the rest of their life. The claim of these researchers is that if you provided high-quality free child care for the first few years, you'd get your money back twenty-five years later to pay for it. What was interesting is that after
38:10
the talk someone came up to me and said: our company has a nonprofit where we fund things, and one of the things we fund is remedial child care for every high school student who needs it in Mountain View, California, but it's very expensive. And what they said to me is, would it make sense for us to take some of our resources and fund early childhood, and what that would do is reduce the cost twenty-five years later? I just thought I would mention that because when you do research it's also important to get it out to the general population, because if they know, it can have an impact on changing society. And with that, I think I'm within the five minutes that I can take, at least.
And I'd like the questions to come from the young researchers, not just our laureates. Okay, there's the first question. Victor, just wait for the microphone, please.
All right, so, fascinating whirlwind tour. Given your work on automata theory: all the examples you showed in the machine learning part were about vision, but I was wondering, in other domains, being able to then map what you've learned back to automata theory could be really powerful. I'm wondering if you've thought about that.
I've thought a little bit about that. I should tell you a little bit about my background. I actually graduated before computer science existed. My degree was in electrical engineering, and I was hired at Princeton in an electrical engineering department. But what was fortunate for me is that the chair of the department understood that the future was going to have computer science, and so he said, please develop a computer science course for us. What was interesting is that, since there were no courses and no books, I had to ask what one teaches in a course in computer science, and he gave me a few research papers on automata theory and said, if you cover these, it'll be a good course. The reason I mention it is that there is a lesson here for our young researchers: this course made me one
of the world's first computer scientists. And so 15 years later, when our government was looking for a senior computer scientist, even though I was only in my 40s, I was on the shortlist, and the first President Bush asked me if I would be on the National Science Board, which funds science research in the US. Imagine if I had been in high-energy particle physics: I would still be waiting today for the senior faculty ahead of me to retire. But because there were no senior people ahead of me, I had fantastic opportunities you normally wouldn't have. Now, when I tell this story to the students that I teach, they say, well, you were just lucky because you were there at the right time. But let me tell you that you're also at the right time, because there's a fundamental revolution going on in computer science and in the world. During my career we were teaching and doing research on making computers useful, so we were interested in algorithms, compilers, databases, and so forth. But computers today are useful, and now we're switching to what they are being used for. So I think one of the things is that you're likely to interact with people in other domains, and the size of problems is going to be much bigger. There's really a chance for you to position yourself, those of you who are computer scientists, in a fundamentally new area. But also for those of you in mathematics: all of a sudden a certain part of mathematics has become very important. We sort of refer to it as applied mathematics, because all of these applications that we can do now, that we couldn't do before because we didn't have the computing power, are developing new types of mathematics to solve them.
But I worry about that automata theory course, because when I teach in China they asked me if I would teach that course, and we're phasing it out of US universities. I was a little nervous about creating it in China, but I decided maybe it made sense, because that expertise hadn't been developed in China yet. In
the US, if you want a compiler written, there are thousands of people who can do it for you; it's not clear that in China they have that kind of expertise.
Time
43:26
for one more question and a short answer. So yeah, over there, please.
Hi, thank you so much for your time; it was amazing listening to you. I have a question regarding the interpretability of deep learning models. What are your thoughts on how we build actionable insight from deep learning models, which themselves are theoretically little understood right now, especially for applications in the medical domain and domains like genomics?
I missed part of it. Is it how do we train this in medical domains, and how do we... interpretability? Oh. So if the question is how do we figure out what it's doing, I don't think we know how to do that yet, and this is an important research area. If deep learning is going to start making decisions, people are going to want to know what decision it is making and whether it is biased, because biases can creep in through your training data and in other ways. So it's fundamental to figure out how they're doing it, and so forth.
[Applause]