Transfer Learning and Collaborative Artificial Intelligence (AI)
Formal Metadata
Title: Transfer Learning and Collaborative Artificial Intelligence (AI)
Number of Parts: 6
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifier: 10.5446/43731 (DOI)
Transcript: English (auto-generated)
00:15
Thank you very much, everyone, for being here today. Thank you, Mervyn, for the invitation.
00:23
I'm going to talk to you today about transfer learning. This is something that we all experience in our everyday life. Suppose you learn a new language, then you will find out that it's easier to learn another language after that.
00:43
It's been shown also that if you teach music to young kids as early as like nine months, they will be able to learn a language faster after. So this is this idea that we all experience that
01:03
when we learn something in one domain, we can learn other things in other domains better and faster. But before talking about transfer learning, I would like to speak a little bit about Owkin, for those of you who don't know what it is. And then
01:23
I will explain how transfer learning can be applied to medicine and how it can maybe be a solution for very big problems. So, who are we at Owkin and what do we do?
01:43
So Owkin is a new startup company that was founded last year. We have offices in New York and here in Paris. I'm the chief scientist and I lead, here in Paris, the
02:01
scientific team. Thomas Clozel is my partner in this startup. He's an oncologist, so he's a medical doctor, and he used to work in the field of predictive medicine. He developed techniques to predict response to chemotherapy using epigenomic data. And I used to be
02:28
in academia, as I was working at ENS for the last two or three years. And I've been working a lot on applications of machine learning in the field of medicine. Right now we are a team of
02:45
about 10 people, and we have exceptional machine learning engineers and top data scientists who have performed among the best in international competitions such as Kaggle.
03:04
And what we do is develop algorithms and software in collaboration with biotech companies, pharmaceutical companies, hospitals and academic labs. So I will show you the four major topics that we are covering in the company. So what we do is that we
03:24
develop these machine learning techniques to address fundamental problems in the drug discovery and development process. And in this process there are several steps that are very time consuming and that cost a lot. If you want to develop a drug today,
03:44
it will cost you around one billion dollars to perform the drug design, drug development, clinical trials, the whole process from the idea to the market. And it fails a lot: it fails more than one time out of two, more than 50% of the time. And to transform
04:09
this process, this workflow end to end using artificial intelligence, we focus our development on drug design, how can we use machine learning algorithms to predict
04:22
molecular activity at the chemical level. We work on clinical trials: now that you have a molecule, you want to test it in the population; how can you use the data you have accumulated up to phase three, for instance, to select the right subgroup of people that will
04:43
respond to the drug. Here also, there is a clear need for predictive models. We work on real-world evidence, meaning that we work, for instance, with hospitals, with data coming from general practitioners. For instance, we work with Institut Curie
05:01
on clinical data in free-text form, meaning that this data is written by the doctors, and it's a huge, huge amount of data, millions of documents. And we also work on images. This is a little bit transversal because it can be used for clinical trials and for real-world evidence. And this is maybe one of the fields right now,
05:24
medical images, where deep learning can have the most impact in the next few months or few years, and a lot of people are trying to do that. So now I'm going to talk about transfer learning,
05:43
give you some ideas about what transfer learning is and how it can be applied in medicine. And after that, we'll take some time to discuss more general ideas about collaborative artificial intelligence and the data sharing problem.
06:04
So just before starting: in medicine, data often has very specific properties. The first one is that it's something that should be kept private.
06:26
Medical data is not something to be shared with everyone; one has to be very careful. Second point, it often comes as small data sets. It's really not common to go and see a doctor
06:45
or a lab and find that they have data for 1 million patients; it never happens. Often you have data sets of, say, a few hundred images. So these two specific features are really important to bear in mind. And I will show you
07:06
why these two key features make this idea of transfer learning something that can be very useful. So first, what is transfer learning? It is a domain of artificial intelligence and machine learning
07:25
which is focused on the ability of an algorithm to improve its learning capacities: either it can learn faster, or it can learn better and achieve better accuracy on a given data set, through
07:43
previous exposure to a different data set. So you have the task that you want to perform on your data and then you have this other data that can be helpful to solve your task. And you can use transfer learning so that your algorithm can learn other stuff on this other
08:07
data sets. They can be different, it can be different types of problem, it can be a different task, it can be anything, but there are some things that can be extracted. One point that is very important also is that deep learning architectures
08:26
are very well suited for the transfer learning approach. And the reason is that these deep learning architectures, they operate by different layers in a hierarchical way.
08:41
Okay. And often what we realize is that the first layers can learn things that are very general about the data sets, very basic, such as discovering contours or basic structures, the basic texture of a cell or of any medical image, and that can then be transferred to another
09:07
problem. And I think this precise point is one of the key hidden reasons behind the success of deep learning. As I mentioned in the introduction, it's also a cognitive phenomenon
09:24
in humans, that learning things in one domain can help you learn things faster and better in other domains. We will discuss mathematics later, but personally, I think that mathematics is really a good example of that: people who are educated in mathematics
09:47
can learn many other things faster after that. But I'm speaking in the home of mathematics, so I'm safe here to say that, though maybe not everyone will agree. Anyway, it's very interesting that we are entering this era of artificial intelligence,
10:08
where we can have this connection between the cognitive phenomenon that seems very powerful and very abstract, this idea that if I learn mathematics, I can be good at law or philosophy.
10:24
This very abstract concept is now really present in the everyday life of people doing machine learning. So this is an idea that I think is really exciting. I just want to give you
10:47
two examples of how it can work in general for deep learning, and then we'll go into two or three specific examples in medicine. So very generally, what people do every day in the deep learning
11:05
community is something like that. It's called warm restart. So what they do is that they want to solve a problem. They want to be able to classify cancer versus non-cancer from medical
11:21
images, for instance. So they have a bunch of, say, 100 images of each class. If, using 100 images of each class, you want to train a deep learning algorithm from scratch with completely random initial conditions for your neural network weights, it will most likely be a failure,
11:45
because you have too many parameters to train and not enough samples. So what you can do is say, well, I can't do anything, please come back to me with more data. This is something you can tell the doctor who gives you the data. But what you can do
12:03
as well is something that may seem completely crazy. You say you start from your random neural network, your initial condition, then you train it on a very large database called ImageNet that most of you know, I assume. It's a database with a million images, 10,000 classes that shows
12:27
planes, cars, cats, dogs, whatever. And the task here is to classify all these pictures. So you train your neural network on this database. It gives you what we call a pre-trained
12:43
neural network at the end of training. This neural network is now able, if you give it a picture of a lion or a car, to tell you whether this is a lion or a car. So it has learned things. It has learned how to see things, but not especially medical images. So why are we doing
13:07
that? Does it have any chance to work? Then once you have this pre-trained neural network, then you can use it as an initial condition to train your deep learning algorithm on the data
13:26
that you are interested in. And basically, you do the same as if you were starting from an initial condition that is random, but you just change the initial condition. The training process is all governed by the same backpropagation algorithm that you like. But you just
13:44
change the initial condition. And if you do that with just a few hundred images, you will see that the improvement in performance is really astonishing. And I will show you some examples after. We can also do other things. There are many,
14:03
many other possibilities. I just want to show you another one that is quite interesting. The idea of shared representations is the idea that you can train your network to classify planes and cars and lions. This gives you a pre-trained network. Then you start from
14:25
an initial condition that is random on your medical images. But what you do is that you penalize your loss function: you add a regularization term that tells you that your new network should look like the one that was trained on ImageNet, or that the features, the activations
14:48
of the neurons, should look like the ones on ImageNet. So it's a process of regularization that has the same kind of effect as the warm restart, but sometimes it can work better. So let's go into some concrete examples now.
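A minimal sketch of the two ideas just described, the warm restart and the shared-representation regularizer. It assumes PyTorch and torchvision; the backbone, class count and hyperparameters are illustrative, not a reproduction of any particular system.

```python
import torch
import torch.nn as nn
from torchvision import models

# Warm restart: start from a network pre-trained on ImageNet instead of
# random weights (the choice of backbone is illustrative).
net = models.resnet18(pretrained=True)

# Keep a frozen copy of the ImageNet weights for the optional
# "shared representation" regularizer below.
imagenet_state = {k: v.clone() for k, v in net.state_dict().items()}

# Replace the 1000-class ImageNet head with a 2-class head
# (e.g. cancer vs. non-cancer, with only a few hundred images).
net.fc = nn.Linear(net.fc.in_features, 2)

optimizer = torch.optim.SGD(net.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()
lam = 1e-4  # strength of the pull toward the ImageNet weights (illustrative)

def training_step(images, labels):
    optimizer.zero_grad()
    loss = criterion(net(images), labels)
    # Optional regularization: penalize drifting away from the pre-trained
    # weights, instead of (or in addition to) just warm-restarting.
    reg = sum(((p - imagenet_state[n].to(p.device)) ** 2).sum()
              for n, p in net.named_parameters()
              if n in imagenet_state and p.shape == imagenet_state[n].shape)
    (loss + lam * reg).backward()
    optimizer.step()
    return loss.item()
```

With lam set to zero this is exactly the warm restart; with lam greater than zero it is the shared-representation variant mentioned just above.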
15:06
So in teaching there is always this debate: should I start with examples or with the theory? I decided to start with the general theory and then show the examples; we'll see if it works. So here is the cover of Nature from February this year. So it's like two months old,
15:29
already very old. And the cover of Nature shows a paper that has really shaken the world of dermatologists. So we have a lot of dermatologists who come and see
15:45
Owkin right now to see how we can help them. And basically these people, in this article, applied deep learning techniques to classify skin cancer using pictures. And what they show in the article is that they obtain the same level of performance
16:08
in terms of classification accuracy as a group of expert dermatologists. So in 2017, this is where we are right now. These kinds of systems, if they are trained on a database that
16:25
is large enough, are able to perform as well as doctors. And there are other papers that we can discuss, I can give you the references, that show that on other problems they can even perform better. And here, this picture is just an excerpt from their article in
16:46
Nature. And what they show right in the middle here is called a deep convolutional neural network, Inception V3. What is Inception V3? Inception V3 is the name that Google gave to the neural
17:05
network they trained on ImageNet, the same kind of pre-trained neural network I was talking about. So you have to know that for these pre-trained neural networks on ImageNet, there exists a zoo of models. There exist many models. Google has Inception V3. Maybe,
17:25
I don't know, maybe there is a new one that has been released. You have Microsoft, which has another network. You have Facebook. You also have academic labs, etc. Each one has a pre-trained neural network on ImageNet. Then what these guys did in Nature is that they took that
17:45
as a starting point and then they do the warm restart. So they keep the architecture and they just do the gradient descent on the weights. And the real tour de force in this article is the size of the database. I told you that it can work with a few hundred images. That's true.
18:04
But if you want to achieve, on this very hard problem, the same kind of performance as expert dermatologists, then you need around 100,000 images. And this is not easy to gather. So this is really very impressive.
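As a rough illustration of what "keep the architecture, continue the gradient descent" means here (using torchvision's copy of Inception V3 rather than Google's original checkpoint; the class count and hyperparameters are placeholders, not the paper's actual values):

```python
import torch
import torch.nn as nn
from torchvision import models

# An ImageNet-pre-trained Inception V3, one of the "zoo" of pre-trained models.
model = models.inception_v3(pretrained=True, aux_logits=True)

# Keep the architecture, swap the classification heads for skin-lesion classes
# (the number of classes here is a placeholder).
num_classes = 2
model.fc = nn.Linear(model.fc.in_features, num_classes)
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, num_classes)

# Warm restart: nothing is frozen; gradient descent simply continues on all
# the weights, starting from the ImageNet solution.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
```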
18:23
So this is a very concrete example. And right now the whole dermatology community is really trying to understand how things are going to organize around these kinds of technologies. I just want to give you two more examples of things that we do in the company.
18:42
So this is work that we've been doing with a biotech company that is developing new drugs for a disease called pulmonary fibrosis. It's a very bad disease that can kill people in a few months, and there is no real treatment at the moment.
19:00
And their goal was to develop a system that can grade fibrosis: low grade means that the fibrosis has not spread and is not very serious, and high grade means that it's very bad, very severe. And when they develop new molecules,
19:22
they want to test: if I give this molecule, will the grade be lower after a few weeks? So they do clinical trials, preclinical trials, and they want to assess the grading. And usually they do that with a team of expert pathologists who look at the images.
19:42
So these images are taken from biopsies of the lung and they are scanned at very high resolution. So there is a challenge here on the side, which is that the images are gigapixel images.
20:01
Meaning that you have 100,000 by 100,000 pixels, so they are very big images. And we were given only around 100 images with five different classes. Okay, so you have like 20 examples per class, and the images are more than one gigapixel.
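A practical aside on these gigapixel inputs: a 100,000 by 100,000-pixel slide cannot be fed to a CNN in one piece, so a patch-based treatment is the usual route (not necessarily the exact pipeline used here). A minimal tiling sketch, assuming the slides are in a format the openslide-python package can read:

```python
import openslide  # assumes whole-slide formats such as .svs / pyramidal .tiff

def iter_tiles(slide_path, tile=512, stride=512, level=0):
    """Yield RGB tiles covering a whole-slide image at a given pyramid level."""
    slide = openslide.OpenSlide(slide_path)
    width, height = slide.level_dimensions[level]
    scale = int(round(slide.level_downsamples[level]))
    for y in range(0, height - tile + 1, stride):
        for x in range(0, width - tile + 1, stride):
            # read_region expects coordinates in the level-0 reference frame
            region = slide.read_region((x * scale, y * scale), level, (tile, tile))
            yield region.convert("RGB")  # drop the alpha channel
```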
20:24
And basically, we are still not at the level of pathologist agreement. But what we realized is that if we do the neural network training from scratch,
20:43
okay, then it reaches around 65% accuracy on the five-class problem. But if we use a neural network that was pre-trained on this car, plane, lion, et cetera task, then it performs way better, okay?
21:02
So this is a very precise, concrete example of how transfer learning can give a very important improvement in the results, given this problem of having only a few data points, okay?
21:20
And this is a very common situation in medical problems. The second example, the last example I wanted to discuss here, is a project that we've been doing over the last few months. It's a challenge organized by the Kaggle platform.
21:41
So Kaggle is a platform for data science competitions that gathers the best machine learning and data science engineers and teams from all over the world. It has more than 100,000 users, and the platform was acquired by Google recently.
22:02
And on this platform, there was a challenge with a prize of $1 million. And this challenge is the one that we tried to play. It's called early detection of lung cancer from CT scans. So what is the game?
22:20
You are given 1,500 CT scans. And for each patient, you have just a label that tells you whether this patient has or will develop a cancer within a 12-month time frame, starting from the exam.
22:43
So what is a CT scan? A CT scan is a 3D representation of what's inside your body. You have around 200 slices, and each slice is around 500 by 500 pixels. And it looks like that.
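For reference, such an exam is typically distributed as a stack of DICOM slice files (around 200 of them, roughly 500 by 500 pixels each, as described above). A small sketch of assembling them into a 3D volume, assuming standard DICOM series and the pydicom package; nothing here is specific to this particular challenge:

```python
import glob
import numpy as np
import pydicom

def load_ct_volume(patient_dir):
    """Stack one patient's DICOM slices into a 3D array in Hounsfield units."""
    slices = [pydicom.dcmread(path) for path in glob.glob(f"{patient_dir}/*.dcm")]
    # Order the slices along the z axis using their position in the patient.
    slices.sort(key=lambda s: float(s.ImagePositionPatient[2]))
    volume = np.stack([s.pixel_array.astype(np.int16) for s in slices])
    # Apply the stored rescale factors to get Hounsfield units.
    return volume * float(slices[0].RescaleSlope) + float(slices[0].RescaleIntercept)
```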
23:01
And you can reconstruct in three dimensions, if you want, what's inside the lung. This is very challenging because we have only around 1,000 data points. And the signal is really, really small compared to the volume of data we have.
23:25
Because the signal is hidden in the fact that, okay, this patient has a nodule. A nodule is a mass in the lung that is very important for diagnosing this type of lung cancer. And these nodules are just a few centimeters.
23:42
So it's just a 3D patch of a few dozen pixels. You have a very, very small, tiny needle in the haystack. And around it you have a lot of information about the lung that is completely useless, that is pure noise for your machine learning algorithm.
24:03
So we call it a very strong weak-labeling problem. And in this problem, everything that we tried using just the data given in the competition completely failed. And the guys in the team are very strong, and they tried a lot of things.
24:24
And I think it's almost impossible to solve this challenge with only this data. So we had to find another way. We had to find a way to bring knowledge from outside because it was really not possible to design something that would work.
24:45
So our final ranking was tenth. We didn't win the $1 million, but we still won a small prize. But most importantly, we learned about the strategy for solving this kind of problem.
25:01
So I put aside the fact that it's a 3D problem, okay, and 3D convolutional neural networks are not as easy to manipulate as two-dimensional ones, but I would say it's a side issue. The real idea was to bring external knowledge through another data set.
25:21
So we used this idea of transfer learning very, very strongly in this case, and it was the only way for us to get something that worked. There is this other data set called LUNA that was designed for another task, with other patients, in another challenge the year before.
25:42
And the other task was to work on segmentation of nodules on CT scans. It was not about early detection. So it was something with different labels, different patients. It was a different data set. So what we did is that we trained a neural network to distinguish the nodules.
26:04
So we extract patches in three dimensions, okay, and for each patch we have an annotation that tells you: here you have a nodule, and here is its diameter; or: here there is no nodule,
26:23
and this is the kind of label that we have from the LUNA data set. So this is information that is localized in space, okay, compared to the annotation on the Kaggle data, which is not localized at all.
26:41
They just tell us whether this patient has a cancer or not, but they don't tell you where the nodule is, et cetera. So there is pretty much nothing you can really learn. So first you train this three-dimensional convolutional neural network on the LUNA data set. It has five convolutional layers with pooling,
27:07
ending with 64 channels, so that at the end you flatten everything and you have just a 512-dimensional vector, okay. And from this vector of dimension 512, it means that you can have a representation
27:26
of a given patch of size 64 pixels, okay. And now, once you have trained this network and you have the weights, you take your data from the Kaggle data set and you extract patches,
27:43
you do it with overlap, and you just scan everything on the patient, all the patches. Then you put them through the CNN, you extract your representation vectors, and then you do some max pooling, so that at the end you erase all the spatial information,
28:03
so everything becomes invariant in space. And from that, you can do classification with these new features. We could also have done some global fine-tuning of the whole process, but still it was very complex to make it work. And at the end, we do gradient tree boosting to do the classification, okay.
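Pieced together, the pipeline described above looks roughly like the sketch below. The layer sizes, patch size and class labels are reconstructed from the talk (five convolutional blocks, 64 channels, a 512-dimensional embedding of 64-pixel patches, "no nodule / small / medium / large" targets), so treat the details as approximations rather than the team's actual code.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import GradientBoostingClassifier

class PatchNet3D(nn.Module):
    """3D CNN trained on LUNA patches (nodule size categories), later reused
    as a feature extractor for the Kaggle scans."""
    def __init__(self, n_classes=4):  # no nodule / small / medium / large
        super().__init__()
        channels = [1, 8, 16, 32, 64, 64]  # five conv blocks, 64 channels at the end
        blocks = []
        for cin, cout in zip(channels[:-1], channels[1:]):
            blocks += [nn.Conv3d(cin, cout, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2)]
        self.features = nn.Sequential(*blocks)
        self.embed = nn.Linear(64 * 2 * 2 * 2, 512)  # a 64^3 patch becomes a 512-d vector
        self.classify = nn.Linear(512, n_classes)

    def forward(self, x, return_embedding=False):
        z = self.embed(self.features(x).flatten(1))
        return z if return_embedding else self.classify(z)

def patient_features(net, patches):
    """patches: (N, 1, 64, 64, 64) overlapping 3D patches from one patient.
    Max-pool the 512-d embeddings over patches, erasing spatial information."""
    with torch.no_grad():
        emb = net(patches, return_embedding=True)  # (N, 512)
    return emb.max(dim=0).values.numpy()           # (512,)

# Final stage: gradient tree boosting on the weak, patient-level labels.
# X: one max-pooled 512-d vector per patient, y: cancer within 12 months (0/1).
# gbt = GradientBoostingClassifier().fit(X, y)
```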
28:25
And with this strategy, we realized that among the top 10, almost everyone had done something similar, okay. So it was maybe not the only way, but I really challenge everyone here in the room
28:42
to achieve a very significant result without this transfer learning approach. So here we see there are some very important, very difficult problems, with a one-million-dollar prize, and the only way to find a solution that starts to work is based on transfer learning.
29:05
So this is not an anecdotal technique. This is really at the heart of many, many applications of deep learning. So now that I gave you this compelling story about the use of transfer learning in the field of medicine,
29:23
I would like to maybe take the time I have here to dream a little, to try to move one layer of abstraction up and think about what these kinds of technologies could make possible in the future.
29:47
Right now, if you think about classical statistics, for instance, each time you have a problem of hypothesis testing, of estimation of parameters or whatever, you start from a blank page, okay.
30:02
Each time you have a new data set, it's a new problem and you start again. With transfer learning, the idea is that you can build up systems that learn from a given data set, then they are able to perform a task, and then you can reuse them to do another task and to learn other things.
30:25
And you can climb like that, and you can build systems that become better and better as they are exposed to different tasks. It's like humans, it's like the saying about standing on the shoulders of giants.
30:41
It's this idea that artificial intelligence systems are entering an era where they are not isolated anymore. These algorithms can talk to each other through transfer learning.
31:01
Okay, so why is it so important? Clearly, what we have shown here is that transfer learning is one of the key reasons behind the success of deep learning, and it brings the power of machine learning to small data sets that would not be amenable
31:24
to classical machine learning solutions. But as I said, it also opens the way to collaborative artificial intelligence. And I would like to show you how, here at Owkin, we imagine the future.
31:41
So our idea is to build a collaborative artificial intelligence platform, where each contributor would be a lab, a hospital, a doctor, whoever has some data that is labeled. Suppose you are a doctor: you have 500 images of breast cancer that respond to chemotherapy,
32:05
500 images of breast cancer that do not respond well to chemotherapy. You have this data. This data is very important. It needs to stay private most of the time. And maybe it's not enough data to build a powerful algorithm.
32:21
But you know that other people around the world, they have this kind of data. So the idea is that as a contributor, you can create new algorithms, or you can participate in the creation of new algorithms. And these algorithms can improve each other through transfer learning,
32:43
going from one center to the other. And then the community of users, the people, the doctors that want to use this algorithm to make predictions and say, well, I need to predict if this patient will respond well to chemotherapy or not, can use the algorithms that are built on the platform.
33:03
And so the idea is that you no longer really need just one team where someone has the data, someone designs the algorithm, and they work together. Okay, that setup is still very important, but we are thinking about the future, and we think that this can become a platform system
33:26
and we can move toward something that is more collaborative and with the idea that people can keep their data. So we have a prototype of that concept that is working, that we are currently installing in several hospitals in Paris, in Lyon, in New York,
33:47
where each user who has medical images can create algorithms without any coding skills. They can apply the algorithms to new images that come in from the clinic
34:01
and share the algorithms that have been trained with other people on the platform. But I think the final point I want to make in this talk is maybe the most important one. We are building all these nice predictive models,
34:25
but clearly what is really lacking today, and for the future, in the medical community and in artificial intelligence for medicine, is the movement, the dynamic toward an actual data sharing solution.
34:47
Right now the medical data is really spread across many different centers, many different hospitals, many different pharmaceutical companies, etc.
35:01
All this data is really not shared in a common place at the moment. And this is a huge problem, because we want to be able to train powerful artificial intelligence algorithms that can help doctors, that can help patients, that can help predict the effect of a treatment, to give the right drug to the right patient.
35:25
If we want to do that, the biggest obstacle is this data sharing problem. Okay? And we hear every year, at very nice conferences, very nice people saying,
35:42
well, we are committed to data sharing, we are going to create this platform where everyone can put their data together, etc. If you say that, people think you are a good person, so it's good to say. But in reality, it's not happening. It's not happening right now.
36:01
And I think for many, many reasons, privacy, intellectual property, different things like that, it's not going to happen soon at a large scale. Okay? So if we want to do something powerful that exploits all this data that is already here,
36:22
we need to find another solution. And I think that these ideas around transfer learning can be very, very helpful. And in fact, it's very simple, the idea behind,
36:40
because transfer learning is just this: you train your network on a given data set, then you use it as an initial condition for the next training. So it's just about doing gradient descent and following the path; you stop at one point and you start again. The standard approach would be to gather all the data, as I said, in one place.
37:04
So you go and knock on hospital A's door, you say, can you give me your data, I will put it in a secure place. You go to hospital B, to hospital C. Or you can imagine doing that with general practitioners: then it's not a few hundred hospitals, it's a few hundred thousand doctors.
37:24
Or you can imagine that this problem can arise with Internet of Things and connected objects that measure our health. And we don't want to share our data with some cloud provider or whatever,
37:42
but we would like to benefit from the artificial intelligence that such a system could offer. So the scale of this problem is not only a few hundred hospitals; it can be millions of users. Okay, so what can we do when gathering the data is either not feasible,
38:02
or we don't want to do it, because we think it's too risky to put all this data in the same hands? The idea is really simple. It's a kind of natural extension of the idea of transfer learning: you train your neural network at hospital A,
38:24
where you have your 100 or 1,000 images with a given task. And hospital B has the same kind of images with the same kind of labels. Now that you have trained your network, you can make it travel to hospital B.
38:40
So the data stays put in hospital A; only the neural network weights travel. It's the brain that travels: you make the algorithm travel. And then you put the algorithm in hospital B, and you continue the training.
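In code, the "travelling brain" is nothing more than handing the same model (its weights) from one site's training loop to the next. A minimal sketch, assuming each hospital exposes its private data as a PyTorch DataLoader; the data itself never leaves the local function:

```python
import torch

def train_locally(model, loader, epochs=1, lr=1e-3):
    """One hospital's local training pass; the data never leaves this site."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

def cyclic_training(model, hospital_loaders, rounds=5):
    """Only the weights travel: hospital A trains, then B continues, etc."""
    for _ in range(rounds):
        for loader in hospital_loaders:  # e.g. [hospital_a, hospital_b, ...]
            model = train_locally(model, loader)
    return model
```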
39:01
So you can do that for all the users in your network; you do one round, then another round, et cetera, until your validation accuracy is good enough. What you can also do is something that will exploit
39:24
all the computing power that can be put in each hospital, by doing a parallel algorithm. You can do something like a parallel stochastic gradient descent. Suppose you are the coordinator of the process: first, you send the same initial network to all the hospitals at the same time.
39:48
Each one does a round of training on its own data set. And then what they give you back is the gradient, the change of the weights. Then what you do is take an average of these changes.
40:04
So in the gradient descent, you average the contributions of the contributors who have the data. And then you send these new weights back to everyone, to synchronize everyone, and you do another batch like that, another epoch.
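The parallel variant can be sketched the same way: the coordinator broadcasts the current weights, each hospital trains locally, and the coordinator averages what comes back (averaging the updated weights around a common starting point is the same as averaging the weight changes). Again only a hedged sketch, reusing the train_locally helper from above:

```python
import copy
import torch

def federated_round(global_model, hospital_loaders, local_epochs=1):
    """One synchronization round of parallel training across hospitals;
    only weights are exchanged, each hospital keeps its data."""
    local_states = []
    for loader in hospital_loaders:
        local = copy.deepcopy(global_model)                # broadcast current weights
        train_locally(local, loader, epochs=local_epochs)  # local SGD at each site
        local_states.append(local.state_dict())

    # Average the contributions and load them back into the shared model.
    avg_state = {
        key: torch.stack([s[key].float() for s in local_states]).mean(dim=0)
        for key in local_states[0]
    }
    global_model.load_state_dict(avg_state)
    return global_model
```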
40:21
These kinds of algorithms are called parallel stochastic gradient descent. And there has been recent research over the last two years about how to improve them, how to make them work better and faster, especially in a context where the networks that we are making travel
40:41
can be 100 megabytes, maybe one gigabyte soon. So these are objects that you should not make travel too often, and if you want to make them travel less often, then you should try to optimize everything so that it works well.
41:03
You can also use this in a peer-to-peer fashion: if you don't want any third party involved, you can make the neural network travel between users. And I think that this idea is very powerful for the future of artificial intelligence
41:25
in medicine on a wide scale, not only for hospitals but, as I said, also for data coming from doctors or from consumers. And it goes well beyond medicine as well,
41:40
because it's a paradigm for training artificial intelligence systems without jeopardizing the privacy of data. And I think that, as citizens here in the room, we are all very concerned about that right now.
42:02
And hopefully there are technological solutions that can be developed, that can have a huge potential for medicine, but also for the privacy of every citizen. So I guess it's time for me to conclude.
42:20
I think my message was clear today. With transfer learning technologies, all these artificial intelligence algorithms can cross-fertilize. This idea of algorithm sharing has the potential to circumvent the data sharing problem,
42:42
okay, which is to me the major problem right now for the emergence of a lot of new artificial intelligence applications in medicine. And we hope that developing these technologies can have a big impact
43:01
because we work every day with doctors. And what we think about first is how we can make it happen today. Okay, thank you very much. Are there any questions?
43:22
Yeah, so my question is: when you use data to pre-train and then train for something else, do you know how much of the original knowledge is preserved after you do the experiments? Yeah, so you can destroy some knowledge you've learned, okay, but you can keep it in memory as well.
43:41
So you can keep both. But it can destroy knowledge. It doesn't work all the time, of course, and it still remains unknown from, I would say, from a mathematical standpoint how different the two data sets should be so that it works well.
44:01
Because if the two data sets are too close, then it's just a matter of adding some new points, and it's not very useful, okay. So you need some kind of difference, but not too much, okay. And it's like, I would say, it's like if you learn languages that are quite different,
44:24
if you learn Turkish and Finnish, maybe it will be easier for you to learn Russian and Egyptian or whatever, okay. But if you learn only Spanish, Italian and French, then you are not in a good position, I think, to learn new languages.
44:44
In the CT thing, why didn't you fine-tune? Because, I mean, the transfer learning thing makes sense if you fine-tune, since it's different data. So you just take random features and...? No, no, no, we use the features that were extracted from a network that was trained on the other data set.
45:02
We didn't fine-tune, but already feature extraction plus the gradient boosting worked well, and you may see the gradient boosting as just doing a fine-tuning of the last layer, if you want. Okay, if it's a linear stuff. The last layer is not fine-tuning,
45:20
but the last layer is learning a classifier. It's learning the classifier. And, I'm sorry, I'm being a pain, but you cannot really call it transfer learning if you're only fine-tuning. You are just taking pre-existing features and you use them and they work pretty well, which is fine. If you want; I mean, call it whatever you want.
45:42
I mean, to me it's still transfer learning. My question is about privacy. So you have this distributed scheme whereby, as you say, you don't need to actually share the data
46:02
that you use for the training. So this preserves privacy. My question is, I assume that you only have access to the trained network before and after some training. What can you infer about the data that has been used? Okay, that's really an excellent question.
46:21
And this is a question we are currently working on in the team. My intuition is that if you only have the weights of the network, not the activations that occur when you present the input, just the weights, I would say it's very, very difficult to recover any data points.
46:45
Maybe you don't want to recover the full data, but you might be able to infer like how many patients were affected with this and that disease in a given data set. That might be already too much information. Yeah, of course. So that's an excellent question and we will be really happy
47:04
to collaborate with anyone here. To work on the theoretical proof of that question about what can we infer knowing the weights about the data that was used for training. Maybe there exists some very nice theorem already that I don't know.
47:22
Okay, I don't know everything. But I think this question has not been studied much before. But it's a very good question. For the CT example, if I understand correctly, what you are using at the end is just the diameter of the nodule?
47:43
So the diameter of the nodule is used as the target for the training here. We split it into several categories, like small, medium, large. And we have another category which is not a nodule. Because in the LUNA database, you have some...
48:02
No, my question is the outcome of your deep neural net on the competition was a vector where you have the number of diameters of... Yeah. And that's what you used to predict? No, we used the layer right before. Okay. And you said that you were 10th, right?
48:24
Yeah. On the competition? The best... What was its performance? Were you detecting 80%? Okay, so it was a log loss metric. The data set was relatively balanced, I guess. But anyway, it was around 0.4 log loss.
48:44
Which means in our cross-validation, it was like 0.83 AUC. Okay, so you're at 83%? At 83% AUC, AUC metric. And this is useful? Yeah, so this is...
49:01
When you fail, okay, so you... So this is meant to be used to do a general screening of the population. And they want to reduce the number of false positives and false negatives. And this is a task that is really not easy for the doctors. I mean, so how would it be done online?
49:21
And usually, the problem is that for some of them, you don't have any nodules. Yeah. So we had examples in the database where people were classified as having cancer, or as going to develop cancer in the next 12 months, without nodules. There is a subtlety there: you have other markers like emphysema, and we didn't use that.
49:44
You were based on nodules. Whatever it was, you tried to predict whether... We were only on nodules, and we ran out of time, but it was on our to-do list to integrate other things like that. And why couldn't you train a whole network from the very beginning, on just the... Yeah, yeah, we tried to do a lot of things on just the data set.
50:04
And you can try it, you will see. Yeah, you have too few samples, and the information is lost in the middle of a lot of other things that are completely unrelated to the disease. So... You have more samples in the Kaggle dataset than in the...
50:22
But the annotation in the Kaggle dataset is global. It just tells you whether the patient has cancer or not. And on the other dataset, you have local information. You can take the last question. Two questions; last question.
50:41
The first one is related to the security question. So in this process, do you share your algorithm? In which process? For example, if you have isolated data centers. Yeah. And if you want them to provide you the output, do you share your algorithm?
51:01
So we share the algorithm between the different users that are taking part in solving the problem. But then the question is whether we share the resulting algorithms with the global community. That is something that is sometimes done and sometimes not,
51:22
depending on the different parties at stake and the different problems. Of course, it would be better to share the resulting algorithm with everyone. But if there is a lot of money invested in it that people want to keep for themselves, then you run into the usual problem.
51:44
I think if you share the algorithms, then, if I'm a data center, I have some control over, how to say, the dynamics of the algorithm, because I can provide some specific data so as to get some specific outcome.
52:07
This is a kind of security problem: if I'm a malicious player, I can provide specific data so as to gain some control over the algorithm's outcome.
52:22
So you mean a malicious player that would disguise itself as a hospital and take control of the... This could happen, maybe, in... Okay, that's a very clever question.
52:41
But we are very much on the ground right now, and we talk with people on the phone and we say, okay, did you receive the network? In a commercial system this could happen. Yeah, okay, that goes in the same direction as the other question, which is how secure this kind of system really is.
53:01
Yeah, okay, that's very, very interesting. And the second question is: normally we know that individual optimization or learning, a sequence of individual learning steps, may not lead to a social optimum. So how can you make sure that during this dynamic process,
53:20
you can, how to say, keep improving? At the mathematical level, you need at least some proof or some... Yeah, so I mean, if you take, say, the MNIST data set with its handwritten digit recognition problem,
53:40
if you split the data set into 10 different parts and you put one part in each hospital, okay, at first I would call you crazy, why would you do that? But then you can train, pass along the weights, send them back, train again, et cetera, and you will see that it converges almost as if you had all the data in the same place.
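That thought experiment is easy to set up: cut MNIST into ten disjoint shards, treat each shard as one "hospital", run either the cyclic or the averaged scheme sketched earlier, and compare with training on the pooled data. A small setup sketch (torchvision's MNIST; the split is the only non-standard step):

```python
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

mnist = datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor())

# Ten disjoint shards of 6,000 images each, standing in for ten hospitals.
shards = random_split(mnist, [len(mnist) // 10] * 10)
hospital_loaders = [DataLoader(s, batch_size=64, shuffle=True) for s in shards]
# These loaders can now be fed to cyclic_training or federated_round above,
# and the result compared with training on the full, centralized data set.
```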
54:03
So right now, I would say, it's a bunch of experimental results, but the process, mathematically, is really like doing gradient descent. And there are also some... You have some potential effect that, when you're optimizing some distributed problem,
54:27
then you are converging to the potential of the problem. Yeah, so there are connections, and there is a lot of theory existing already on parallel stochastic gradient descent. But okay, I think that if we go into the specifics,
54:45
maybe there are some things that should be proven to be mathematically sure, but still, in our first experiments, things are going in the right direction.
55:01
We should stop here and close the...