Supercharge your Deep Learning algorithms with optimized software
Formal Metadata
Title: Supercharge your Deep Learning algorithms with optimized software
Series: EuroPython 2019 (talk 76 of 118)
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported
Identifier: 10.5446/44795 (DOI)
Transcript: English (auto-generated)
00:02
Thank you very much for being here. My name is Shailen and I'm an AI specialist at Intel. Internally, my title is Technical Consulting Engineer, which means I am the link between you guys, the end users, and the core developers who build the software that you will use.
00:23
So today I'm going to show you how we accelerate deep learning algorithms in end-user software, and I will use a real-world case study to show you what we've done. The case study, one that I'm very passionate about and have been heavily involved in this year,
00:43
is about detecting brain cancer in humans from scans. So that's the topic of today: brain tumor segmentation using deep learning. I'm based in Germany and I was educated in Germany.
01:03
So, a brief agenda: I'm going to describe the problem that we have and how we're trying to solve it with AI. Then I will tell you which software tools and packages we used to solve this problem.
01:21
And then we'll have a look at some performance numbers, so you can see what to expect from such a real-world case study. Let's start with some statistics as motivation for why I, my team, and Intel are so passionate and involved in this field.
01:41
According to a global cancer statistics research group, approximately 18 million new cancer cases were recorded. And if you look at that, close to half of that number involved people dying.
02:05
So if someone in your family is among those numbers, dying from cancer, it's not nice. So what can we do? How can AI help here? And those numbers are just for 2018; this year, 2019, we can expect similar numbers.
02:20
And it's really important to diagnose cancer as early as possible and find solutions so we can avoid deaths, right? So, an introduction to the brain tumor topic. There is a medical term we use: gliomas.
02:43
These are the most commonly occurring types of brain tumors and they are very dangerous. If you have one, it can grow very aggressively and you can die from it. And 90% of those gliomas belong to a class of highly malignant tumors.
03:05
And to date, multi-sequence MRI, or magnetic resonance imaging, is the de facto way to screen and diagnose such gliomas. When you do an MRI, your head goes into the MRI machine and it takes a volumetric 3D scan of your brain.
03:30
And for the doctors, the challenge in finding the cancer is that they have to slice, or segment, that 3D volume and analyze it slice by slice.
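(To make "slice by slice" concrete, here is a minimal Python sketch of how such a volumetric scan is typically handled once it has been loaded into a NumPy array; the 240 x 240 x 155 shape matches the BraTS volumes, but the data below is just a random stand-in.)

```python
import numpy as np

# Stand-in for a loaded MRI scan: height x width x number of axial slices.
volume = np.random.rand(240, 240, 155)

# "Slice by slice" simply means walking along one axis of the 3D volume:
for i in range(volume.shape[-1]):
    axial_slice = volume[:, :, i]   # one 2D image out of the 3D scan
    # ... this is the 2D image a radiologist (or a 2D network) would analyze
```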
03:40
This is very time-consuming and it's an expensive process, but it's a crucial one. So segmenting the brain, this 3D volume, is very important. And how do you actually treat an affected area?
04:03
You can do radiotherapy, which uses focused radiation beams to destroy the bad cells. Another way is to do surgery, where you open the skull and remove the bad cells. OK, I can see some reactions there, so sorry about that.
04:21
But in order to do these two things, you have to analyze this 3D volume slice by slice, and that's why segmentation is important. Now, the problem is the following, and this is the medical challenge. The challenge is twofold. First, we have a lack of specialized doctors to do this,
04:43
and I have a link down there to an article about the lack of physicians to do these kinds of studies and analyses. And second, this whole segmentation process is time-consuming and very expensive, but we believe that computers can help.
05:06
So if we can automate this process, we gain time for the patients and the doctors, making the whole diagnosis process faster, and we also improve the segmentation quality. The second area where computers can help follows from the fact that we are collecting so much data.
05:26
I have figures from 2013. At that time, approximately 153 exabytes of data were collected just in the healthcare sector. And that number was predicted to grow to over 2,000 exabytes by the year 2020.
05:43
We are now in 2019, so I still need to check the actual numbers for this year. Now, you may ask, what is an exabyte? To give you some perspective, one exabyte is close to 250 million DVDs' worth of information. And now, in 2019, we're talking about over 2,000 exabytes. That's a lot of data.
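(A quick back-of-the-envelope check of that DVD comparison; this is my own arithmetic rather than a figure from the talk's sources.)

```python
# How many single-layer DVDs fit in one exabyte?
exabyte_bytes = 10**18            # 1 exabyte
dvd_bytes = 4.7 * 10**9           # a single-layer DVD holds ~4.7 GB
print(exabyte_bytes / dvd_bytes)  # ~2.1e8, i.e. on the order of 200+ million DVDs
```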
06:06
So having high compute power to analyze all of this data is great; we're living in a great time. This is where AI can hopefully help, and that's the crux of this talk today.
06:21
Let's have a look at the dataset that we used to train our deep neural network. The dataset comes from the Brain Tumor Segmentation, or BraTS, challenge of 2018. It's an open dataset provided by the University of Pennsylvania.
06:41
The goal for our deep learning algorithm is to look at the 3D volumes and figure out whether a 3D pixel, or voxel, contains cancer or not. So a voxel is either healthy tissue or one of the three tumor tissue types, in short cancer or no cancer. A voxel, just to visualize it for you, is a kind of 3D pixel.
07:07
Let's zoom in on one of the examples of that brain over there. So this is it: cancer or no cancer, and we can color-label the voxels into different channels depending on the type of tumor tissue we're looking at.
07:21
And I have a one-slide summary of the algorithm we implemented. This is it. On the far right we have the input image from the machine, so that's one slice of the MRI scan. In order to train our deep neural network we needed labeled data: a combination of this MRI input and the middle image, which is what the radiologist has drawn by looking at the MRI input.
07:49
The radiologist has marked the cancer areas, and we got tons of these input images plus the corresponding labels from the doctor. The combination of the two is what we call a set of labeled data.
08:02
We use these two to train our neural network so that it can produce what we have here: the predictions, or inferred images. And this is our goal: we want our deep neural network to analyze new patients coming in and tell us whether that person has a high degree of cancer or not, where the cancer areas are, and so on.
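(A common way to score how well such a predicted mask matches the radiologist's label is the Dice coefficient; here is a minimal NumPy sketch of that idea, not necessarily the exact metric used in this project.)

```python
import numpy as np

def dice_coefficient(prediction, ground_truth, smooth=1.0):
    """Overlap between a predicted tumor mask and the radiologist's label.

    Both inputs are binary arrays of the same shape; 1.0 means perfect
    agreement, 0.0 means no overlap at all.
    """
    prediction = prediction.astype(bool)
    ground_truth = ground_truth.astype(bool)
    intersection = np.logical_and(prediction, ground_truth).sum()
    return (2.0 * intersection + smooth) / (prediction.sum() + ground_truth.sum() + smooth)
```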
08:28
Let's have a look at the algorithm used. The model in this research is a U-Net model. And U-Net is very very popular in the medical sector, especially for medical imaging.
08:43
The U-Net neural network looks like a U, and that's why it's called U-Net. It involves lots of convolutions, and a group of researchers from the University of Freiburg in Germany came up with it. It's really nice and it works quite well.
09:03
It works like an autoencoder: one side is encoding and the other side is decoding. At each stage of that neural network we're extracting features, and that's how we can detect, at the end of the day, cancer or no cancer.
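(A heavily reduced sketch of that encoder-decoder shape in Keras, just to show the idea of the "U" with skip connections; the real model has more levels and filters.)

```python
from tensorflow.keras import layers, models

def tiny_unet(input_shape=(128, 128, 1), n_classes=1):
    """A heavily reduced U-Net sketch: encoder, bottleneck, decoder with skips."""
    inputs = layers.Input(input_shape)

    # Encoding path: convolutions extract features, pooling shrinks the image.
    c1 = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = layers.Conv2D(32, 3, activation="relu", padding="same")(p1)
    p2 = layers.MaxPooling2D(2)(c2)

    # Bottleneck at the bottom of the "U".
    b = layers.Conv2D(64, 3, activation="relu", padding="same")(p2)

    # Decoding path: upsample and concatenate the matching encoder features
    # (the skip connections that give the "U" its shape).
    u2 = layers.concatenate([layers.UpSampling2D(2)(b), c2])
    c3 = layers.Conv2D(32, 3, activation="relu", padding="same")(u2)
    u1 = layers.concatenate([layers.UpSampling2D(2)(c3), c1])
    c4 = layers.Conv2D(16, 3, activation="relu", padding="same")(u1)

    # One sigmoid channel per pixel/voxel: tumor vs. no tumor.
    outputs = layers.Conv2D(n_classes, 1, activation="sigmoid")(c4)
    return models.Model(inputs, outputs)
```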
09:20
So basically this neural network answers the question: to which class does a volumetric pixel, or voxel, belong, cancer or no cancer? Now you may think all this deep learning and AI is complicated. Well, not really. If you look at the bird's-eye view of the whole algorithm, it looks like this, very simple.
09:46
We have an input dataset coming in, think of these as black boxes, so labeled data. It goes into the neural network, the next step is to train that neural network with the input dataset, and once that's done we have a trained model.
10:02
Job done, easy peasy. So now we have this trained model, a new patient comes in, we do inferencing, and we get the result. All of this looks great, but what did we use to make this happen? A bunch of software tools.
10:22
Among them: the Intel Distribution for Python, for best-in-class Python performance of course. For the neural network framework we used TensorFlow, and since TensorFlow can be painful sometimes, we leveraged Keras as a very nice layer on top, making deep learning even easier.
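(Putting the bird's-eye pipeline into Keras terms: a minimal sketch using the tiny_unet from the earlier sketch and random stand-in data instead of real MRI slices.)

```python
import numpy as np

# Hypothetical stand-in data: 32 MRI slices and the radiologist's masks.
x_train = np.random.rand(32, 128, 128, 1).astype("float32")
y_train = (np.random.rand(32, 128, 128, 1) > 0.5).astype("float32")

model = tiny_unet()                                   # the sketch from above
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x_train, y_train, batch_size=8, epochs=2)   # "train the neural network"

# A "new patient" slice comes in and we run inference on the trained model.
new_slice = np.random.rand(1, 128, 128, 1).astype("float32")
predicted_mask = model.predict(new_slice)
```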
10:45
And then Horovod. Horovod is a technology by Uber. It's very interesting that the ride-hailing company came up with this piece of technology. What does Horovod do? It distributes work: it splits a job into multiple small pieces
11:07
and distributes that work across multiple nodes, or machines, so that they work together. If you look at the logo, Horovod is actually the Russian word for a Russian circle dance, where each person holds the hand of the next person in a circle.
11:23
That's what the logo shows: the dots are the people holding hands, and that's the key message, distributed computing, one node talking to the other nodes and so on. So with Horovod we split our training process across multiple machines so that we could train our deep neural network faster.
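(The standard Horovod-with-Keras recipe looks roughly like this; a sketch of the usual steps, not the exact training script from this project.)

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()                            # one process ("worker") per machine slot

model = tiny_unet()                   # the sketch from above
# Scale the learning rate with the number of workers, and wrap the optimizer
# so gradients are averaged across all workers after every step.
opt = tf.keras.optimizers.Adam(1e-4 * hvd.size())
opt = hvd.DistributedOptimizer(opt)
model.compile(optimizer=opt, loss="binary_crossentropy")

callbacks = [
    # Start every worker from identical weights, broadcast from rank 0.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
]
# model.fit(x_train, y_train, callbacks=callbacks, ...)
# The script is then launched as, e.g., 8 MPI processes spread over 4 nodes.
```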
11:45
And then the second stage, after training, is inferencing. How do we do inferencing fast? By using a tool called OpenVINO. OpenVINO is a very nice tool that makes inferencing easy and fast thanks to the optimizations it has in place.
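(A sketch of how OpenVINO inference is typically driven from Python; model paths and the input array below are placeholders, and the exact Python API has changed across OpenVINO releases.)

```python
import numpy as np
from openvino.inference_engine import IECore

# The trained model is first converted offline by OpenVINO's Model Optimizer
# into an "Intermediate Representation" (.xml topology + .bin weights).
ie = IECore()
net = ie.read_network(model="unet.xml", weights="unet.bin")   # placeholder paths
exec_net = ie.load_network(network=net, device_name="CPU")

input_name = next(iter(net.input_info))        # name of the network's input blob
# Placeholder input; the shape must match the converted model's input layer.
mri_slice = np.zeros((1, 1, 128, 128), dtype=np.float32)
result = exec_net.infer(inputs={input_name: mri_slice})
```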
12:02
Now let's have a look at some numbers I got from going through this training process. You can imagine: we have this MRI input coming from the MRI device. It contains high-quality images; it's a huge dataset, with large images and lots of detail.
12:23
Training that neural network is very taxing; it's very intensive for a computer to do all this processing, so obviously training takes a long time. Here are some performance results I got. On the far right, when I used stock TensorFlow, that is, the one from Google,
12:43
on one machine it took me 76 hours to do the whole training, going through 30 epochs, or 30 passes over the training data. I was not very happy with 76 hours; I thought I could do better. And of course the next step was to use a better TensorFlow,
13:03
and that's the one which Intel optimized. This is the second point there, Intel-optimized TensorFlow. By just changing the TensorFlow package I dropped to 43 hours, almost a 50% reduction, roughly a 2x performance boost just from better software.
13:26
But I was still not very happy. Then I started looking at distributed training: how can I use multiple machines working together to do the training faster? So let's look at the last two entries:
13:45
4 nodes, meaning 4 machines with Horovod, so 8 workers. With 8 workers on 4 machines, I dropped from the 43-44 hour ballpark to 7.5 hours. That was great. And increasing the number of workers to 16, even better: 5 hours.
14:05
And with 5 hours I was more or less happy, given the huge dataset that I had. So from 76 hours down to 5. The key message there is: use really optimized software
14:20
and distribute your work. If you have a cluster, make use of it. Why should you stick to one machine when you can use multiple machines? And use better software, of course; looking at just one node, going from 76 to 43 hours, for me that was mind-blowing. I really appreciated that.
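(The speed-ups implied by those numbers, as plain arithmetic.)

```python
baseline = 76.0           # hours, stock TensorFlow, 1 node
print(baseline / 43.0)    # ~1.8x with Intel-optimized TensorFlow
print(baseline / 7.5)     # ~10x  with Horovod, 8 workers on 4 nodes
print(baseline / 5.0)     # ~15x  with Horovod, 16 workers
```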
14:42
So without much work from me, it's just about leveraging better software. Now, plugging all of this into the big picture, this is how it looks at the top: what I wanted to do was solve a medical problem, and this was the software stack involved. Now you may ask, okay, what is this Intel-optimized TensorFlow?
15:04
It is the same TensorFlow code that Google releases. What we do is take this code and plug our performance library into it. This performance library loves math, and my U-Net network does a lot of math-intensive operations.
15:21
That library, the Intel Math Kernel Library for Deep Neural Networks (MKL-DNN), loves math: whenever it sees math, it says yes. It boosts all those math-heavy computations so that I could do my training faster, as you can see from the numbers I collected. And of course I leveraged best-in-class Intel Xeon processors to do that.
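(For MKL-DNN-backed TensorFlow, the usual tuning knobs are thread counts and thread affinity; a sketch with example values, which would need adjusting to the actual core count of the machine.)

```python
import os
import tensorflow as tf

# Typical knobs for MKL-DNN-backed TensorFlow on Xeon CPUs; the values here
# are examples and need to be matched to the actual number of physical cores.
os.environ["OMP_NUM_THREADS"] = "16"
os.environ["KMP_BLOCKTIME"] = "1"
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"

# TensorFlow 1.x style session configuration (the era of this talk):
config = tf.ConfigProto(
    intra_op_parallelism_threads=16,   # threads used inside one math-heavy op
    inter_op_parallelism_threads=2,    # how many independent ops run in parallel
)
session = tf.Session(config=config)
```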
15:47
And you saw I had four nodes, so four machines with Xeon processors, and so I could do the training faster. Actually, when I said one node, one processor, it was really two sockets,
16:01
so two physical Xeon processors on one motherboard. That means I had eight physical processors working together to get me close to five hours, down from 76. Now, you have seen that only one slice was being inferred.
16:22
Imagine now that for the whole 3D volume that came in, every slice is being analyzed. If you had a doctor analyzing them one by one, it would be really expensive and time-consuming; the doctor may say, oh, that's too much. And that's just for one patient. An AI algorithm doing that for you is obviously much better and easier for everybody, for the patient and for the doctor.
16:46
And if you're curious how all of this plugs back into 3D, this is how I got there: I used a piece of software called Mango to render this 3D volume.
17:01
That's the original MRI of the brain. You can see all the slices stacked together and you can see the volume of the cancer there. For the doctor who has to do radiotherapy, or even surgery to open the skull and go in there, he needs to know exactly where the bad cells are. Otherwise he may destroy good cells and the person could go into a coma or something like that.
17:23
So, breakthrough stuff. Now, if you're curious, the whole source code and the dataset are all open source. Even the AI software tools that I showed you on that slide earlier, that software stack, are all open source.
17:42
And this is Intel's commitment to AI: we're going open source, free software, free tools. If you want to have a look at my code as well, especially if you're in the medical field, I published all my work on my GitHub; that's the link there. You'll find instructions on how to get the dataset, how to get started, and how to play around.
18:02
You can also reach out to me. So from my GitHub you can get to me. And that's it. Thank you very much for your attention. I'm open to questions.
18:28
Could you use the microphone, because it's being recorded? It's easier then. I'm sorry. What is the difference between Horovod and Spark? Because you talked about the workers and nodes and all that kind of stuff.
18:40
Okay, well. Spark sits on the Hadoop architecture level and so on. Horovod is a purely MPI-based package or program, so what it's doing is splitting my work into MPI processes and sending them directly to the nodes.
19:05
So there's no Spark involved there. If you were using Spark, we have another solution: it's called BigDL, which is a Spark application. With BigDL you could do the same thing, splitting the work into multiple chunks over several Hadoop nodes.
19:24
And that's the main difference. On a plain, bare-metal cluster you obviously cannot use Spark because you don't have that software stack there, and in that case you would use Horovod. Cool.
19:45
One more. Have you tried other U-Net architectures, or just one for this experiment? Very good question. So this example here is the 2D U-Net model.
20:03
We have also tried the 3D U-Net. And if you go to my GitHub, which I will just bring up here, those of you who are really curious, I totally recommend you do that, go there, and you will also find my Horovod code in place.
20:25
Where's my cursor? So this is the 2D version leveraging the 2D U-Net, and I also have the 3D U-Net. And training with Horovod, all the code is there.
20:41
It's really nice. So I've tried the 2D and 3D U-Net. Thanks. Any other questions? Curious about anything? Any Intel software tools or technologies that you would like to know about? Ask me any questions, I can answer them. Hopefully.
21:05
Who funds the cancer research? Is this just to demonstrate the capabilities? Okay, the question is: who funds the cancer research? This work is done by us only.
21:20
So it's coming from our own motivation to try to solve something. We have partners helping us or even taking this code and using that. But there's no external funding, if that's what you were referring to.
21:42
So if there's no more questions, let's thank the speaker one more time.