How to apply deep learning for 3D object
Formal Metadata
Number of Parts | 160
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/33761 (DOI)
Transcript: English (auto-generated)
00:04
Thank you for coming to my presentation. I will talk about how to apply deep learning to 3D objects. This presentation consists of two parts. In the first part, you will learn how to approach building a deep learning product.
00:24
In the second part, you will learn about deep learning applied to 3D objects. The first part covers my strategy: find the right problem, find the right method, keep re-challenging (keep iterating), and focus.
00:45
The second part is deep learning applied to 3D objects: I will talk about VoxNet, how I improved it, and the results. First, a self-introduction. My name is Masayo Ogushi.
01:01
I work at Kabukin as an image processing developer. This is my Twitter account. What is Kabukin? Kabukin provides an on-demand manufacturing service. We receive 3D data and manufacture the objects using 3D printers and other machines.
01:23
That is why I am talking about deep learning applied to 3D objects. The first part is strategy, starting with: find the right problem. Would you like to use deep learning to solve a problem?
01:41
Anyone? No? Thanks. So what is the best case for deep learning? In most cases, deep learning outperforms other methods on image processing and speech recognition. In some cases, it can also be applied to natural language processing and time series analysis.
02:05
What is the worst case for deep learning? When you do not have enough data, when you cannot get a pre-trained model, and when you need 100% accuracy, because, generally speaking, machine learning cannot achieve 100% accuracy.
02:27
Next: find the right method. How can you find the best way to solve your problem with deep learning? Most people would say: let's search Google.
02:43
That does not work at first, because to use Google well you need good keywords, and at the beginning you do not know them. My approach is to use Google Scholar, which gives you good methods and good keywords.
03:02
It also tells you which universities work on this problem, so you know which university websites to check, and you can often get code and data there. In my case, Princeton University provided the data and code. On GitHub, you can find code
03:24
for the latest papers, so you get both the code and the paper. By following the right Twitter users, you can get the latest information. Books give you structured knowledge. And arXiv
03:41
is a paper site where you can find the latest methods. Once you know good keywords, you can use GitHub and Google to find good code and good knowledge. The next topic is keep re-challenging.
04:04
Suppose you have a lot of training data. Should you start with the full data set? No: with a lot of training data, training takes a long time, so you have to check everything step by step first.
04:20
First, prepare a small data set and check that the model works correctly. Second, prepare a training data set that is easy to verify: most models can be trained on data sets such as MNIST, so check that yours works on one.
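The small-data check above can be sketched as follows. This is a minimal sketch, not the speaker's code: a tiny logistic-regression classifier (a hypothetical stand-in for the real model) is trained on ten hand-made, easy-to-verify points, and the check is simply that it fits them perfectly.

```python
import math

def train_logistic(xs, ys, epochs=500, lr=0.5):
    """Fit a logistic-regression classifier with per-sample gradient descent.

    Hypothetical stand-in for "the model"; any model should pass the same check.
    """
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # derivative of the log loss with respect to z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def accuracy(w, b, xs, ys):
    hits = 0
    for x, y in zip(xs, ys):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        hits += int((z > 0) == (y == 1))
    return hits / len(ys)

# A tiny, easy-to-verify data set: label 1 when x0 + x1 is large.
small_x = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.0, 0.4], [0.4, 0.0],
           [0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [1.0, 0.6], [0.6, 1.0]]
small_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

w, b = train_logistic(small_x, small_y)
train_acc = accuracy(w, b, small_x, small_y)
# If the pipeline cannot fit ten easy examples, fix it before training
# on the full data set.
print(f"training accuracy on the small set: {train_acc:.2f}")
```

Only after this passes is it worth moving on to an easy public data set and then to the full data.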
04:40
When you train a deep learning model to improve accuracy, check both the training accuracy and the validation accuracy. If neither improves, stop the training. Check the results on a visualization dashboard such as TensorBoard.
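The stopping rule described above (stop when neither training nor validation accuracy improves) can be written as a small check run after each epoch. This is a hedged sketch; the `patience` window of three epochs is an assumption, not a value from the talk.

```python
def should_stop(train_acc, val_acc, patience=3):
    """Return True when neither accuracy improved during the last `patience` epochs.

    train_acc and val_acc are per-epoch histories, oldest first.
    """
    if len(train_acc) <= patience:
        return False  # not enough history yet
    best_train_before = max(train_acc[:-patience])
    best_val_before = max(val_acc[:-patience])
    train_improved = max(train_acc[-patience:]) > best_train_before
    val_improved = max(val_acc[-patience:]) > best_val_before
    return not train_improved and not val_improved

# Plateaued run: the last three epochs beat neither earlier best.
print(should_stop([0.50, 0.62, 0.70, 0.71, 0.71, 0.70, 0.71],
                  [0.48, 0.58, 0.63, 0.62, 0.63, 0.61, 0.62]))  # True
```

Keras provides a comparable built-in, `tf.keras.callbacks.EarlyStopping`, which watches a single monitored quantity such as `val_accuracy` with its own `patience` argument.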
05:03
Also increase the number of attempts you can make by improving calculation speed: use a GPU or an optimized CPU build. Then, focus: deep learning has a lot of methods to improve accuracy.
05:22
For the model: how deep it is and how complex its structure is. Adjusting the hyperparameters. Preprocessing the data, including data augmentation if you are using image data. The optimizer: SGD, Adam, and so on. What to focus on depends on your situation.
05:41
If you have enough computation resources and enough data, try a deep and complex model. If you have enough computation resources but not enough data, find a good pre-trained model and focus on preprocessing such as data augmentation. If you do not have enough computation resources or data,
06:03
consider other approaches to your problem, such as logistic regression, SVM, or random forest; deep learning is probably not the best choice. That is the end of the first part.
06:21
Now the second part, deep learning applied to 3D objects. I will talk about VoxNet. There are a lot of deep learning models, so how do you choose one? I consider three things: resources, performance, and speed.
06:41
Resources means computation resources and human resources. Performance means accuracy. Speed means speed of development. I chose VoxNet. Why VoxNet? Because it has advantages in resources and speed.
07:01
On computation resources, it works in my environment, which has 32 GB of memory and a GPU. Its performance is also acceptable: the paper reports 83% accuracy. On speed of development, it is open source and the code is simple.
07:23
Now the VoxNet pipeline. First, it maps the 3D data to a 32 × 32 × 32 voxel grid. The grid size 32 is a choice you can change; it reduces the data size, because 3D data is rich and we have to reduce its size.
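Mapping 3D data to a 32 × 32 × 32 voxel grid can be sketched as below. This is an illustration under assumptions not stated in the talk: the input is a list of 3D points (for example, points sampled from a mesh surface), the object is scaled uniformly to fit the grid, and a cell is marked 1 if any point falls in it (a binary occupancy grid).

```python
def voxelize(points, grid=32):
    """Map 3D points to a grid x grid x grid binary occupancy grid.

    Scaling is uniform (same factor on every axis), so the aspect ratio of
    the object is preserved.
    """
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    extent = max(maxs[i] - mins[i] for i in range(3)) or 1.0
    vox = [[[0] * grid for _ in range(grid)] for _ in range(grid)]
    for p in points:
        idx = []
        for i in range(3):
            # Normalize to [0, 1], then scale to a cell index in [0, grid - 1].
            t = (p[i] - mins[i]) / extent
            idx.append(min(int(t * grid), grid - 1))
        x, y, z = idx
        vox[x][y][z] = 1
    return vox

grid = voxelize([(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.5, 0.5, 0.5)])
print(sum(v for plane in grid for row in plane for v in row))  # 3
```

Smaller grids lose detail but shrink the input cubically, which is the trade-off behind choosing 32.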
07:44
Then the input goes into a 3D convolution, which acts as a filter. Because the 3D case is hard to picture, I will first explain the 2D case. Prepare the input image
08:02
and a kernel window. Look at the red region: an input value of 1 is multiplied by the kernel value 5 and added to the sum, the next input value of 1 is multiplied by the kernel value 1 and added,
08:21
and so on. Repeating this over the whole window gives a convolution feature value of 33. Repeating the same action for the orange, green, and blue regions gives the other convolution feature values. Now for the 3D case.
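The 2D walk-through above (multiply each input value by the matching kernel value, add them up, then slide the window) corresponds to the following sketch. As in most deep learning libraries, the kernel is not flipped (strictly, this is cross-correlation), and only positions where the window fits completely are computed ("valid" padding). The example numbers are made up; they are not the ones on the slide.

```python
def conv2d(image, kernel, stride=1):
    """'Valid' 2D convolution as computed in CNNs (kernel not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply every input value under the window by the matching
            # kernel value and add the products together.
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        out.append(row)
    return out

image = [[1, 0, 1, 2],
         [0, 1, 2, 1],
         [1, 2, 0, 0],
         [2, 1, 0, 1]]
kernel = [[1, 0],
          [0, 1]]
print(conv2d(image, kernel))  # [[2, 2, 2], [2, 1, 2], [2, 2, 1]]
```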
08:43
We prepare an input volume and a 3D kernel. The 3D kernel moves across the image plane and produces convolution feature values,
09:00
then moves one step deeper into the volume and produces more convolution feature values. Repeating this action gives the first convolution feature map. In the figure, the action is repeated seven times.
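Extending the same idea to 3D, the kernel slides across the image plane and then steps deeper into the volume, as described above. A minimal single-channel sketch (one kernel, no bias, no activation, "valid" padding):

```python
def conv3d(volume, kernel, stride=1):
    """'Valid' 3D convolution of a single-channel volume with one 3D kernel."""
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    out_d = (depth - kd) // stride + 1
    out_h = (height - kh) // stride + 1
    out_w = (width - kw) // stride + 1
    out = [[[0] * out_w for _ in range(out_h)] for _ in range(out_d)]
    for d in range(out_d):          # step deeper into the volume
        for i in range(out_h):      # slide down the image plane
            for j in range(out_w):  # slide across the image plane
                total = 0
                for a in range(kd):
                    for b in range(kh):
                        for c in range(kw):
                            total += (volume[d * stride + a][i * stride + b][j * stride + c]
                                      * kernel[a][b][c])
                out[d][i][j] = total
    return out

ones = [[[1] * 4 for _ in range(4)] for _ in range(4)]
k = [[[1] * 2 for _ in range(2)] for _ in range(2)]
out = conv3d(ones, k)
print(len(out), out[0][0][0])  # 3 8
```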
09:22
If you are interested in implementing it: in Keras, it is only one line. Here the grid size is 32, so the input shape is (32, 32, 32, 1), where the last 1 is the color channel. You also set the kernel size,
09:42
the stride, which is how far the kernel moves at each step, and the data format, which says where the color channel is. Keras supports different backends, such as TensorFlow, and they can expect different input shapes (channels last versus channels first). That is why you set the channel position.
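How the kernel size and stride determine the output shape follows a standard formula. Assuming `padding='valid'` (the Keras default for `Conv3D`), with input size $I$, kernel size $K$ and stride $S$ along each axis, the output size $O$ is given below. The kernel size 5 and stride 2 in the worked example are assumed from the first layer of the original VoxNet paper, as an illustration; the slide's settings are not reproduced in the transcript.

```latex
O = \left\lfloor \frac{I - K}{S} \right\rfloor + 1,
\qquad
I = 32,\; K = 5,\; S = 2
\;\Rightarrow\;
O = \left\lfloor \frac{27}{2} \right\rfloor + 1 = 14
```

With $F$ filters, a 32 × 32 × 32 × 1 input therefore becomes 14 × 14 × 14 × $F$ in channels-last layout, or $F$ × 14 × 14 × 14 in channels-first layout.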
10:06
The next step is max pooling. Max pooling keeps the strongest detected features: it takes the convolution feature map and selects the maximum value in each window.
10:21
In this example, the maximum value in the red region is 6, so we get 6, and the maximum in the orange region is 8. Repeat the same action. Max pooling for 3D data works the same way: we apply the pooling window
10:41
to the convolution feature volume and repeat the action to get the max pooling layer. In Keras, this is also a one-liner: you set the pool size and the channel position.
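3D max pooling as described (take the maximum inside each pooling window) can be sketched as follows, assuming non-overlapping 2 × 2 × 2 windows. In Keras, `MaxPooling3D(pool_size=(2, 2, 2))` behaves the same way by default, because its stride defaults to the pool size.

```python
def max_pool3d(volume, size=2):
    """Non-overlapping 3D max pooling (stride equals pool size, 'valid' padding)."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    out = []
    for d in range(0, depth - size + 1, size):
        plane = []
        for i in range(0, height - size + 1, size):
            row = []
            for j in range(0, width - size + 1, size):
                # Keep only the largest value inside the size^3 window.
                row.append(max(volume[d + a][i + b][j + c]
                               for a in range(size)
                               for b in range(size)
                               for c in range(size)))
            plane.append(row)
        out.append(plane)
    return out

vol = [[[d * 16 + i * 4 + j for j in range(4)] for i in range(4)] for d in range(4)]
print(max_pool3d(vol)[1][1][1])  # 63
```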
11:03
So the input goes through filtering (convolution) and detection (pooling), then through fully connected layers that reduce the size down to the number of classes, because this is a classification task, and finally the softmax function is applied. The softmax function
11:21
maps the outputs to a probability distribution and is easy to differentiate. In Keras, you only need to define the fully connected layers, reducing the size down to the number of classes,
11:41
and the softmax layer. This code defines the model: first the model itself, then a 3D convolution, another 3D convolution, max pooling, and fully connected layers
12:01
that reduce the size down to the number of classes, followed by the softmax function. It also defines the loss function and the metrics, and then trains the model, using the voxel data as input and the class labels as targets.
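The model described above can be traced shape by shape, together with the softmax at the end. This sketch does not reproduce the speaker's code: the layer sizes are assumed from the VoxNet paper (two 3D convolutions with 32 filters, kernel sizes 5 and 3, the first with stride 2, a 2 × 2 × 2 max pool, a 128-unit fully connected layer, and one output per class), "valid" padding and channels-last layout are assumed, and `num_classes=40` is only an example value.

```python
import math

def conv_out(size, kernel, stride=1):
    """Output length along one axis for 'valid' padding."""
    return (size - kernel) // stride + 1

def voxnet_shapes(grid=32, num_classes=40, filters=32):
    """Trace channels-last tensor shapes through a VoxNet-style classifier."""
    shapes = [("input", (grid, grid, grid, 1))]
    s = conv_out(grid, 5, 2)
    shapes.append(("conv3d_1", (s, s, s, filters)))
    s = conv_out(s, 3, 1)
    shapes.append(("conv3d_2", (s, s, s, filters)))
    s = conv_out(s, 2, 2)  # 2x2x2 max pooling, stride 2
    shapes.append(("max_pool3d", (s, s, s, filters)))
    shapes.append(("flatten", (s ** 3 * filters,)))
    shapes.append(("dense", (128,)))
    shapes.append(("dense_softmax", (num_classes,)))
    return shapes

def softmax(logits):
    """Turn raw class scores into a probability distribution.

    Subtracting the maximum avoids overflow in exp() without changing the
    result, since softmax is unchanged when a constant is added to every score.
    """
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

for name, shape in voxnet_shapes():
    print(f"{name:14s} {shape}")
print(softmax([1.0, 2.0, 3.0]))
```

The 6 × 6 × 6 × 32 = 6912 flattened features feed the 128-unit layer. The softmax derivative has the closed form ∂p_i/∂z_j = p_i(δ_ij − p_j), which is one reason it is easy to differentiate. In Keras, such a head is commonly compiled with `loss='categorical_crossentropy'` and `metrics=['accuracy']`, although the talk does not say which loss was used.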
12:26
Next, techniques for improving accuracy. I think there are two approaches: the model approach and the data approach. The model approach
12:41
offers a variety of ways to improve accuracy; its disadvantages are that deep models take a lot of resources and that it is not obvious which model is better. The data approach has the advantage that the effect of each change is obvious; its disadvantage is that the options are limited.
13:02
In my case, for the model approach I applied dropout and regularization, and for the data approach I applied data augmentation for 3D data and class weights for the imbalanced categories. 3D data augmentation is a special case, so I will talk a little about it.
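Class weights for imbalanced categories can be computed in several ways, and the talk does not say which was used. A common choice, sketched here as an assumption, is the "balanced" heuristic also used by scikit-learn: each class gets n_samples / (n_classes × count), so rare classes weigh more in the loss. The class names and counts are made up.

```python
from collections import Counter

def balanced_class_weights(labels):
    """weight[c] = n_samples / (n_classes * count[c]).

    Rare classes get weights above 1 and common classes below 1, so each
    class contributes roughly equally to the total loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in sorted(counts.items())}

print(balanced_class_weights(["chair"] * 80 + ["toilet"] * 20))
# {'chair': 0.625, 'toilet': 2.5}
```

Keras' `Model.fit` accepts such a mapping through its `class_weight` argument, keyed by integer class index rather than by class name.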
13:24
I think data augmentation has advantages over the other methods: its effect is obvious, and unlike adding layers to the model, it does not increase the model's calculation time. For data augmentation, I used these transformations:
13:43
rotation, shift, and shear. In the code, we take the voxel data, apply the augmentation matrix to each voxel to get the transformed data,
14:03
and convert the result back into the array format. Here are examples: with the rotation matrix, the data is rotated;
14:23
with the shift matrix, the data is shifted; and with the shear matrix, the data is sheared. In my case, I added the augmented data
14:41
to the training data. Next, techniques for improving speed. Deep learning has a lot of ways to improve calculation speed, such as using a GPU, CPU optimization, multithreading, and preparing feature sets.
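The rotation, shift, and shear augmentation described above can be sketched with 4 × 4 homogeneous matrices applied to the coordinates of each occupied voxel. This is a minimal illustration, not the speaker's code: it rotates about the z axis only, transforms about the grid centre, rounds to the nearest cell, and drops voxels that leave the grid. The helper names are hypothetical.

```python
import math

def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def shift(dx, dy, dz):
    return [[1, 0, 0, dx], [0, 1, 0, dy], [0, 0, 1, dz], [0, 0, 0, 1]]

def shear_xy(k):
    # x' = x + k * y; y and z are unchanged
    return [[1, k, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def augment(vox, matrix):
    """Apply a 4x4 homogeneous transform to every occupied voxel of an n^3 grid."""
    n = len(vox)
    c = (n - 1) / 2.0  # transform about the grid centre
    out = [[[0] * n for _ in range(n)] for _ in range(n)]
    for x in range(n):
        for y in range(n):
            for z in range(n):
                if not vox[x][y][z]:
                    continue
                p = [x - c, y - c, z - c, 1.0]
                q = [sum(matrix[r][k] * p[k] for k in range(4)) for r in range(4)]
                nx, ny, nz = (int(round(q[i] + c)) for i in range(3))
                if 0 <= nx < n and 0 <= ny < n and 0 <= nz < n:
                    out[nx][ny][nz] = 1  # voxels pushed outside are dropped
    return out

v = [[[0] * 5 for _ in range(5)] for _ in range(5)]
v[4][2][2] = 1
rotated = augment(v, rotation_z(math.pi / 2))
print(rotated[2][4][2])  # 1
```

Mapping source voxels forward like this can leave small holes for rotations that are not multiples of 90 degrees; mapping each output cell back through the inverse matrix avoids that.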
15:02
CPU optimization is very effective for data augmentation, so I will talk about it. If you use TensorFlow, you can enable CPU optimizations through build options when you build it. However, you have to check which options
15:21
are available for your CPU. Now the results. This plot shows the validation accuracy: one line is the baseline, the red line is data augmentation with shifts in X and Y,
15:43
the yellow line is shifts in X and Y with class weights applied, and the green line is the training data extended by data augmentation, with class weights applied.
16:04
This table summarizes the results as validation accuracy. The baseline is 79%, whereas adding the X- and Y-shifted data and applying class weights achieves 85%.
16:26
Now the conclusion. In the first part, my strategy was: find the right problem, find the right method,
16:41
keep re-challenging, and focus. In my case, the right problem is 3D object recognition, the right method is VoxNet, I kept re-challenging with data augmentation and a customized model, and I focused on improving validation accuracy.
17:07
I will show the demo. This slide shows similar 3D objects;
17:24
however, the text is in Japanese. Just a moment.
18:05
Sorry, it takes a lot of time, so I will use the video instead.
18:48
Okay, I'll show the demo. I choose the airplane file
19:04
and load it. Okay, sorry.
19:34
For the airplane,
19:48
it can find objects with a similar shape.
20:04
The next one is a bathtub.
20:39
The last one is a toilet.
20:43
For the toilet and the chair, it can also find similar objects.
21:07
That is the end of the presentation, and here is our recruiting slide. I think deep learning for 3D objects is a very rare field. If you are interested in working in Japan, please visit the site
21:22
and send us an email. That's all. Thank you for listening to my presentation. We have five minutes or so for questions,
21:41
if anyone has any questions. Hello, thank you for your presentation. Do you have any other metrics, or just accuracy? Accuracy can be pretty misleading, especially with imbalanced classes. For example, any kind of loss,
22:01
or precision and recall: what about them? So you are asking about other evaluation metrics, such as rotation and so on? Uh-huh, uh-huh.
22:24
Other metrics, sorry. I only calculated accuracy; I did not calculate other metrics. Okay, thank you.
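The questioner's point, that accuracy can hide poor performance on rare classes, can be made concrete with a made-up example: on a hypothetical test set of 90 chairs and 10 toilets, a model that always predicts "chair" scores 90% accuracy but has zero recall on toilets. The sketch computes per-class precision and recall from label lists; reporting 0.0 when a class is never predicted is a convention chosen here.

```python
def per_class_precision_recall(y_true, y_pred):
    """Return {class: (precision, recall)} computed from label lists."""
    classes = sorted(set(y_true) | set(y_pred))
    result = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        result[c] = (precision, recall)
    return result

# Hypothetical imbalanced test set and a model that always answers "chair".
y_true = ["chair"] * 90 + ["toilet"] * 10
y_pred = ["chair"] * 100
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.9
print(per_class_precision_recall(y_true, y_pred))
# {'chair': (0.9, 1.0), 'toilet': (0.0, 0.0)}
```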
22:44
Questions? Thanks again for the talk.