CNN-based tools in GRASS GIS
Formal Metadata

Title: CNN-based tools in GRASS GIS
Title of Series: FOSS4G Bucharest 2019
Number of Parts: 295
License: CC Attribution 3.0 Germany. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifier: 10.5446/43394 (DOI)
Language: English
FOSS4G Bucharest 2019, talk 189 of 295
Transcript: English (auto-generated)
00:07
Hi, I'm going to talk about something else than what you have seen on the schedule, because it was written as "CNN-based GRASS GIS" or something like this, so I'm going to talk just about
00:21
one implementation of convolutional neural networks in GRASS GIS. Firstly, I will say something about why I believe that it's the right time to implement some artificial neural networks into GIS, then I will briefly say something
00:42
about the architecture itself, and then I will show how to use the implemented modules and some of the results. So the situation nowadays is that there is higher density of the satellite monitoring
01:02
systems and of aerial imagery, and also there is a bit of vectorization of analog maps, which means that we have more and more data. Also, what is nice is that the data, of course, are of higher quality, because unless
01:27
you are a Luddite or something, you just want to move forward in the technology and not backward. Another nice thing about the situation nowadays is that there is a lot of open data, maybe
01:47
still not enough, but then there is this trend to open the data so it's easier to access them, and the same applies for the data standardization, because it really sucks
02:05
when you have to parse every image separately just because everyone uses his own format and naming convention and triangle pixels and stuff like this.
02:21
Okay, but Mask R-CNN does classification, so are there some other ways to do it? Of course there are. You can do manual classification in GRASS GIS, which means that you use the GRASS digitizing tool and just click like a monkey.
02:47
Then you can use some kind of supervised classification. It is also supported in GRASS GIS, and even the unsupervised classification is supported
03:03
in GRASS GIS. The question is, why do we need neural networks, the artificial neural networks, to do this when we already have some tools? So the thing is that the human brain is the most powerful tool we know or we have,
03:27
or at least that's what is generally believed, which could also be some kind of neuroscientists' propaganda so they don't lose their jobs.
03:42
But that's up to you. And also, why should we try to emulate human consciousness and human perception? It's because we want some human-understandable, human-readable results, which I will try to
04:06
show in the next figures. Like here, the computer just outputs an apple and a pear. I think that most reasonable classifiers will recognize an apple and a pear.
04:26
But it can be like this, because this is an apple and this is a pear, which seems quite easy. But when you look at this, there is a huge variety of different kinds of apples.
04:44
And for us, as human beings, it is pretty obvious that everything in these pictures is an apple, or maybe not the first ones, but we can see an apple in them.
05:03
But for the computer, or from the computer vision point of view, they are completely different. An even more instructive example is this statue of a cat, because when a human being
05:22
looks at these pictures, he or she can see that it's the same statue. But again, from the computer vision point of view, when you just parse pixels and you are not a thinking
05:41
being, then it can look like something completely different. From artificial neural networks, I have used a subset called convolutional neural networks. I guess that most of you know what convolution is, so I'm going to reveal another lie, just like
06:05
the one with brains. In convolutional neural networks, what is mostly used is cross-correlation, not convolution, but that's one "C" less, so "CNN" is faster to say than "CCNN", and the result is the
06:25
same whether you use convolution or cross-correlation in there, so it really doesn't matter; it's just faster to compute the cross-correlation. And why convolutional neural networks?
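As an aside, the equivalence the speaker mentions is easy to check numerically: cross-correlation with a flipped kernel is exactly convolution, so a network that learns its weights can use either. This is a minimal NumPy sketch with hypothetical function names, not code from any CNN library:

```python
import numpy as np

def correlate2d(image, kernel):
    """'Valid' 2-D cross-correlation: slide the kernel as-is over the image."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d(image, kernel):
    """'Valid' 2-D convolution: the same sliding window, but the kernel is flipped."""
    return correlate2d(image, kernel[::-1, ::-1])

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[0., 1.], [2., 3.]])

# Correlating with the flipped kernel gives exactly the convolution result,
# so learned weights just come out flipped -- the network cannot tell the difference.
assert np.allclose(convolve2d(image, kernel),
                   correlate2d(image, kernel[::-1, ::-1]))
```

Since the kernel weights are learned rather than fixed, frameworks implement the cheaper cross-correlation and still call the layer "convolution".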
06:42
I will try to explain it with an example. In the year 2016, in one of the image recognition challenges, an architecture called ResNet, using convolutional neural networks, was proposed, and it reached a top-five
07:03
error of 3.6%, which is almost 4%, so roughly every 25th image was classified wrongly, which may not seem like the best thing we have ever seen, but humans reached an 8% error, so
07:28
it was the year when Philip K. Dick and all the apocalyptic sci-fi books about computers smarter than people got real.
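The top-five error mentioned above simply asks whether the true class is among a model's five highest-scoring guesses. A small illustrative sketch with toy scores (not real ImageNet data):

```python
import numpy as np

def top5_error(scores, labels):
    """Fraction of samples whose true label is NOT among the 5 highest scores."""
    top5 = np.argsort(scores, axis=1)[:, -5:]   # indices of the 5 best classes per sample
    hits = [label in row for row, label in zip(top5, labels)]
    return 1.0 - np.mean(hits)

# Toy data: 4 samples, 10 classes, true labels 0..3.
scores = np.zeros((4, 10))
labels = np.array([0, 1, 2, 3])
scores[0, 0] = 1.0    # true class is the single best guess      -> top-5 hit
scores[1, 1] = 0.5    # true class scores highest                -> top-5 hit
scores[2, 5:] = 1.0   # five other classes outrank the true one  -> top-5 miss
                      # sample 3 is all zeros: its label lands outside the top 5 -> miss

assert top5_error(scores, labels) == 0.5   # 2 misses out of 4
```

A top-five error of 3.6% means this quantity, computed over the whole test set, was 0.036.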
07:46
There are different kinds of classification. There is semantic segmentation, which is a pixel-wise classification of every pixel in the image, and that's not the one Mask R-CNN uses.
08:04
There is also a simple classification, which just returns the class of the object in the image. If you are a tough guy, then you connect it with the localization, so you have even
08:20
the bounding box telling you where the object is, and when you have more instances of one class in the picture, it's called object detection. But Mask R-CNN uses instance segmentation, which is like the combination of the first
08:44
one and the third one, so you are segmenting every instance of the object separately, so instead of this big puddle of yellow pixels, you are able to recognize, for example, every
09:02
building separately. You just don't have one huge blob of building pixels. This is just another example of instance segmentation. Mask R-CNN is divided into two parts: the so-called backbone architecture
09:27
and the head architecture. For the backbone architecture, you can use different models, but in the implementation in GRASS GIS, I've used ResNet, and the user has the possibility to choose between two ResNets,
09:49
ResNet-50 and ResNet-101. After ResNet, you get something called feature maps, and on top of this, there is RPN, which
10:04
stands for region proposal network, and as the name suggests, it proposes regions where the object, or an instance of an object, could be, and it works like this.
10:26
You have just a sliding window of different shapes and sizes, and it slides through the feature map and generates possible instance localizations, which are then parsed deeper
10:51
in the network to decide whether the object is really there, because it's computationally very, very demanding, so you don't want to parse everything all the time, so there
11:05
is like a filter of these anchor boxes. Now to the head architecture: there are three parallel branches. The first one is just a simple softmax layer, which returns the class of the instance
11:26
of the object, which means that here you would get something like cowboy, or most probably cowboy if it's not wrong. The second parallel branch is just a regressor, which returns the bounding box telling you
11:50
where the object is in the picture. Then parallel to these two branches is the mask branch, which then segments the already
12:08
localized object and returns this pixel-wise mask telling you where exactly the object appears, talking about pixels and not the bounding box.
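The sliding-window idea behind the region proposal network can be sketched roughly as generating candidate boxes of several shapes and sizes at every feature-map position. The scales, ratios, and stride below are made-up illustrative values, not the ones the GRASS implementation actually uses:

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride, scales, ratios):
    """Anchor boxes (x0, y0, x1, y1) centred on every feature-map cell.

    Hypothetical sketch of an RPN's sliding window: at each position we
    place one box per (scale, aspect-ratio) combination; a later network
    stage would score these and keep only the promising ones.
    """
    boxes = []
    for i in range(feat_h):
        for j in range(feat_w):
            cy, cx = (i + 0.5) * stride, (j + 0.5) * stride  # cell centre in image pixels
            for s in scales:
                for r in ratios:
                    h, w = s * np.sqrt(r), s / np.sqrt(r)    # area stays s**2 for every ratio
                    boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)

anchors = generate_anchors(feat_h=4, feat_w=4, stride=16,
                           scales=[32, 64], ratios=[0.5, 1.0, 2.0])
# 4*4 positions * 2 scales * 3 ratios = 96 candidate boxes
assert anchors.shape == (96, 4)
```

Even this toy grid produces 96 candidates, which is why the filtering stage the speaker describes matters: scoring every box through the full head would be computationally very demanding.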
12:25
Two modules were created for GRASS GIS, in one library, but that's hidden: the train module and the detect module. The detection is the interesting part,
12:42
but unfortunately you have to train the model to do something, which is the boring part. But pretty good for your salary, because it means that you will just start the training and then you can go home and wait like five days, getting the salary, and then check
13:04
the results and find out that they are completely wrong, start the five-day computation again, and get more money for nothing, and then get fired. So firstly, you should use the train module, where you have plenty
13:25
of different, is it readable what is written there? No? Yes? No? Okay. No one knows. Then you have plenty of parameters you can specify to make your architecture more suitable
13:43
for the task you are going to do, so I'm just going to underline a few of the mandatory ones. Of course, you have to define the classes you want to detect in there.
14:03
Then if you want, you can load pre-trained weights, which can really help your training to be faster. It's like a kid who wants to recognize balls: it helps when the kid already knows
14:27
how to recognize a circle or a sphere. So it's like this: pre-trained weights are a model trained on a different task, which makes it easier to then detect the stuff you
14:45
want, but you don't have to do it. You can train your model from scratch, but it will take a longer time. Of course, you have to define the path to the training data set, because it's supervised
15:01
classification, and then you just start the training, defining the number of epochs. After each epoch, the model is saved to your disk, which brings me to the next step: it's really good to define the path where you save the models.
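The save-after-every-epoch behaviour can be mimicked with a toy loop. This is not the add-on's actual training code; the "loss" is fake and just stands in for one epoch of real work:

```python
import os
import tempfile

def train(epochs, checkpoint_dir):
    """Toy training loop illustrating per-epoch checkpointing."""
    losses = []
    for epoch in range(1, epochs + 1):
        loss = 1.0 / epoch                     # pretend the model improves each epoch
        losses.append(loss)
        # Save a checkpoint after *every* epoch, so a later over-fitted
        # model never costs you the good intermediate ones.
        path = os.path.join(checkpoint_dir, f"epoch_{epoch:03d}.ckpt")
        with open(path, "w") as f:
            f.write(f"{loss:.4f}\n")
    return losses

ckpt_dir = tempfile.mkdtemp()
train(epochs=5, checkpoint_dir=ckpt_dir)
assert len(os.listdir(ckpt_dir)) == 5          # one checkpoint per epoch
```

The payoff of this design is explained later in the talk: you can kill a long run early, or go back and pick an earlier epoch once you see the later ones over-fit.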
15:24
Then you can start the detection, which means that you have to load the model again and pass it some rasters. These rasters could be maps already imported in GRASS GIS, or they can be external
15:46
references to raster files. The user can specify whether he wants to represent the detected instances as points
16:04
or as areas, which are polygons. Sometimes, for example with sick trees, it's enough to detect them as points; having a polygon in the shape of the tree would be overkill.
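The point-versus-area choice boils down to how you reduce a detected pixel mask. A rough NumPy sketch of the two reductions; the real module does this through GRASS raster-to-vector tools, and the bounding box here is only a stand-in for proper polygon vectorisation:

```python
import numpy as np

def mask_to_point(mask):
    """Centroid of a binary mask -- enough when you only need a location."""
    ys, xs = np.nonzero(mask)
    return float(xs.mean()), float(ys.mean())

def mask_to_bbox(mask):
    """Axis-aligned extent (xmin, ymin, xmax, ymax) of a binary mask --
    a simplified stand-in for vectorising the mask into a polygon."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

mask = np.zeros((10, 10), dtype=bool)
mask[2:5, 3:7] = True               # one detected instance, e.g. a sick tree
assert mask_to_point(mask) == (4.5, 3.0)
assert mask_to_bbox(mask) == (3, 2, 6, 4)
```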
16:23
Now I have some brief examples of the results. Here, it's a model trained to detect soccer or football pitches. It depends if you are from U.S. or from Europe.
16:41
It works quite well, and these are polygons inside the lines. Here is the same for the tennis pitches. It's pretty nice, because when I used different classifiers, they had a pretty big problem with a tennis pitch covered with shadow.
17:04
If they recognized it, they recognized only the part which is not covered by shadow. So I was pretty happy about this result. Here, it's the same just to show you that you can do it also for multiple classes.
17:23
It's up to the user how many classes he wants to detect. Yeah, and why is it saving the model after each epoch? I want to show it on this example. Because you never know, or most probably you don't know which epoch will be the best one.
17:47
So you run it for 200 epochs, and when you see that it's really good, you can kill it after the 100 epochs, and you still have the intermediate results. And it's because of this, like after the first epoch, this is the building detection.
18:06
It's completely wrong. After the 10th epoch, I'm not sure if it's worse or better. But then after some more epochs, you can see that the result is quite good.
18:23
After some more epochs, it's even better. Now, what was surprising for me, I mean, what was not surprising, is that the tennis pitch is detected as a building. But I believe that this was due to the fact that I trained it on a lot of orange-roofed buildings,
18:46
which was maybe a mistake or not sufficient data set, as we have heard a few times today. But the nice thing is, for example, the building down there, where there is just a small piece of the building, and it's still detected as a building.
19:07
But it still doesn't explain why the model is saved after each epoch. After 30 more epochs, it got much, much worse. The small piece of building is not detected.
19:24
The building on the left is also not detected. A piece of road is suggested as a building. So that's the very peculiar or dangerous thing about the neural networks,
19:50
and that it's pretty easy to overfit such a huge model. And therefore, it's good to save the intermediate results,
20:00
and don't wait for the last epoch. Because more training doesn't mean that it will get better and better forever. Here is just some info. You can find it in the official GRASS GIS add-ons repository.
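Keeping every checkpoint makes it possible to pick the best epoch afterwards, instead of trusting the last one. A minimal sketch with made-up validation losses following the improve-then-overfit pattern from the building example:

```python
def best_epoch(val_losses):
    """Pick the 1-based epoch with the lowest validation loss.

    More training is not monotonically better: once the model starts to
    over-fit, later checkpoints get worse, so you want the minimum, not
    the last entry.
    """
    return min(range(1, len(val_losses) + 1), key=lambda e: val_losses[e - 1])

# Hypothetical per-epoch validation losses: improve, bottom out, then over-fit.
losses = [0.9, 0.5, 0.3, 0.25, 0.4, 0.6]
assert best_epoch(losses) == 4
```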
20:23
So you can install it in GRASS with the g.extension command. And there are some next steps I would like to take, like supporting more and more convolutional neural network architectures in GRASS,
20:42
hopefully as soon as possible. And from these pages, I have stolen the pictures. And thank you for your attention.
21:12
So there's no questions, are you sure? There were many questions before. Oh well, there's one. Thank you for breaking the ice.
21:20
I want to ask if you know about any pre-trained models in this space for satellite images? I know about pre-trained models, but not for satellite images. But when I was testing it to get these results,
21:40
I've made a test where I was using the pre-trained model, trained on ImageNet, so on common objects like cars and people and stuff like this, on the streets. And even with pre-trained models like this, it is faster.
22:04
So it is better to use a pre-trained model, which is trained on something else, than to start from scratch. Because as I said, when you are learning how, for example, cars look,
22:21
you learn the basic shapes. You teach yourself to recognize circles, rectangles, and stuff like this. So at least there is something on the way. Just another one. Do you know if there's any initiative to create some data set for a pre-trained model
22:45
in satellite images, like some organization that already does that? Yes. Confidential? But I'm working on this.
23:01
But it's hard work. Well, firstly, it's not done yet. And secondly, it's hard work to tell the JRC in Italy to open it, really,
23:22
because they sometimes have quite weird opinions about what open data is. Thank you. In your network architecture model, the R-CNN, you are using only 2D convolutional layers, am I right? Only...
23:40
2D convolutional layer. No. Yeah, maybe it... Ah, yeah, yeah. 2D convolutional layer. I'm sorry, I'm sorry. Yes. So not 3D yet. So are you considering that in the future model that you want to integrate?
24:01
Yeah, I would like to make something for 3D as well. But, yeah, to be honest, I've tried just once in my life to train 3D convolutional networks. And it worked, but it was for some very simple task, or tasks.
24:25
So, yeah, I would like to do it, but for sure I need some time, and I need to study the architectures which are proposed now, the bleeding edge and stuff like this.
24:41
Thank you. Thanks. I don't know, maybe this is not a question for you, but maybe you know. Have you tried to train on a data set of one resolution,
25:05
and tried to take results from other resolution? For example, train on one meter data set, and like test on 10 meter. What result you...
25:21
Yeah, here the training was done on one resolution, which was about 30 centimeters; the training data set was based on Bing imagery, and the detection was done on something like 60 centimeters,
25:45
so it's like two times worse. But I've tried once to train something on a two meter data set, and then apply it on Sentinel with 10 meters,
26:01
but it didn't work for me. I think that it really depends on the task. If you want to detect, for example, crops, it should work theoretically. But for cars and detailed things, it's really hard to detect it.
26:29
And we should train on both resolutions? Yeah, yeah, yeah. If you have data from both resolutions, and you have them labeled as a training data set,
26:43
just try to feed them to the model at both resolutions; there is some resampling and stuff like this inside the architectures. So you can use different resolutions as input data, and then it should learn on both resolutions, hopefully.
27:10
Out of curiosity, correct me if I'm wrong, but as far as I know, in standard Mask R-CNN, we define detection regions as rectangles.
27:22
That's just R-CNN. The generations went like this: there was R-CNN, which was just this, the rectangle and the detection. Then Fast R-CNN, which was a bit faster. And then they got a new one, and they didn't know a cool name,
27:40
so they called it Faster R-CNN. And that was still rectangles, and then the same guy came with, I think, I even have the name somewhere here. Yeah, Girshick. So then he came with Mask R-CNN,
28:02
and it's already pixel-wise. Okay, I was gonna ask the same thing. Did you define something else to enforce your outputs to create polygon-like results, or something like that? No, the result from the architecture is in pixels,
28:23
and then there is processing in GRASS to make the polygons. So the polygons are a GRASS result, not the network output. Yeah, yeah, yeah. Okay, okay. And talking about this, I think that like half a year ago,
28:40
I've seen a paper which proposed Mask R-CNN Plus. So there should be something even better; hopefully it will not be so hard to upgrade to that version. Okay, okay, thank you. Thank you. Well, then we can stop here. Thank you very much.
29:02
And, well.