Geospatial data processing for image automatic analysis
Formal Metadata
Title: Geospatial data processing for image automatic analysis
Series: FOSS4G Bucharest 2019 (182 / 295)
Number of Parts: 295
License: CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/43401 (DOI)
Transcript: English (auto-generated)
00:11
Hello everyone. The room is quite crowded. That's a good thing, I guess. Thank you for being here.
00:20
I'm Raphaël Delhome. I work at Oslandia, which is a small company specialised in geospatial data treatment and analysis. I will present some work that we did over the last two years on geospatial data processing.
00:40
We won't talk about how to improve a state-of-the-art algorithm. This is more a matter of designing a proof of concept and running the whole process quickly. I will detail all of these elements.
01:00
Just to begin, as I said, I am working at Oslandia as an R&D engineer. Like all Oslandia teammates, I am working on geospatial data solutions. During these past two years, I have worked a lot on AI-related algorithms. That's part of what will be presented today.
01:26
Oslandia is not an artificial intelligence-focused company. As I said just before, we are more a geospatial data-related company. As you might know, there has been a lot of democratisation of aerial images,
01:46
aerial and satellite images. The fact is that this is a very interesting use case for trying to get some fast results. We mainly focus on the building footprint detection use case.
02:02
The point is that you have an aerial image and you want to discover where the houses and the buildings are on the map. You get a typical boolean mask, as on the right: buildings in white and background in black. How do we mix up these two concepts, deep learning and geospatial data?
02:26
At Oslandia, we use a lot of Python and all the related Python libraries. We consider two main types of algorithms, which are semantic segmentation and instance segmentation. I won't detail all of the use cases now,
02:43
but I can provide a more detailed explanation. Imagine you have an aerial image, on the left, with some buildings on it. If you want to do semantic segmentation,
03:05
you will produce as many boolean masks as you have classes. In the middle, you have a typical semantic segmentation mask. You have a first boolean mask with a complete building. Then, on the top right, you have two buildings that can be marked as incomplete.
03:24
The last mask, at the bottom, corresponds to a foundation. We just differentiate the building types. By contrast, if you are doing instance segmentation, you build as many masks as you have objects.
03:45
You have one mask for the building on the top right, one mask for the building at the bottom right, and so on. Those are two different problems. In our work, we considered two interesting datasets.
04:02
The first one is very close to the FOSS4G scope because it was released, maybe not during, but around the time of the last FOSS4G conference in Dar es Salaam, Tanzania. The aim of this dataset was to identify all the buildings visible on a map,
04:24
that is, on a typical aerial image, by differentiating three classes of buildings: complete buildings, unfinished buildings, and foundations. The fact that we have very high resolution images
04:41
means there is no need to have plenty of images. There were only 13 labelled images, but with that many pixels the dataset can be considered really big. The second interesting dataset that we focused on
05:02
was the aerial image labeling dataset released by a French research institute called INRIA. In this dataset, we had 360 images, half of which were for training.
05:20
In such a dataset, there is no distinction of building classes. We are just trying to identify the building footprints. In the first case, the resolution was on average 7 cm per pixel.
05:43
In the second case, it was around 30 cm per pixel. The resolution is far more precise than in the satellite image case. The data capture was done with drones.
06:08
There is an interesting point with such datasets. We are trying to consider building footprints, but if we just think about it, we can find this information with another dataset.
06:25
That is OpenStreetMap data. This OpenStreetMap data can be really close to the INRIA dataset. We tried to rebuild some labels to recover the building footprints,
06:40
leaving aside the aerial image dataset and building them from the OpenStreetMap database. By using some classical geo tools, like GDAL to get the image coordinates and Overpass to query the OpenStreetMap data, then by storing the data into a database and generating raster tiles,
07:05
we can get the OpenStreetMap building footprints as images; following all these steps, we can rebuild the labels. We did a very small proof of concept.
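As an illustration of the Overpass step, here is a minimal sketch assuming the requests library and the public Overpass API endpoint; the bounding box is hypothetical and the actual queries used in the project may differ:

    import requests

    # Hypothetical bounding box of the area of interest (south, west, north, east)
    south, west, north, east = 44.42, 26.09, 44.44, 26.12

    # Overpass QL query: every way tagged as a building inside the bounding box,
    # with its node geometry included in the response
    query = f"""
    [out:json][timeout:60];
    way["building"]({south},{west},{north},{east});
    out geom;
    """

    response = requests.post("https://overpass-api.de/api/interpreter", data={"data": query})
    response.raise_for_status()
    buildings = response.json()["elements"]

    # Each element carries a "geometry" list of lat/lon points describing the footprint
    print(len(buildings), "building footprints downloaded")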
07:23
As you can see, the OpenStreetMap labels are very close to the original dataset labels, so close that OpenStreetMap was probably used as a basis. On the left, you have a typical INRIA image. In the middle, this is the labelled version of the image provided by INRIA.
07:48
On the right is the version that we produced our own way. The buildings in the two images are not necessarily equivalent,
08:02
because there was probably some work on the OpenStreetMap tags to recover the exact footprints. That's it for the datasets we used. Now, let's talk about the steps. When we are trying to design a proof of concept and to design the data pipeline
08:25
on this topic, we have to consider some classical steps, the first of which is the data parsing. You might be very aware of that. We have a first set, which is the training set.
08:43
We had a bunch of georeferenced images with known coordinates. These images can be accompanied by GeoJSON labels describing the buildings, or another solution is to provide the labels as images.
09:01
That was the case for the INRIA dataset. You also need a testing set with images that are not used for training. That's basically a typical machine learning scheme. Then we get some georeferenced data.
09:24
The first thing to do when you are dealing with a deep learning algorithm, because of the requirements in terms of hardware and memory, is to tile the big images you have.
09:42
You can't feed a machine learning or deep learning algorithm with a very high resolution image like the ones we had at the beginning. We just tiled our images with a fixed size, and you have several options to do that. The first one is GDAL, with its command-line tools.
10:05
Or you can be more Python-focused by using NumPy and just working on the image data. Then you have your images as smaller tiles.
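A minimal sketch of the NumPy-based tiling, assuming the image is read through the GDAL Python bindings and is multi-band; the tile size and the file name are hypothetical:

    import numpy as np
    from osgeo import gdal

    TILE_SIZE = 512  # hypothetical fixed tile size, in pixels

    # Read the whole georeferenced image; for a multi-band image,
    # ReadAsArray returns an array of shape (bands, height, width)
    dataset = gdal.Open("raw_image.tif")
    image = dataset.ReadAsArray()
    image = np.transpose(image, (1, 2, 0))  # reorder to (height, width, bands)

    # Cut fixed-size tiles by simple slicing; leftover borders are dropped here
    tiles = []
    height, width = image.shape[:2]
    for y in range(0, height - TILE_SIZE + 1, TILE_SIZE):
        for x in range(0, width - TILE_SIZE + 1, TILE_SIZE):
            tiles.append(image[y:y + TILE_SIZE, x:x + TILE_SIZE, :])

    print(len(tiles), "tiles of", TILE_SIZE, "x", TILE_SIZE, "pixels")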
10:21
You can then train your model. There are plenty of interesting questions on this topic. The first one, which was really new for us when we arrived at this point, is that you have to choose a lot of settings in terms of hyperparameters.
10:43
There are a lot of things to tune, but which tuning is the best? That's a very challenging question in this area. Then we quickly ran into a kind of wall, which was the hardware.
11:03
You can do this with only a CPU, but you have to be very patient. So that's probably not the best idea you will have on this topic. You can use AWS, cloud services, and so on.
11:22
In our case, we had an old computer with a nice graphics card, and we just said, oh yeah, that's a GPU, let's use it. We were much, much better off in terms of computation time with just one GPU.
11:40
For our needs, that was sufficient. But clearly, do not do that with only CPUs. I talked about training. At this point, we have tiles, and we have the items, the buildings, stored as GeoJSON files. You can train a model and save it as HDF5 files.
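To make this concrete, here is a toy sketch of the training and saving step, assuming Keras (mentioned just below) and a deliberately tiny fully convolutional model on placeholder data; the project's actual architecture and hyperparameters are not shown in the talk:

    import numpy as np
    from tensorflow import keras

    # Placeholder tensors: a few 512x512 RGB tiles and one boolean mask per tile
    x_train = np.random.rand(16, 512, 512, 3).astype("float32")
    y_train = np.random.randint(0, 2, (16, 512, 512, 1)).astype("float32")

    # A deliberately tiny fully convolutional model, just to illustrate the workflow
    model = keras.Sequential([
        keras.Input(shape=(512, 512, 3)),
        keras.layers.Conv2D(16, 3, padding="same", activation="relu"),
        keras.layers.Conv2D(1, 1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    # Batch size, epoch count, learning rate and so on are the hyperparameters to tune
    model.fit(x_train, y_train, batch_size=4, epochs=2)

    # Keras stores the trained model (architecture and weights) in a single HDF5 file
    model.save("building_model.h5")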
12:04
We used Keras; HDF5 is a typical file format that you can find in this library. And then the nice part: model inference. With your model, and you can see that with a bunch of Python lines,
12:21
you can just load your model, set your trained weights into it, and do your predictions. At this step, you do not have buildings, you just have a lot of tables with a lot of figures in them. So that's pretty classical here again.
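A minimal sketch of that inference step, assuming the hypothetical HDF5 file and tile shape from the previous sketch:

    import numpy as np
    from tensorflow import keras

    # Reload the trained model and its weights from the HDF5 file
    model = keras.models.load_model("building_model.h5")

    # A batch of tiles to predict on (placeholder data)
    tiles = np.random.rand(8, 512, 512, 3).astype("float32")

    # The output is not buildings yet, just arrays of per-pixel probabilities
    probabilities = model.predict(tiles)   # shape (8, 512, 512, 1)
    masks = probabilities[..., 0] > 0.5    # one boolean mask per tile
    print(masks.shape, masks.dtype)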
12:44
But what is actually interesting to us is not tables of figures, it's buildings. We have to consider the geo tools again to transform the tables of figures into buildings. The first very challenging part was to rebuild the polygon contours.
13:09
So we used OpenCV, which is a pretty canonical solution in image processing. Then one of the easiest points was to transform the pixels into geographical coordinates.
13:24
As we have georeferenced data, that's pretty easy to do. And then we can just use a Python library to build the buildings as polygons.
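A sketch of this mask-to-polygon step, using OpenCV contours, the GDAL geotransform for the pixel-to-geographical conversion, and Shapely as one possible Python library for the polygons; the file name and the placeholder mask are hypothetical:

    import cv2
    import numpy as np
    from osgeo import gdal
    from shapely.geometry import Polygon

    # Predicted boolean mask (placeholder), encoded as 0/255 uint8
    mask = (np.random.rand(512, 512) > 0.9).astype(np.uint8) * 255

    # 1. Rebuild the polygon contours from the raster mask (OpenCV 4.x signature)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # 2. Convert pixel coordinates to geographical coordinates with the
    # affine geotransform of the georeferenced source image
    dataset = gdal.Open("raw_image.tif")
    x0, dx, _, y0, _, dy = dataset.GetGeoTransform()

    def pixel_to_geo(col, row):
        return (x0 + col * dx, y0 + row * dy)

    # 3. Build the buildings as polygons
    polygons = []
    for contour in contours:
        points = [pixel_to_geo(int(col), int(row)) for col, row in contour[:, 0, :]]
        if len(points) >= 3:
            polygons.append(Polygon(points))

    print(len(polygons), "candidate building polygons")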
13:40
In terms of results, I like this slide because it says something, but there are a lot of challenges that are not addressed there. We have an input image, which is basically larger than just a single tile.
14:00
So you can try to predict some buildings. In green, for example, you have the complete class. In yellow, you have the incomplete class. When you run just a simple segmentation model, you're doing a prediction at the pixel scale. So if you predict all your pixels, the same building can end up composed of different classes.
14:27
So that's not what you want. So either you use a smarter model or you can just use a post-processing step. And that's the choice we made. So in the post-processing step, we just considered the output of the deep learning method at the top right.
14:48
And then we build polygons, simplified polygons, by choosing a class. So we can see that in terms of pure results, that's not so smart.
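One possible way to implement such a post-processing choice is a simple per-building majority vote over the pixel-wise predictions; the sketch below uses a placeholder class map and one rasterized footprint, and is not necessarily the exact rule used in the project:

    import numpy as np

    # Placeholder per-pixel predictions: 0 = background, 1 = complete,
    # 2 = incomplete, 3 = foundation
    class_map = np.random.randint(0, 4, (512, 512))

    # Placeholder boolean footprint of one detected building (a small rectangle)
    footprint = np.zeros((512, 512), dtype=bool)
    footprint[100:150, 200:260] = True

    # Majority vote: give the whole building its most frequent non-background class
    pixels = class_map[footprint]
    pixels = pixels[pixels > 0]
    building_class = int(np.bincount(pixels).argmax()) if pixels.size else 0
    print("building class:", building_class)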
15:03
But let's say it gives a good idea of what the buildings on this image can be. There is another really challenging task: the recombination of all the tiles.
15:21
So we are doing predictions on tiles. But you have to imagine that your high resolution image is composed of hundreds of tiles. So the point is how to recombine all this information. In this example, I kind of cheat, because the tile border is not on a building, except for a few pixels.
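A minimal sketch of that recombination step, assuming non-overlapping tiles and known pixel offsets; the sizes and offsets are hypothetical:

    import numpy as np

    TILE_SIZE = 512
    FULL_HEIGHT, FULL_WIDTH = 5000, 5000  # hypothetical size of the original image

    # Full-size prediction canvas, filled tile by tile
    full_mask = np.zeros((FULL_HEIGHT, FULL_WIDTH), dtype=np.uint8)

    def paste_tile(tile_mask, x_offset, y_offset):
        # Write one predicted tile back at its original pixel offsets, so that a
        # building cut by a tile border is reassembled in the full-size mask
        h, w = tile_mask.shape
        full_mask[y_offset:y_offset + h, x_offset:x_offset + w] = tile_mask

    # Example with one placeholder tile prediction
    paste_tile(np.ones((TILE_SIZE, TILE_SIZE), dtype=np.uint8), x_offset=1024, y_offset=2048)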
15:54
But when you're doing predictions, you have to imagine that one part of a building can be predicted in one tile,
16:01
and the other part of the building will be predicted in the tile just beneath it. So you have to be very rigorous when reconstituting, when rebuilding your buildings. This presentation was merely an insight into what can be done when you are curious,
16:27
when you want to design a pipeline, and when you want to get some fast results. We learned a lot of things in this respect. We designed our proof of concept.
16:42
Our results are not so good. We were just about at the state of the art when we focused on the model, but we haven't worked on the model part of the project for six months. So I'm pretty sure that today we are no longer at the state of the art.
17:01
We would have to continue the effort to stay at the state of the art. Our objective for the future won't be to chase the best results; rather, we are interested in designing a QGIS plugin to bring this kind of work into QGIS
17:29
and to provide an alternative solution to the existing ones. I don't know if I have time. I have a little demonstration, but one minute?
17:42
Okay, I will try it. We are comfortable with Flask web applications, and we designed one just for showing results, and we worked on a few datasets.
18:02
You can query the datasets too. For example, on the Tanzania dataset, you can just be curious and get an insight into the results. Disclaimer: they are not always good.
18:22
That's an interesting case we found. You can just play with it and try different predictions. This example is interesting because we find the buildings, but we do not post-process the results,
18:42
so we still have some noisy data between the classes. So that was the application. If you are very curious, you can test it yourself, but let's keep that for another day.
19:03
You can find more on our blog. We have a bunch of GitHub repositories if you want to check what we did. This application is available at data.oslandia.io.
19:20
Thank you for your attention. So, question time. One question over there. Since your tiles have to be a certain maximum size,
19:43
do you rescale your input image so that you are generally certain that the structures you're looking for will fit within a tile, to avoid having many that cross tiles? At training time, we just randomly select X and Y coordinates
20:03
to get a lot of images. So we made the hypothesis that the uniform sampling allows us to cover all of the raw image. In the past, we tried to rescale the images,
20:21
but it added some noisy behaviour, so we just crop tiles. We do not resize them anymore. Do you have any performance metrics for how the model performed? Yeah, we measured them for training and validation,
20:43
so we used accuracy, but the accuracy was not necessarily interesting. We had very good accuracy scores, but visually the result was not so good. We tried to implement the intersection over union,
21:03
but it's ongoing work. It's not finished, so we can't give any figures for that. Cool, thanks. Hello. So I have two short questions. One, you were mentioning that you have vectors as labelled data
21:23
that you use for ground truth, and you also mentioned that you have rasters. So one of the questions would be, how did you automate the labeling if you needed to do the manual labeling and create the labeled rasters? How do you automate this if you have thousands, hundreds of thousands of tiles?
21:40
Okay, the point is, a good part of the source code deals with dataset gathering and pre-processing. So we have specific modules for each dataset, and in the end,
22:02
the labels are just tables of zeros and ones. So when you have label images, that's pretty fast. When you have a vectorized label, you just have to do the opposite of the work
22:22
that I presented for this process: you just have to transform your buildings into a boolean mask. All right, so just to explain a bit, my question related to the fact that if you're doing agronomy, for instance, you have no ground truth, so you need to manually label the data.
22:42
So you need to have someone actually paint the pixels and give you the raster back as ground truth. So this is a bit where I was aiming. And second question, once you create the tiles for a particular image, how do you handle the,
23:03
how do you keep track of which labels go to which tiles and whether or not you want to use them later on and retrain the model if needed? Okay, the fact is the images are georeferenced, so we have their coordinates, and we have the coordinates of the labels too,
23:22
and we just have to process the images at the same time as the labels. So if you choose a random X and a random Y, you crop your raw image at these coordinates, but you have to do the exact same thing
23:41
for the label version of the image. And if it's vectorized information, you are forced to transform it into a table, a NumPy array, at that moment. So you crop it with the same coordinates,
24:04
in the exact same way. I don't know what more I can add.
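To make these last two answers a bit more concrete, here is a minimal sketch that burns a vector footprint into a boolean mask and then crops the image and the label with the same random coordinates; it assumes rasterio and Shapely, with hypothetical file names, and is not the project's actual code:

    import numpy as np
    import rasterio
    from rasterio.features import rasterize
    from shapely.geometry import box

    TILE_SIZE = 512

    with rasterio.open("raw_image.tif") as src:
        image = src.read()                 # shape (bands, height, width)
        height, width = src.height, src.width
        transform = src.transform

        # Hypothetical vectorized label: one building footprint in the image CRS
        left, bottom, right, top = src.bounds
        building = box(left + 0.40 * (right - left), bottom + 0.40 * (top - bottom),
                       left + 0.45 * (right - left), bottom + 0.45 * (top - bottom))

        # Burn the vector footprints into a 0/1 mask aligned with the raster
        label = rasterize([(building, 1)], out_shape=(height, width),
                          transform=transform, fill=0, dtype="uint8")

    # Random crop: the image and its label are cut with the exact same coordinates
    y = np.random.randint(0, height - TILE_SIZE)
    x = np.random.randint(0, width - TILE_SIZE)
    image_tile = image[:, y:y + TILE_SIZE, x:x + TILE_SIZE]
    label_tile = label[y:y + TILE_SIZE, x:x + TILE_SIZE]
    print(image_tile.shape, label_tile.shape)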