
Geospatial data processing for image automatic analysis


Formal Metadata

Title
Geospatial data processing for image automatic analysis
Number of Parts
295
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Deep learning algorithms appear as a major breakthrough in GIS scope: neural networks are able to do semantic segmentation on aerial images, so as to identify building footprints, roads, and so on. Oslandia is an open-source company studying and exploiting geospatial data, with an extensive R&D activity about geospatial data science. This presentation will detail some of our Python routines in terms of geospatial data handling. We will describe our processes from raw data to prediction results. As the main step of the pipeline, machine learning techniques (e.g. convolutional neural networks for image semantic segmentation with Keras) produce valuable predictions. In the case of geospatial data, a postprocessing step is often necessary for displaying the results in web applications and GIS tools. A concrete illustration of our results will be provided through a light Flask application designed for demonstration purposes.
Transcript: English (auto-generated)
Hello everyone. The room is quite crowded. That's a good thing, I guess. Thank you for being here.
I'm Rafael De Lom. I work at Oslandia, which is a small company specialised in geospatial data processing and analysis. I will present some work that we did over the last two years about geospatial data processing.
We won't talk about how to improve a state-of-the-art algorithm. This is more a matter of designing a proof of concept in a quick, exploratory process. I will detail all these elements.
Just to begin, as I said, I am working at Oslandia as an R&D engineer. Like all Oslandia teammates, I work on geospatial data solutions. During these past two years, I worked a lot on AI-related algorithms. That's part of what will be presented today.
Oslandia is not an artificial intelligence-focused company. As I said just before, we are more a geospatial data company. As you might know, aerial and satellite images have become widely available.
The fact is that this is a very interesting use case to try to get some fast results. We mainly focus on the building footprint detection use case.
The point is that you have an aerial image and you want to discover where the houses and the buildings are on the map. You get a typical boolean mask as on the right, with buildings in white and background in black. How to mix these two concepts, deep learning and geospatial data?
At Oslandia, we use a lot of Python and all the related Python libraries. We consider two main types of algorithms, which are semantic segmentation and instance segmentation. I won't detail all of the use cases now,
but I can provide a more detailed explanation. Imagine you have an aerial image on the left, with some buildings on it. If you want to do semantic segmentation,
you will produce as many boolean masks as you have classes. In the middle, you have a typical semantic segmentation mask. You have a first boolean mask with the complete buildings. Then on the top right, you have two buildings that are labeled as incomplete.
The last mask, at the bottom, contains a foundation. We just differentiate the building types. At the opposite, if you are doing instance segmentation, you will build as many masks as you have objects.
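As a toy illustration of the difference between the two targets (a minimal NumPy sketch; the class ids and pixel layout are made up):

```python
import numpy as np

# Toy "label image": 0 = background, 1 = complete building,
# 2 = unfinished building (class ids are hypothetical).
labels = np.array([
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [2, 0, 0, 2],
])

# Semantic segmentation target: one boolean mask per class.
semantic_masks = {cls: labels == cls for cls in (1, 2)}

# Instance segmentation target: one mask per object. The two
# class-2 pixels here belong to two distinct buildings, so they
# end up in separate masks.
cols = np.arange(labels.shape[1])
instance_masks = [
    (labels == 1),               # the single class-1 building
    (labels == 2) & (cols < 2),  # left class-2 building
    (labels == 2) & (cols >= 2), # right class-2 building
]
```

Semantic masks answer "which pixels are of class c?", while instance masks answer "which pixels belong to object i?".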
You have one mask for the building on the top right, one mask for the building at the bottom right, and so on. Those are two different problems. In our work, we considered two interesting datasets.
The first one is very close to the FOSS4G scope because it was released, maybe not during, but around the period of the last FOSS4G conference in Dar es Salaam in Tanzania. The aim of this dataset was to identify all the visible buildings on a typical aerial image by differentiating three classes of building: complete buildings, unfinished buildings, and foundations. Because the images have very high resolution, there is no need to have plenty of them: there were only 13 labeled images, but with so many pixels the dataset can be considered really big. The second interesting dataset that we focused on
was the Inria Aerial Image Labeling dataset, released by the French research institute INRIA. In this dataset, we had 360 images, half of which were for training.
In such a dataset, there is no distinction between building classes; we are just trying to identify the building footprints. In the first case, the resolution was on average 7 cm per pixel.
In the second case, it was around 30 cm per pixel. The first resolution is far more precise than in the satellite image case, because the data capture was done with drones.
There is an interesting point with such datasets. We are trying to detect building footprints, but if we just think about it, we can find this information in another dataset:
OpenStreetMap. OpenStreetMap data can be really close to the INRIA labels. We tried to rebuild the labels, to find the building footprints again,
by keeping the aerial images of the INRIA dataset and building the labels from the OpenStreetMap database. By using some classical geo tools, like GDAL to get the image coordinates, Overpass to query the OpenStreetMap data, then storing the data in a database and generating raster tiles,
we can get the OpenStreetMap building footprints as images: we just chain all these steps and rebuild the labels. We did a very small proof of concept.
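The Overpass step mentioned here can be sketched as follows. This only builds the query string; the bounding-box values are made up, since in such a pipeline they would come from the georeferenced image extent (e.g. read with GDAL):

```python
# Build an Overpass QL query for building footprints inside a
# bounding box (south, west, north, east). The coordinates below
# are hypothetical.
def overpass_building_query(south, west, north, east):
    bbox = f"{south},{west},{north},{east}"
    return (
        "[out:json];"
        f'way["building"]({bbox});'
        "out geom;"
    )

query = overpass_building_query(48.85, 2.29, 48.87, 2.31)
```

The resulting string can then be POSTed to an Overpass API endpoint to retrieve the footprint geometries.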
As you might see, the OpenStreetMap labels are very close to the original datasets, in such a way that they were probably used as a basis. On the left, you have a typical INRIA image. In the middle is the labeled version of the image provided by INRIA.
On the right is the version that we produced our own way. The buildings in the images are not exactly equivalent,
because there is probably some work to do on OpenStreetMap tags to recover the exact footprints. That's it for the datasets we used. Now, let's talk about the processing steps. When we are trying to design a proof of concept and a data pipeline,
in this topic, we have to consider some classical steps, the first of which is data parsing. You might be very aware of that. We have a first set, which is the training set.
We had a bunch of georeferenced images with known coordinates. These images can be accompanied by GeoJSON labels describing the buildings, or another solution is to provide the labels as images.
That was the case for the INRIA dataset. You also need a testing set with images that must not be used for training. That's basically a typical machine learning scheme. Then we get some georeferenced data.
The first thing to do when you are dealing with a deep learning algorithm, because of the requirements in terms of hardware and memory, is to tile the big images you have.
You can't feed a machine learning or deep learning algorithm with a very high resolution image like the ones we had at the beginning. We just tiled our images with a fixed size, and you have several options to do that. The first one is GDAL, with the GDAL command-line tools.
Or you can be more Python-focused by using NumPy and work directly on the image data. Here you have your images as smaller tiles.
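The NumPy option can be sketched like this (a minimal illustration; `tile_image` is a hypothetical helper, and border tiles that don't fill a full tile are simply dropped here):

```python
import numpy as np

def tile_image(image, tile_size):
    """Split a (H, W, C) image array into non-overlapping square
    tiles, dropping incomplete tiles at the right/bottom borders."""
    h, w = image.shape[:2]
    return [
        image[y:y + tile_size, x:x + tile_size]
        for y in range(0, h - tile_size + 1, tile_size)
        for x in range(0, w - tile_size + 1, tile_size)
    ]

# A dummy 1024x1536 "aerial image" split into 512x512 tiles.
image = np.zeros((1024, 1536, 3), dtype=np.uint8)
tiles = tile_image(image, 512)  # 2 rows x 3 columns = 6 tiles
```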
You can train your model. There are plenty of interesting questions in this topic. The first one, when we arrived at this point, was really new for us: you have to choose a lot of settings in terms of hyperparameters.
There are a lot of things to tune, but which tuning is the best? That's a very challenging question in this area. Then we ran into a kind of wall, which was the hardware.
You can do this with only CPUs, but you have to be very patient, so that's probably not the best idea in this topic. You can use AWS instances and so on.
In our case, we had an old computer with a nice graphics card, and we just said: that's a GPU, let's use it. We did much better in terms of computation time with just one GPU.
For our needs, that was sufficient. But clearly, do not do that with only CPUs. I talked about training: at this point, we have tiles, and we have the items, the buildings, stored as GeoJSON files. You can save a trained model as HDF5 files.
We used Keras; HDF5 is a typical file format that you find with this library. And then the nice part: model inference. With your model, and you can do that with a bunch of Python lines,
you just load your model, set your trained weights into the model, and do your predictions. At this step, you do not have buildings; you just have a lot of arrays with a lot of figures in them. So that's pretty classical here again.
But what is actually interesting to us is not arrays of figures, it's buildings. We have to use geo tools again to transform the arrays of figures into buildings. The first very challenging part was to rebuild the polygon contours.
So we used OpenCV, which is a pretty canonical solution in image processing. Then one of the easiest points was to transform the pixel coordinates into geographical coordinates.
As we have georeferenced data, that's pretty easy to do. And then we can use a Python geometry library to build the buildings as polygons.
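The pixel-to-geographic conversion can be sketched with a GDAL-style affine geotransform (the numbers below are made up, and rotation terms are ignored for simplicity):

```python
# A GDAL geotransform is (origin_x, pixel_width, row_rotation,
# origin_y, column_rotation, pixel_height), with a negative pixel
# height for north-up images.
def pixel_to_geo(geotransform, col, row):
    ox, pw, _, oy, _, ph = geotransform
    return ox + col * pw, oy + row * ph

gt = (500000.0, 0.3, 0.0, 4600000.0, 0.0, -0.3)  # 30 cm pixels
x, y = pixel_to_geo(gt, 100, 200)  # pixel (col=100, row=200)
```

Applying this to every contour vertex turns an OpenCV contour into georeferenced polygon coordinates.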
In terms of results, I like this slide because it says something, but there are a lot of challenges that are not answered there. We have an input image, which is basically larger than just a single tile.
So you can try to predict some buildings. In green, for example, you have the complete class; in yellow, the incomplete class. When you run a simple semantic segmentation model, you're doing a prediction at the pixel scale. So if you predict all your pixels, the same building can be composed of different classes.
So that's not what you want. So either you use a smarter model or you can just use a post-processing step. And that's the choice we made. So in the post-processing step, we just considered the output of the deep learning method at the top right.
And then we build polygons, simplified polygons, by choosing a class. So we can see that in terms of pure results, that's not so smart.
But let's say it gives a good idea of what the buildings on this image can be. There is another really challenging task: the recombination of all the tiles.
So we are doing predictions on tiles, but you have to imagine that your high resolution image is composed of hundreds of tiles. The point is how to recombine all this information. In this example, I kind of cheat, because the tile border does not cut any building, just a few pixels.
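Recombining per-tile predictions into a full raster can be sketched as follows (a minimal NumPy illustration; extracting contours on the stitched raster, rather than per tile, helps buildings split across tile borders come out whole):

```python
import numpy as np

def stitch_tiles(tiles, n_rows, n_cols):
    """Reassemble equally sized per-tile prediction arrays into
    the full prediction raster, row by row."""
    rows = [np.hstack(tiles[r * n_cols:(r + 1) * n_cols])
            for r in range(n_rows)]
    return np.vstack(rows)

# Four dummy 2x2 prediction tiles arranged on a 2x2 grid.
tiles = [np.full((2, 2), v) for v in (0, 1, 2, 3)]
full = stitch_tiles(tiles, n_rows=2, n_cols=2)  # a 4x4 raster
```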
But when you're doing predictions, you have to imagine that one part of a building can be predicted in one tile,
and the other part of the building will be predicted in the tile just beneath it. So you have to be very rigorous to reconstitute, to rebuild your buildings. This presentation was merely an insight into what can be done when you are curious,
when you want to design a pipeline, and when you want to get some fast results. We learned a lot of things in this respect. We designed our proof of concept.
Our results are not that good. We were close to the state of the art when we focused on the model, but we haven't worked on the model part of the project for six months. So I'm pretty sure that today we are not at the state of the art.
We would have to continue the effort to stay at the state of the art. Our objective for the future won't be to chase the best results; instead, we are interested in designing a QGIS plugin to bring this kind of work into QGIS
and to provide an alternative solution to the existing ones. I don't know if I have time. I have a little demonstration, but one minute?
Okay, I will try it. We are comfortable with Flask web applications, and we designed one just for showing results; we worked on a few datasets.
You can query the datasets. For example, on the Tanzania dataset, you can be curious and get an insight into the results. Disclaimer: they are not always good.
Here is an interesting case. You can play with it and try predictions on different tiles. This example is interesting because we find the building, but we do not post-process the results,
so we still have some noisy data between the classes. So that was this application. If you are very curious, you can test it yourself, but let's keep it for another day.
You can find more on our blog, and we have a bunch of GitHub repositories if you want to check what we did. The demo application is available at data.oslandia.io.
Thank you for your attention. So, question time. One question over there. Since your tiles have to be a certain maximum size,
do you rescale your input image so that you are reasonably certain that the structures you're looking for will fit within a tile, to avoid having many that cross tile borders? At training time, we just randomly select X and Y coordinates
to get a lot of images. So we made the hypothesis that uniform sampling allows us to cover the whole raw image. In the past, we tried to rescale the images,
but it added some noisy behavior, so we just crop tiles. We do not resize them anymore. Do you have any performance metrics for how the model performed? Yeah, we measured it for training and validation,
so we used accuracy, but accuracy was not necessarily interesting: we had a very good accuracy score, but visually the result was not so good. We tried to implement intersection over union,
but it's ongoing work. It's not finished, so we can't give any figures for that. Cool, thanks. Hello. So I have two short questions. One, you were mentioning that you have vectors as labeled data
that you use for ground truth, and you also mentioned that you have rasters. So one of the questions would be, how did you automate the labeling if you needed to do the manual labeling and create the labeled rasters? How do you automate this if you have thousands, hundreds of thousands of tiles?
Okay, the point is, a good part of the source code deals with dataset pre-processing: dataset gathering and pre-processing. We have specific modules for each dataset, and in the end,
the labels are just arrays of zeros and ones. So when you have label images, that's pretty fast. When you have vectorized labels, you have to do the opposite of the work
that I presented for this process: you just have to transform your buildings into a boolean mask. All right, so just to explain a bit, my question related to the fact that if you're doing agronomy, for instance, you have no ground truth, so you need to manually label the data.
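The vector-to-mask transformation mentioned in the answer can be sketched as follows. Real footprints are arbitrary polygons (GDAL or similar tools would rasterize those); in this simplified, hypothetical version the buildings are axis-aligned rectangles in pixel coordinates:

```python
import numpy as np

def rasterize_rectangles(rects, shape):
    """Burn rectangles (row_min, row_max, col_min, col_max) into a
    boolean label mask, one True region per building footprint."""
    mask = np.zeros(shape, dtype=bool)
    for r0, r1, c0, c1 in rects:
        mask[r0:r1, c0:c1] = True
    return mask

# Two dummy building footprints on an 8x8 tile.
mask = rasterize_rectangles([(1, 3, 1, 4), (5, 7, 0, 2)], (8, 8))
```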
So you need to have someone actually paint the pixels and give you the raster back as ground truth. So this is a bit where I was aiming. And second question, once you create the tiles for a particular image, how do you handle the,
how do you keep track of which labels go to which tiles and whether or not you want to use them later on and retrain the model if needed? Okay, the fact is the image are georeferenced, so we have their coordinates, but we have the coordinates of the label too,
and we just as to process the image in the same time than the labels. So if you choose a random X and a random Y, you crop your image, your raw image, considering these coordinates, but you have to do the same exact thing
for the label version of the image. And if it's vectorized, if it's vectorized information, you are forced to transform it as a table, as an MPI table at the moment. So you will crop it with the same coordinates,
the same exact way. I can't know what I can add.
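The paired cropping described in this last answer can be sketched as follows (a minimal NumPy illustration; `random_paired_crop` is a hypothetical helper):

```python
import numpy as np

def random_paired_crop(image, label, size, rng):
    """Crop the raw image and its label raster at the same random
    offset, keeping the pair pixel-aligned."""
    h, w = image.shape[:2]
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    return (image[y:y + size, x:x + size],
            label[y:y + size, x:x + size])

rng = np.random.default_rng(0)
image = np.arange(100).reshape(10, 10)
label = image % 2  # a dummy label raster derived from the image
img_crop, lab_crop = random_paired_crop(image, label, 4, rng)
```

Because both arrays are sliced with the same offsets, any per-pixel relationship between image and label is preserved in the crops.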