Building Footprint Extraction in Vector Format Using pytorch_segmentation_models_trainer, QGIS Plugin DeepLearningTools and The Brazilian Army Geographic Service Building Dataset
Formal Metadata
Title of Series: FOSS4G Firenze 2022 (257 / 351)
Number of Parts: 351
License: CC Attribution 3.0 Unported — You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/68985 (DOI)
Production Year: 2022
Transcript: English (auto-generated)
00:00
Hello everyone, my name is Fikir Bohaba, and I'm here to present Building Footprint Extraction in Vector Format using pytorch_segmentation_models_trainer, the QGIS plugin DeepLearningTools, and the Brazilian Army Geographic Service Building Dataset. I'm an officer from the Brazilian Army Geographic Service, which is
00:21
one of the government institutions responsible for mapping the Brazilian territory. There are five subordinate units, the Geoinformation Centers, each in charge of mapping one region of Brazil. Annually, several officers are assigned to carry out postgraduate research at universities to try to solve production problems.
00:42
I carried out my master's research on automatic building footprint extraction from remote sensing imagery. My research goal was to verify whether it is possible to train a deep convolutional neural network capable of extracting building footprint vector polygons, compatible with the 1:25,000 scale, from very high resolution imagery.
01:05
In this research, I also tried to use open-source solutions as much as possible to solve the research problem. I used the AICrowd dataset, which is composed mainly of urban areas.
01:23
It was used by Girard et al. and Zhao et al., both of which were the base articles for my research. In this research, we also built the Brazilian Army Geographic Service Building Dataset, which is composed of airborne photogrammetric imagery.
01:44
The spatial resolutions are 35 cm and 39 cm, and the images come from three Brazilian states. All images have a radiometric resolution of 8 bits and RGB spectral bands.
02:03
We manually extracted more than 1.5 million building footprints from both urban and rural scenes. The dataset has more than 247,000 images, each 512 by 512 pixels.
02:23
There is no overlap between the training and test splits, and the dataset is available online. Here are some examples from the dataset: it has lots of rural scenes and some urban scenes, which can be seen in the top right corner.
02:53
Here are some examples of the vectors in QGIS, as well as the already-split images and the generated masks.
03:17
In this research, we used a machine learning server with three NVIDIA Tesla V100 GPUs, each with 32 GB of memory.
03:31
The polygons were extracted using QGIS and stored in a PostgreSQL/PostGIS database. We developed a Python package called pytorch_segmentation_models_trainer, which was used to build the masks and to train the models.
03:54
We also developed a QGIS plugin called DeepLearningTools to handle the built masks, and this
04:01
plugin can also consume API services that receive images and output the inferred polygons as GeoJSON. pytorch_segmentation_models_trainer is a framework built on top of PyTorch and PyTorch Lightning.
04:28
It serves models using FastAPI. The models can be implemented using the Segmentation Models PyTorch, which is an open
04:41
source library that provides several pre-trained weights and pre-implemented model architectures. This framework, pytorch_segmentation_models_trainer, uses Hydra configuration files.
05:01
These configuration files are in YAML format. The framework uses OmegaConf to perform object instantiation: for instance, there is a model tag whose _target_ value points to the Segmentation Models PyTorch Unet class.
05:22
All the parameters are given in the file, so pytorch_segmentation_models_trainer automatically instantiates this model and uses it in the training routines that are already implemented in the framework.
05:41
This gives us flexible code: you can use it to train other models, and you can even implement your own models without making them a dependency of pytorch_segmentation_models_trainer.
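To make the instantiation pattern concrete, here is a minimal sketch of how a YAML node with a _target_ key is turned into a Segmentation Models PyTorch Unet via OmegaConf and Hydra's instantiate utility; the configuration keys and values below are illustrative placeholders, not the exact schema used by pytorch_segmentation_models_trainer.

```python
# Minimal sketch of Hydra/OmegaConf-style object instantiation.
# The config below is an illustrative placeholder, not the exact schema
# used by pytorch_segmentation_models_trainer.
from hydra.utils import instantiate
from omegaconf import OmegaConf

config = OmegaConf.create(
    """
    model:
      _target_: segmentation_models_pytorch.Unet
      encoder_name: resnet101
      encoder_weights: imagenet
      in_channels: 3
      classes: 1
    """
)

# instantiate() builds the object named by _target_ and passes the
# remaining keys as keyword arguments.
model = instantiate(config.model)
print(model.__class__.__name__)  # Unet
```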
06:09
We use the base frameworks of PyTorch and PyTorch Lightning. PyTorch Lightning is a very flexible and high-level API built on top of PyTorch. It enables easy implementation of training loops, and it has out-of-the-box multi-GPU implementation.
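As a rough illustration of what this gives us (the module and datamodule names are placeholders, and Trainer argument names vary slightly between Lightning versions), a multi-GPU, mixed-precision run can be configured in a few lines:

```python
# Sketch of a PyTorch Lightning Trainer configured for multi-GPU,
# mixed-precision training; my_module and my_datamodule are placeholders.
import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",
    devices=3,              # e.g. the three V100 GPUs mentioned in the talk
    precision=16,           # mixed-precision training
    max_epochs=100,         # illustrative value
    gradient_clip_val=1.0,  # gradient clipping, discussed later in the talk
)
# trainer.fit(my_module, datamodule=my_datamodule)
```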
06:28
Segmentation Models PyTorch has nine model architectures, such as U-Net, U-Net++, FPN, PSPNet, DeepLabV3+, and so on. It has 119 backbones, such as those from the ResNet family, the EfficientNet family, and MobileNet.
06:50
We also implemented some custom models that can be combined with backbones from Segmentation Models PyTorch.
07:01
These models were the frame field model, the mod PolyMapper, Polygon-RNN, and HRNet-OCR. In particular, frame field learning was one of the models used in this research.
07:24
It is composed of a U-Net with a ResNet-101 backbone, coupled with a frame field module, which learns a complex vector field.
07:41
It has nine losses, each of which looks at a particular aspect of the building, and the frame field is used in the post-processing methods to produce better building edges. We also implemented the mod PolyMapper, which is based on an RNN that infers the polygons' vertices.
08:15
It is combined with an object detection network.
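For reference, in frame field learning as described in the cited literature (Girard et al.), the frame field at each pixel is encoded by two complex coefficients c_0 and c_2 of a degree-four polynomial whose roots give the two unsigned edge directions; this representation is what the losses constrain (stated here from that literature, not from the talk itself):

f(z) = (z^2 - u^2)(z^2 - v^2) = z^4 + c_2 z^2 + c_0, \quad c_2 = -(u^2 + v^2), \quad c_0 = u^2 v^2,

where {±u, ±v} ⊂ ℂ are the field directions aligned with building edges and corners.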
08:21
We serve our models using FastAPI, so the service receives an image, and it outputs to the client the GeoJSON of each extracted polygon in the image.
08:42
DeepLearningTools has a module that posts an image to the service, receives the GeoJSON response, and loads it as a temporary memory layer in QGIS.
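As a rough sketch of this request/response flow (the endpoint URL, payload format, and response schema here are hypothetical, not the plugin's actual API), a client could look like this:

```python
# Hypothetical client for an image-in / GeoJSON-out inference service.
# The URL, endpoint path, and response schema are illustrative only.
import requests

with open("tile_512x512.tif", "rb") as f:
    response = requests.post(
        "http://localhost:8000/polygonize",  # placeholder FastAPI endpoint
        files={"file": ("tile_512x512.tif", f, "image/tiff")},
        timeout=60,
    )
response.raise_for_status()

geojson = response.json()  # expected: a FeatureCollection of building polygons
print(len(geojson.get("features", [])), "polygons returned")
# The QGIS plugin would then load this GeoJSON as a temporary memory layer.
```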
09:03
We carried out eight experiments using both the AICrowd and the Brazilian Army Geographic Service Building datasets. In our experiments, we used the following hyperparameters and evaluation metrics. As our optimizer, we used weighted Adam (AdamW) with a weight decay of 0.001, which acts as a regularization.
09:31
We also used data augmentations to avoid overfitting, such as random crops, random flips, and histogram jitter.
09:42
We also used a one-cycle learning rate scheduler and gradient clipping to avoid vanishing and exploding gradients, especially in the RNN-based methods. We also used stochastic weight averaging in the last 80% of the epochs to get better convergence.
10:04
We also employed mixed-precision training, which enabled us to fit larger batches on our GPUs. We used He initialization in the convolution-based branches and Kaiming initialization in the RNN-based branches.
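A minimal sketch of how these pieces fit together in plain PyTorch; the weight decay of 0.001 comes from the talk, while the other values (learning rate, epochs, clipping norm) are illustrative placeholders:

```python
# Sketch of the training setup described above: AdamW, one-cycle learning
# rate schedule, gradient clipping, and stochastic weight averaging (SWA).
import torch
from torch.optim.swa_utils import AveragedModel, SWALR

model = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)  # stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-3)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, epochs=100, steps_per_epoch=500
)
swa_model = AveragedModel(model)
swa_scheduler = SWALR(optimizer, swa_lr=1e-4)

# Inside the training loop, after loss.backward():
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
#   optimizer.step(); scheduler.step()
# and, once the SWA phase starts:
#   swa_model.update_parameters(model); swa_scheduler.step()
```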
10:22
As evaluation metrics, we chose PoLiS (polygons and line segments), which is a polygon similarity metric. When two objects have a PoLiS value of 0, they are the same object, and the value grows the further the objects are from each other.
10:44
For instance, two very different objects have a very large PoLiS value. We also used the mean max tangent angle error (MMTAE), intersection over union (IoU), and omission errors.
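For reference, the PoLiS metric between polygons A (with vertices a_j, j = 1, ..., q) and B (with vertices b_k, k = 1, ..., r) is usually defined as the symmetric average distance from the vertices of each polygon to the boundary of the other (definition taken from the metric's literature, e.g. Avbelj et al., not from the talk):

PoLiS(A, B) = \frac{1}{2q} \sum_{a_j \in A} \min_{b \in \partial B} \lVert a_j - b \rVert + \frac{1}{2r} \sum_{b_k \in B} \min_{a \in \partial A} \lVert b_k - a \rVert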
11:03
Here are some results on the AICrowd dataset. We can see that the method proposed in this research, the HRNet-OCR W48 frame field, has better visual results than the mod PolyMapper on this dataset.
11:28
Here are some more visual results, and we can see that the mod PolyMapper did not handle large objects well.
11:45
On the Brazilian Army Geographic Service Building Dataset, we had a different result: our proposed technique was outperformed by the mod PolyMapper, which produced sharper and better building edges.
12:03
That can be seen in this image, but the mod PolyMapper had lots of omissions: it did not detect several adjoining buildings, as we can see in the bottom right image.
12:26
In conclusion, on the Brazilian Army Geographic Service Building Dataset the mod PolyMapper was the best method, and on the AICrowd dataset the HRNet-OCR W48 frame field was the best method.
12:49
After further inquiry in the dissertation, we found that the HRNet-based method worked better in densely built-up areas, while the mod PolyMapper worked better in sparse areas.
13:03
The open-source implementation of this research is available in pytorch_segmentation_models_trainer, the dataset is also available online, and the research itself is available at the link shown.
13:23
All the code is based on free and open-source technologies. As future work, we will test the chosen methods on different areas of Brazil and conduct a pilot project using the results of this research in a production environment, to assess whether it can be used in real-world cartographic production.
13:45
We also want to research mixing different datasets in the training steps and try some domain shift techniques. We also want to test variations of the frame field and the mod PolyMapper. In particular, we want to replace the backbones with vision transformers,
14:05
and check whether this produces better results. We also want to test a newer technique called deep snakes, and all of these will hopefully be integrated into pytorch_segmentation_models_trainer.
14:25
Thank you for your time. Are there any questions?