
Building Footprint Extraction in Vector Format Using pytorch_segmentation_models_trainer, QGIS Plugin DeepLearningTools and The Brazilian Army Geographic Service Building Dataset


Formal Metadata

Title
Building Footprint Extraction in Vector Format Using pytorch_segmentation_models_trainer, QGIS Plugin DeepLearningTools and The Brazilian Army Geographic Service Building Dataset
Title of Series
Number of Parts
351
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year
2022

Content Metadata

Subject Area
Genre
Abstract
Building footprint extraction is a popular and booming research field. Annually, several research papers are published showing deep learning semantic segmentation-based methods to perform this kind of automated feature extraction. Unfortunately, many of those papers do not have open-source implementations for public usage, making it difficult for other researchers to access those implementations. With that in mind, we present DeepLearningTools and pytorch_segmentation_models_trainer. Both are openly available implementations of deep learning-based semantic segmentation. This way, we seek to strengthen the scientific community by sharing our implementations. DeepLearningTools is a QGIS plugin that enables building and visualizing masks from vector data. Moreover, it allows the usage of inference web services published by pytorch_segmentation_models_trainer, creating a more feasible way for QGIS users to train deep learning models. pytorch_segmentation_models_trainer (pytorch-smt) is a Python framework built with PyTorch, PyTorch-Lightning, Hydra, segmentation_models.pytorch, rasterio, and shapely. This implementation enables using YAML files to perform segmentation mask building, model training, and inference. In particular, it ships pre-trained models for building footprint extraction and post-processing implementations to obtain clean geometries. In addition, one can deploy an inference service built using FastAPI and use it in either web-based applications or a QGIS plugin like DeepLearningTools. ResNet-101 U-Net Frame Field, ResNet-101 DeepLabV3+ Frame Field, HRNet W48 OCR Frame Field, Modified PolyMapper (ModPolyMapper), and PolygonRNN are some of the models available in pytorch-smt. These models were trained using the Brazilian Army Geographic Service Building Dataset (BAGS Dataset), a newly available dataset built using aerial imagery from the Brazilian states of Rio Grande do Sul and Santa Catarina.
Pytorch-smt also enables training object detection and instance segmentation tasks using concise training configurations. Considering the above, this talk presents a usage overview of both technologies and some demonstrations. Using metrics like precision, recall, and F1, we assess the results achieved by the implementations developed as a product of our research, showing that they have the potential to produce vector data more efficiently than manual acquisition methods. DeepLearningTools is available in the QGIS plugin repository, while pytorch_segmentation_models_trainer is available on the Python Package Index (installable via pip). The Brazilian Army Geographic Service develops both solutions and makes their code available on GitHub (dsgoficial).
Keywords
Transcript: English (auto-generated)
Hello everyone, my name is Philipe Borba, and I'm here to present Building Footprint Extraction in Vector Format Using pytorch_segmentation_models_trainer, the QGIS plugin DeepLearningTools, and the Brazilian Army Geographic Service Building Dataset. I'm an officer of the Brazilian Army Geographic Service, which is
one of the government institutions responsible for mapping the Brazilian territory. There are five subordinate units, the Geoinformation Centers, each in charge of mapping one region of Brazil. Annually, several officers are assigned to carry out postgraduate research at universities to try to solve production problems.
I have carried out my master's research on automatic building footprint extraction from remote sensing imagery. My research goal was to verify whether it is possible to train a deep convolutional neural network capable of extracting building footprint vector polygons from very high-resolution imagery, compatible with the 1:25,000 scale.
In this research, I also tried to develop, as much as possible, open-source solutions to the research problem. I used the AICrowd dataset, which is composed mainly of urban areas.
It was used by Girard et al. and Zhao et al., both of which were base articles for my research. In the research, we also built the Brazilian Army Geographic Service Building Dataset, which is composed of airborne photogrammetric imagery.
The imagery has spatial resolutions of 35 cm and 39 cm. The images are from the states of Rio Grande do Sul and Santa Catarina. All images have a radiometric resolution of 8 bits and RGB spectral resolution.
We manually extracted more than 1.5 million building footprints from both urban and rural scenes. The dataset has more than 247,000 images, each 512 by 512 pixels.
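From the stated tile size and ground sample distances, the ground footprint of each tile follows directly; a quick sketch of the arithmetic (values taken from the talk, function name is illustrative):

```python
# Ground footprint of one dataset tile: 512 x 512 px tiles at the stated
# 35 cm and 39 cm ground sample distances (GSD).
def tile_extent_m(tile_px: int, gsd_m: float) -> float:
    """Side length on the ground, in metres, of a square tile."""
    return tile_px * gsd_m

for gsd in (0.35, 0.39):
    side = tile_extent_m(512, gsd)
    print(f"GSD {gsd:.2f} m -> tile covers {side:.1f} m x {side:.1f} m")
```

So each tile covers roughly 180 to 200 metres on a side, which is why a single tile can hold anything from one rural building to a dense urban block.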
There is no overlap between the training and test splits, and the dataset is available online. Here are some examples from the dataset. It has lots of rural scenes and some urban scenes, which can be seen in the top-right corner.
Here are some examples of the vectors in QGIS, as well as the already-split images and the generated masks.
In this research, we used a machine learning server with three NVIDIA Tesla V100 GPUs, each with 32 GB of memory.
The polygons were extracted using QGIS and stored in PostGIS (PostgreSQL). We developed a Python package called PyTorch Segmentation Models Trainer, which was used to build the masks and to train the models.
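The mask-building step turns each vector polygon into a binary raster. The real implementation relies on rasterio and shapely; the following pure-Python even-odd rasterizer is only a sketch of the idea, with invented function names:

```python
# Minimal sketch of burning a vector polygon into a binary segmentation
# mask, mimicking what pytorch_segmentation_models_trainer does with
# rasterio/shapely (the real code uses rasterio.features.rasterize).
def point_in_polygon(x, y, ring):
    """Even-odd rule; ring is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize_mask(ring, width, height):
    """Burn one polygon into a width x height binary mask.

    Each pixel is sampled at its centre, following the usual raster
    convention."""
    return [
        [1 if point_in_polygon(c + 0.5, r + 0.5, ring) else 0
         for c in range(width)]
        for r in range(height)
    ]

# A 4x4 building footprint inside an 8x8 tile:
mask = rasterize_mask([(2, 2), (6, 2), (6, 6), (2, 6)], 8, 8)
```

Run over every building polygon intersecting a tile, this produces exactly the kind of ground-truth mask the models are trained against.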
We also developed a plugin called Deep Learning Tools to handle the built masks; this plugin can also consume API services that receive images and output the inferred polygons as GeoJSON. PyTorch Segmentation Models Trainer is a framework built on top of PyTorch and PyTorch Lightning.
It serves models using FastAPI. The models can be implemented using Segmentation Models PyTorch, an open-source library that provides several pre-trained weights and pre-implemented models and architectures. The PyTorch Segmentation Models Trainer framework uses Hydra configuration files.
These configuration files are in YAML format. The framework uses OmegaConf to perform object instantiation: for instance, there is a model tag whose "_target_" value points to the Segmentation Models PyTorch U-Net object.
Given all the parameters, PyTorch Segmentation Models Trainer automatically instantiates this model and uses it with the training routines already implemented in the framework.
This gives us flexible code: you can use it to train other models, or even implement your own models, without making them a dependency of PyTorch Segmentation Models Trainer.
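The "_target_" pattern described above can be mimicked with the standard library alone. The real framework delegates this to hydra.utils.instantiate; the sketch below, with fractions.Fraction standing in for a model class, only illustrates the mechanism:

```python
# Sketch of the "_target_" instantiation pattern pytorch-smt borrows from
# Hydra/OmegaConf: a YAML node names a class and the framework imports
# and calls it with the remaining keys. The real framework uses
# hydra.utils.instantiate; this stdlib mimic just shows the idea.
import importlib

def instantiate(cfg: dict):
    """Instantiate the object a config dict points at.

    Example of what a YAML model node parses into:
      {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
    """
    cfg = dict(cfg)  # don't mutate the caller's config
    module_path, _, class_name = cfg.pop("_target_").rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**cfg)

# In pytorch-smt the node would point at e.g.
# segmentation_models_pytorch.Unet with encoder_name: resnet101.
# Here we use a stdlib class so the sketch runs anywhere:
frac = instantiate({"_target_": "fractions.Fraction",
                    "numerator": 3, "denominator": 4})
```

Because the config only names the class, swapping architectures is a one-line YAML change, and user-defined classes work without the framework importing them directly.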
We use the base frameworks of PyTorch and PyTorch Lightning. PyTorch Lightning is a very flexible and high-level API built on top of PyTorch. It enables easy implementation of training loops, and it has out-of-the-box multi-GPU implementation.
Segmentation Models PyTorch has nine model architectures, such as U-Net, U-Net++, FPN, PSPNet, DeepLabV3+, and so on. It has 119 backbones, such as those from the ResNet, EfficientNet, and MobileNet families.
We also implemented some custom models that can be combined with backbones from Segmentation Models PyTorch.
These models are the FrameField model, the modified PolyMapper (ModPolyMapper), PolygonRNN, and HRNet OCR. In particular, FrameField learning was one of the models used in this research.
It is composed of a U-Net with a ResNet-101 backbone, coupled with a frame field model, which learns a complex vector field.
It has nine losses, each looking at a particular aspect of the building, and the frame field object is used in the post-processing methods, yielding better building edges. We also implemented ModPolyMapper, a combination of an RNN that infers the polygons' vertices.
It is combined with an object detection network.
We serve our models using FastAPI: the service receives an image and outputs to the client the GeoJSON of each polygon extracted from the image.
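The shape of the GeoJSON payload such a service returns can be sketched as follows. The helper name and the coordinates are invented for illustration; in the real service they come from the post-processed model output:

```python
# Sketch of the GeoJSON FeatureCollection an inference service sends
# back: one Polygon feature per extracted building footprint.
import json

def polygons_to_geojson(polygons):
    """polygons: list of exterior rings, each a list of (x, y) tuples."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {},
                "geometry": {
                    "type": "Polygon",
                    # GeoJSON rings must be closed: repeat the first vertex.
                    "coordinates": [[list(p) for p in ring + ring[:1]]],
                },
            }
            for ring in polygons
        ],
    }

payload = json.dumps(polygons_to_geojson([[(0, 0), (10, 0), (10, 8), (0, 8)]]))
```

Because the response is plain GeoJSON, any client that understands the format, whether a web map or a QGIS plugin, can load the extracted polygons directly.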
Deep Learning Tools has a module that posts an image to the service, receives the GeoJSON response, and loads it as a temporary memory layer in QGIS.
We carried out eight experiments using both the AICrowd and Brazilian Army Geographic Service building datasets, with the following hyperparameters and evaluation metrics. As our optimizer, we used AdamW (weighted Adam) with a weight decay of 0.001, a form of L2 regularization.
We also used data augmentations to avoid overfitting, such as random crops, random flips, and histogram jitter.
We also used a one-cycle learning rate scheduler and gradient clipping to avoid vanishing and exploding gradients, especially in the RNN-based methods, and stochastic weight averaging in the last 80% of epochs to achieve better convergence.
We also employed mixed-precision training, which enabled larger batches on our GPUs. We used He initialization in convolution-based branches and Kaiming initialization in RNN-based branches.
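The shape of the one-cycle learning rate schedule mentioned above can be sketched in pure Python. The constants are illustrative defaults, not the values used in the research, and real training would use PyTorch's OneCycleLR rather than this hand-rolled version:

```python
# Cosine-annealed one-cycle schedule (the shape used by schedulers like
# torch.optim.lr_scheduler.OneCycleLR): warm up from max_lr/div to
# max_lr over the first pct_start of training, then anneal back down.
import math

def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3, div=25.0):
    warmup = pct_start * total_steps
    lo = max_lr / div
    if step < warmup:  # warm-up phase
        t = step / warmup
        return lo + (max_lr - lo) * (1 - math.cos(math.pi * t)) / 2
    # annealing phase
    t = (step - warmup) / (total_steps - warmup)
    return lo + (max_lr - lo) * (1 + math.cos(math.pi * t)) / 2

lrs = [one_cycle_lr(s, 100, 1e-3) for s in range(101)]
```

The low starting rate stabilizes the early steps, the peak lets training explore, and the long anneal settles the weights, which pairs naturally with stochastic weight averaging near the end.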
As the evaluation metrics, we chose Polygons and Line Segments (PoLiS), a polygon similarity metric. When two objects have a PoLiS of 0, they are the same object, and the value grows the further the objects are from each other.
For instance, two very different objects have a very large PoLiS. We also used the mean max tangent angle error (MMTAE), intersection over union, and omission errors.
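The PoLiS metric can be sketched in pure Python from its definition: the mean distance from each polygon's vertices to the other polygon's boundary, averaged symmetrically over both directions. This is a simplified illustration with invented helper names, not the evaluation code used in the research:

```python
# Sketch of the PoLiS polygon similarity metric: symmetric mean
# vertex-to-boundary distance. Identical polygons score 0; the score
# grows as the shapes diverge.
import math

def _pt_seg_dist(p, a, b):
    """Distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def _avg_vertex_to_boundary(verts, ring):
    edges = list(zip(ring, ring[1:] + ring[:1]))
    return sum(min(_pt_seg_dist(v, a, b) for a, b in edges)
               for v in verts) / len(verts)

def polis(ring_a, ring_b):
    """Symmetric PoLiS between two polygon exterior rings."""
    return 0.5 * (_avg_vertex_to_boundary(ring_a, ring_b)
                  + _avg_vertex_to_boundary(ring_b, ring_a))

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
shifted = [(1, 0), (5, 0), (5, 4), (1, 4)]
```

Here `polis(square, square)` is 0, while shifting the square by one unit yields a small positive score, showing how the metric penalizes geometric displacement rather than just pixel overlap.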
Here are some results on the AICrowd dataset. We can see that the proposed method of this research, the HRNet OCR W48 frame field, has better visual results than the PolyMapper on this dataset.
Here are some other visual results; we can see that the PolyMapper did not handle large objects well.
On the Brazilian Army Geographic Service building dataset, we had a different result: our proposed technique was outperformed by ModPolyMapper, which produced sharper and better building edges.
That can be seen in this image, but ModPolyMapper had lots of omissions: it did not detect several adjoining buildings, as we can see in the bottom-right image.
In conclusion, on the Brazilian Army Geographic Service building dataset, ModPolyMapper was the best method, while on the AICrowd dataset, the HRNet OCR W48 frame field was the best method.
After some inquiry in the dissertation, we found that the HRNet-based method worked better in densely built-up areas, while ModPolyMapper worked better in sparse areas.
The open-source implementation of this research is available in the PyTorch Segmentation Models Trainer, and the dataset is also available online. And this research is available at the link.
All code is based on free and open-source technologies. As future work, we will test the chosen methods on different areas of Brazil and conduct a pilot project using the results of this research in a production environment, to assess whether it can be used in real-world cartographic production.
We also want to research mixing different datasets in the training steps and to try some domain shift techniques, as well as to test variations of the frame field and ModPolyMapper. In particular, we want to replace the backbones with vision transformers
and check whether this produces better results. We also want to test a new technique called deep snakes, and all of these will hopefully be integrated into PyTorch Segmentation Models Trainer.
Thank you for your time. Are there any questions?