
An automated classification and change detection system for rapid update of land-cover maps of South Africa using Landsat data.

Speech Transcript
Right, so, good afternoon — sorry about that, we just had to get some technicalities sorted out. Good afternoon again. I'll be talking about automated methods in the production of land cover maps. I work as a software developer for the Council for Scientific and Industrial Research in South Africa, where we focus on remote sensing research, so I'm going to be talking about the implementation of our automated land cover classification system, which we've been developing in the form of a research project for the last year. I'll talk about some of the tools we've been employing and also the results we've been able to achieve to date.
Land cover information is increasingly required by a broad spectrum of scientific, economic, and governmental applications. It is really an essential input to assessing ecosystem status and health, mapping patterns of diversity, and developing land management policy.
Information from the United States National Land Cover Database, for example, is used in a number of applications in both the public and private sectors, ranging from assisting in the placement of cell phone towers right through to tracking how diseases spread. A number of land cover maps have been developed for the United States by the USGS, dating all the way back to 1992 and then 2001, with five-yearly increments from 2001 onwards, and satellite image data from the Landsat program is really the key input to the production of land cover maps at 30 metre resolution. Land cover can be defined as the observed biophysical cover on the earth's surface, and it typically describes things such as grass, trees, and so on. There are really two primary methods for capturing information on land cover, those being field survey and the analysis of remotely sensed imagery. Each pixel of a typical land cover map is assigned a land cover ID that defines what is found at that particular georeferenced location, for example cultivation, forest, bare ground, and so on. Right, so in South Africa there
are roughly more than 20 national as well as international conventions that require reporting on land cover change, and typical applications in the country that make use of land cover maps include assessing environmental impacts of new energy infrastructure, agricultural planning and monitoring, our locally developed Advanced Fire Information System, and lastly carbon emissions planning. The
latest national land cover map for South Africa dates all the way back to 2000, so it's quite outdated. Regional updates do exist, but creating them requires a significant amount of manual verification labour. So we've been working on a publicly funded research project to improve the situation, with the primary objective of improving land cover mapping efficiency by developing a highly automated and scalable system that applies, among others, supervised machine learning technologies for rapid land cover updates using the widely available Landsat data.
Four typical stages can be defined in the production of land cover maps, namely source data preparation, feature extraction, classification, and accuracy assessment. Starting with source data preparation: this is all about gathering and preparing the data from which one will be able to derive some kind of land cover classification. Secondly, feature extraction draws out the exact measurements or values needed to conduct a suitable land cover classification on a per-pixel basis. Classification then defines the manual, automated, or combined method you are going to employ to assign land cover classes to what you observe in the data. This talk really focuses on automated methods, and here we had to decide between using supervised versus unsupervised classification methods. For the supervised methods, a classifier model is trained on the extracted feature data, and subsequently the source data is fed pixel by pixel through the classifier model to produce a land cover classification verdict. Finally, accuracy assessment is conducted to determine how precise your classified land cover prediction is; typical methods here include cross validation, manual visual inspection, comparing outputs to high resolution aerial photography, and lastly actual field work, which would mean going out into the field to verify the results. Right, so I'll
be discussing each of these steps in greater detail, starting with source data preparation. We gather a stack of data that we're going to use to eventually predict land cover types. Our primary data comes from remotely sensed satellite imagery, specifically from the Landsat 7 satellite. The satellite images are selected for a particular period of interest, also ensuring that they are representative of the year's seasonal and phenological change. Secondly, ancillary data such as digital elevation, mean precipitation, and temperature are also collected, corresponding to the particular area of interest. Lastly, one needs to get hold of a quality source of training labels, and here we make extensive use of historically produced land cover maps. Right, so once we have collected
that data, we move on to mandatory preprocessing of our satellite data. We make use of the WELD processing chain, provided to us by South Dakota State University and worked on by David Roy's team there. In essence, the WELD, or Web-Enabled Landsat Data, processing chain is used to preprocess raw Landsat satellite scenes into image composites that remove cloud and no-data areas by finding the areas of valid data from preprocessed, cloud-flagged scenes. It also ensures that pixels are chosen based on the highest NDVI, or maximum greenness, for the compositing period, and WELD takes care of reprojecting and georeferencing the input scenes into the convenient MODIS sinusoidal projection. So, in essence, we use WELD as described here to produce monthly composites for our land cover production year. What is shown in the next few slides are typical monthly composites ranging from January, February, and May, which covers roughly the seasons of summer through autumn, or fall, in South Africa, particularly for a province of South Africa by the name of KwaZulu-Natal. So here we start with the January composite, and then just move on to
February. For February you can still see a lot of cloud coming through, so WELD couldn't get completely cloud-free scenes for that particular month. Then for May we have much better cloud-free scenes.
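As an aside, WELD performs this compositing internally; the following is only a minimal sketch of the maximum-NDVI ("greenest pixel") compositing idea, assuming the per-scene reflectance arrays for one compositing period are already in memory as numpy stacks (the array names are hypothetical).

```python
# Minimal sketch of maximum-NDVI ("greenest pixel") compositing, as WELD does
# internally. Assumes `red_stack` and `nir_stack` are numpy arrays of shape
# (n_scenes, rows, cols) holding surface reflectance for one compositing
# period, and `band_stacks` is a dict of per-band stacks with the same shape.
import numpy as np

def max_ndvi_composite(red_stack, nir_stack, band_stacks):
    eps = 1e-6
    ndvi = (nir_stack - red_stack) / (nir_stack + red_stack + eps)
    # Index of the scene with the highest NDVI at each pixel location.
    best = np.nanargmax(ndvi, axis=0)
    rows, cols = np.indices(best.shape)
    # Pick that scene's value for every band to build the composite.
    return {name: stack[best, rows, cols] for name, stack in band_stacks.items()}
```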
Right, so after having acquired and preprocessed all our data, we go on to extract appropriate features from the data to train up the supervised classifier model. Before we can conduct
feature extraction, we have to go about loading up our data, and here we make extensive use of the GDAL open-source tool to first load up our monthly Landsat composites. So twelve monthly Landsat composites in the MODIS sinusoidal world reference get loaded up from HDF4 format, each composite consisting of several multispectral bands, together with the single- and multi-band ancillary data layers. The geotransform data is also loaded up from the source data layers to ensure uniform georeferencing across all data. Just an interesting note: I've only been in this particular domain for about a year, and before that I had no real or extensive knowledge of HDF files or of GDAL itself, so GDAL does a great job of abstracting the complexities and exposing just what you need to know to ingest your satellite data and also the ancillary data.
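To make this loading step concrete, here is a minimal sketch of reading an HDF4 composite with GDAL's Python bindings; the file name and subdataset layout are assumptions for illustration, not the project's exact file structure.

```python
# Minimal sketch of loading a monthly WELD-style composite from HDF4 with GDAL.
from osgeo import gdal
import numpy as np

def load_composite_bands(hdf_path):
    ds = gdal.Open(hdf_path, gdal.GA_ReadOnly)
    bands = {}
    geotransform = None
    # HDF4 files typically expose each spectral band as a separate subdataset.
    for subdataset_name, description in ds.GetSubDatasets():
        sub = gdal.Open(subdataset_name, gdal.GA_ReadOnly)
        bands[description] = sub.GetRasterBand(1).ReadAsArray().astype(np.float32)
        if geotransform is None:
            # The geotransform maps pixel indices to map coordinates, which is
            # what keeps all layers uniformly georeferenced.
            geotransform = sub.GetGeoTransform()
    return bands, geotransform

# Hypothetical usage:
# bands, gt = load_composite_bands("KZN_May_composite.hdf")
```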
Right, so as mentioned, GDAL already abstracts the complexities of handling multiple data formats and enabled us to create a generic feature extraction and loading layer in our system. This generic feature extraction module handles the job of loading and extracting the corresponding pixel values from each of our source and ancillary data layers. The feature extractor also draws out the exact features needed to conduct supervised training and classification. The features we use from the source data are as depicted: we make use of three sets of eight Landsat spectral band values, each set selected from the twelve monthly composites, essentially ranked according to the respective maximum, median, and minimum NDVI. Then, in turn, the corresponding ancillary features are also loaded up, those being the digital elevation, aspect, and slope, as well as mean precipitation and temperature, and we also add latitude and longitude. Importantly, a land cover label gets assigned per pixel, obtained from the historic labelled data. Just another interesting note: loading all the datasets entirely into memory is achievable with GDAL, just based upon the hardware availability that you have; you may need to employ a windowing type approach if you don't have enough memory, but that's where storage area network type hardware really comes to the fore — if you have enough memory, you can just load everything up. So with the assistance of GDAL we were really able to create a modular, reusable, and adaptable feature extraction layer in our system that can easily be modified to bring in new features of interest from the composites, such as different spectral band combinations and ratios, and we're also able to quite easily add new ancillary data layers.
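The per-pixel feature layout just described might be assembled along the lines of the following sketch; all input names are hypothetical, and the inputs are assumed to be co-registered 2-D arrays of the same shape.

```python
# Sketch of assembling the per-pixel feature vectors described above: the eight
# spectral bands taken from the months of maximum, median, and minimum NDVI,
# plus ancillary layers (elevation, aspect, slope, precipitation, temperature)
# and latitude/longitude.
import numpy as np

def build_feature_matrix(bands_max_ndvi, bands_med_ndvi, bands_min_ndvi,
                         ancillary, lat, lon):
    layers = []
    for band_set in (bands_max_ndvi, bands_med_ndvi, bands_min_ndvi):
        layers.extend(band_set)           # 3 x 8 spectral features
    layers.extend(ancillary)              # DEM, aspect, slope, precip, temp
    layers.extend([lat, lon])             # geographic position
    # Stack to (rows, cols, n_features), then flatten to (n_pixels, n_features)
    # so each row is one pixel's feature vector.
    cube = np.stack(layers, axis=-1)
    return cube.reshape(-1, cube.shape[-1])
```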
Right, so just to give you an indication of how noisy the input data we typically have to face is: the data can contain a lot of unwanted effects. What is shown is a typical monthly Landsat composite, and one can also see the scan line error that was introduced by the Landsat 7 satellite. WELD really does its best to try to fill the scan line gaps, but in the end some unwanted effects still end up filtering through, such as cloud and so on. The point that needs to be made here is that our classifier models need to be able to deal with this level of noise in the system, and unfortunately it's this kind of garbage-in, garbage-out principle that you will see in the downstream classification. That is basically what is shown in the next scene, where you have misclassification in certain parts of the scene. Right, so one of the neat
features of our system is that we employ radiometric change detection methods to be able to adapt our classifier models with new raw satellite data where no previous historic land cover labels exist. The idea here is to check for areas that have not changed since the historic label set, and then update the models only with new features in these no-change areas. Here we make extensive use of the IRMAD algorithm, or iteratively reweighted multivariate alteration detection algorithm, by Nielsen et al., which in essence compares two raw Landsat images from the years of interest. A chi-square change metric gets produced between the two images, which we then threshold to indicate the actual areas of change. Prior to running the IRMAD algorithm, we mask out cloud and cloud shadows from the raw Landsat 7 scenes by making use of the Fmask algorithm by Zhu et al.
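A full IRMAD implementation is beyond a slide, but the thresholding step on its chi-square statistic can be sketched as follows, assuming `chisqr` is the per-pixel test statistic produced by an IRMAD implementation and `n_bands` canonical variates were used (both are assumptions about the inputs, not code from the talk).

```python
# Sketch of thresholding the IRMAD chi-square change statistic into a
# no-change mask, which marks the pixels usable for model (re)training.
import numpy as np
from scipy.stats import chi2

def no_change_mask(chisqr, n_bands, p_no_change=0.95):
    # Under the no-change hypothesis the statistic follows a chi-square
    # distribution with n_bands degrees of freedom; pixels whose "no change"
    # probability exceeds the cut-off are treated as unchanged.
    prob_no_change = 1.0 - chi2.cdf(chisqr, df=n_bands)
    return prob_no_change > p_no_change

# Hypothetical usage with the six Landsat reflective bands:
# mask = no_change_mask(chisqr, n_bands=6)
```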
Right, so now I'll show you the typical output that we can obtain from running IRMAD. Given two Landsat images as input, one from 2008 and the next from 2011, each of these raw scenes is shown here in false colour just to highlight the forested areas, where we typically see a lot of change. So this is the 2008 scene and the next is the 2011 scene, and then what is shown next is the
IRMAD output. The dark blobs you can make out are the masked areas, and the lighter blobs show the actual chi-square change metric between the two images. Then, finally, we show a thresholded change image indicating areas of probable change. And then just
lastly, I show this superimposed over the 2011 image, which just indicates how cloud is not flagged as change.
OK, so moving on to the actual training of our supervised classifier. Once you've got all of your source data and have extracted all the required features from it, you're ready to train. There are roughly 1 million training samples typically for our KwaZulu-Natal province; just to give you a rough indication of size, the KwaZulu-Natal province is roughly 10% of South Africa, and South Africa is roughly the size of the state of Texas. So we basically load all those samples up, each of those containing 40 or so features as shown before, and these get fed one by one to train up a random forest classifier model. Random forest models are already efficient and effective when training on large volumes of data, are typically used in this problem context, and have been employed by the USGS for the United States land cover. We make use of the Weka tool's implementation of the random forest; it provides an extensive collection of well-established machine learning algorithms, and with this tool the data samples per pixel can either be kept in memory and passed to Weka to train via the Java API that it provides, or you can export the samples to CSV and pass those to Weka to train via the command line. So once we have trained up our random forest model, we are ready to automatically classify and produce a land cover map. Our generic feature extraction module is used once again here to load features from composites of satellite imagery, and an important point to note is that features are extracted for every pixel in our study area, which could be roughly 2.5 billion pixels for the KwaZulu-Natal problem. All of this gets loaded up into memory using GDAL and is passed pixel by pixel to our trained classifier model to generate a predicted land cover ID, and all predicted land cover IDs at the pixel level are then combined to produce a land cover map at 30 metres per pixel.
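As a rough illustration of this train-then-classify-every-pixel flow, here is a sketch using scikit-learn's random forest; the project itself uses Weka's Java API, so this is an alternative stand-in, not the actual implementation, and the input names are hypothetical.

```python
# Rough illustration of the train-then-classify flow with scikit-learn (a
# stand-in for the Weka Java API used in the project). `X_train`/`y_train`
# hold the ~1 million labelled sample vectors of ~40 features each, `X_all`
# the per-pixel feature matrix for the whole study area.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_and_classify(X_train, y_train, X_all, map_shape):
    clf = RandomForestClassifier(n_estimators=30, max_depth=18, n_jobs=-1)
    clf.fit(X_train, y_train)
    # Classify in chunks so the prediction step stays within memory limits.
    predicted = np.empty(X_all.shape[0], dtype=np.int32)
    chunk = 5_000_000
    for start in range(0, X_all.shape[0], chunk):
        predicted[start:start + chunk] = clf.predict(X_all[start:start + chunk])
    # Reshape the predicted land cover IDs back into a 30 m per pixel map.
    return predicted.reshape(map_shape)
```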
Right, so we then move on to assess the accuracy of the automatically produced land cover, and here we make use of confusion matrices, generated basically by taking our predicted map and the ground truth data and comparing them. These are then analysed to give us an idea of which classes contribute most to the error. What we're showing is a plot generated off of one of those confusion matrices showing the contribution to error; as can be seen, some of the dominant classes, such as grass, contribute quite significantly to our error.
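A minimal sketch of this assessment step, assuming the predicted and reference labels are available as flat arrays of land cover IDs, could look like the following.

```python
# Sketch of the accuracy assessment: build a confusion matrix from predicted
# versus reference labels and see which classes contribute most to the error.
import numpy as np

def error_contribution(y_true, y_pred, classes):
    n = len(classes)
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1               # rows: reference, cols: predicted
    off_diagonal = cm.sum(axis=1) - np.diag(cm)   # errors per reference class
    contribution = off_diagonal / off_diagonal.sum()
    producer_accuracy = np.diag(cm) / cm.sum(axis=1)
    return cm, dict(zip(classes, contribution)), dict(zip(classes, producer_accuracy))
```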
Online validation tools were also developed to assist the manual comparison of automatically generated predictions versus high resolution imagery, and this really assists quality assurance in doing just the final step of a validation check.
Right, so after we've conducted our first experiment run, we go into the phase of system optimisation, where parameter tuning of the supervised classification is conducted — this is really one of the major advantages of supervised classification. We set up three sets of experiments to determine the optimal sample size, the random forest tree depth and number of trees, and also the features per tree. Just looking at optimisation of sample size: here we varied the sample size from 10 to 100% of our sample space at 10% intervals, and what can be seen is that there's roughly a 2.5% improvement to be had from 10 to 100%. There's also a general trade-off in terms of processing time versus the sample space used to train, so at the end of the day we chose to just take a 2% improvement at roughly 60% of the sample space, since that's where the improvement starts tapering off and we gain a relative saving in training time. We then move on to optimise forest tree depth and number of trees. Here the number of trees is varied as shown on the x-axis, and the tree depth is also varied as shown by the individual curves. Fixing the depth to 18, as shown by the yellow curve, there's roughly 2% to be gained from 10 to 30 trees, and fixing the trees to 30, there's roughly a 1% gain to be had from a tree depth of 16 to 23. There is once again a trade-off in terms of tree complexity and training time, so we end up choosing our optimal parameters as soon as we see performance starting to taper off. Lastly, just looking at the number of features per tree: the Weka classifier training tool builds each tree in the random forest using a random subset of the available features, and the number of randomised features per tree is fixed before training, as shown on the x-axis. There's roughly about a half a percent improvement to be had by setting the features per tree to 20, and the curve clearly exhibits the overfitting characteristic known from pattern recognition: if we start choosing more than 20 features, we start seeing a decrease in performance.
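The kind of parameter sweep described here might be set up along these lines, again using scikit-learn as an illustrative stand-in for Weka; the grid values mirror the ranges mentioned in the talk, and the cross-validation setup is an assumption.

```python
# Minimal sketch of the parameter sweeps described above (number of trees,
# tree depth, features per tree), using scikit-learn as a stand-in for Weka.
# Accuracy is estimated by cross validation on the training samples.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def sweep_parameters(X, y):
    results = {}
    for n_trees in (10, 20, 30):
        for depth in (16, 18, 23):
            for n_features in (10, 20, 30):
                clf = RandomForestClassifier(n_estimators=n_trees,
                                             max_depth=depth,
                                             max_features=n_features,
                                             n_jobs=-1)
                score = cross_val_score(clf, X, y, cv=3).mean()
                results[(n_trees, depth, n_features)] = score
    # Pick the parameter set where accuracy stops improving, trading off
    # against training time as described in the talk.
    return results
```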
All of the previously discussed aspects have been integrated into an end-to-end system, which has gone through several iterations. We started out using GDAL and Weka via a system-command-based approach, but this resulted in huge interim datasets — I mean, just take into consideration that there are roughly 25 million pixels per band per image that we load up. We then moved on to use the Java APIs provided by GDAL and Weka, first applying them on 8-core machines with 4 gigs of memory and having to use a windowing type approach, and eventually we were able to apply this approach on storage area network hardware, so no more memory windowing was required and we could just load everything up. So our system overall is now rapidly configurable to run parallel experiments; we're easily able to add new classifier features or ancillary layers and able to continue tuning classifier parameters when we make significant changes, and this really enables a rapid and accurate learning cycle.
So we've been able to achieve successive improvements in producing automated land cover maps, and here we show an example of an area from one of the first land cover maps we produced using this system, prior to having optimised the random forest parameters. One can really see misclassifications here, large amounts of them due to the scan line error. Notice how this improves slightly after having optimised our parameters; as mentioned before, there's roughly a 10% improvement to be had in producer accuracy post optimisation, and as can be seen the scan line error improves a lot, and the per-class classification also improves quite a bit. Lastly, notice how the misclassification due to the scan line improves drastically in one of our latest production runs: the previous images that I showed were generated using only four seasonal composites per year, versus this one that was generated using twelve monthly composites, and there are also significant improvements here in terms of our settlement class, as shown in yellow.
Right, so wrapping up, I want to show a comparison with the previously manually produced maps. This is an example of an older map that was generated using manual methods, and then just moving on to one that was produced using completely automated methods.
Moving on from here, in terms of future work we really want to go ahead and produce an updated land cover map for the whole of South Africa, and then repeat that for the years 2005, 2008, and 2011 using Landsat 5 and 7 data. We also want to produce land cover for 2014 making use of Landsat 8 data, and we would also like to augment our classifier feature set with synthetic aperture radar data, which would hopefully help us to characterise vegetation structure better and hence also classify vegetation classes better. We would also like to do a comparative analysis between the different open-source tools we're using, so we'll compare scikit-learn against Weka. OK, thank you very much. If you have any questions, please feel free. [Audience] On one of the slides towards the end, you pointed out upgrading to a SAN and 256 gigs of memory in the same sentence — can you clarify what you meant there?
[Speaker] From our understanding of your point: there are two things — there's the upgrade of the disk to a SAN, and the machine running it had 256 gigs of RAM where before it didn't. We upgraded to storage area network infrastructure, so that was created from scratch; before, we weren't using it at all. It just goes to show that we were operating on laptops with 4 gigs of memory, and then we went to the storage area network with ample memory and resources. [Audience] But is the computing still being done on a laptop? [Speaker] No, that's actually done on a processing node that's also part of the storage area network. [Audience] OK, thank you.

Metadata

Formal Metadata

Title An automated classification and change detection system for rapid update of land-cover maps of South Africa using Landsat data.
Series Title FOSS4G 2014 Portland
Author McAlister, Bryan
License CC Attribution 3.0 Germany:
You may use, change, and reproduce, distribute, and make the work or content publicly available in unchanged or changed form for any legal purpose, provided you credit the author/rights holder in the manner specified by them.
DOI 10.5446/31696
Publisher FOSS4G, Open Source Geospatial Foundation (OSGeo)
Publication Year 2014
Language English
Producer FOSS4G
Open Source Geospatial Foundation (OSGeo)
Production Year 2014
Production Place Portland, Oregon, United States of America

Content Metadata

Subject Area Computer Science
Abstract Recent land cover maps are essential to spatial planning and assessment by non-/governmental agencies. The current land cover mapping methods employed in South Africa are slow and expensive and the most recent national land cover map dates back to 2000. The CSIR is developing an automated land-cover mapping system for the South African region. This system uses widely available Landsat satellite image time series data, together with supervised machine learning, change detection, and image preprocessing techniques. In this presentation the implementation of this end-to-end system will be addressed. Specifically, we will discuss the use of an open source random forest implementation (Weka), a change detection algorithm (IRMAD), as well as tools used for satellite image preprocessing (Web enabled Landsat data, fmask cloud masking) and on-line validation tools. Furthermore the approach used in optimising automatic land-cover production accuracy for operational use will be discussed.
Keywords Automated landcover mapping
machine learning
change detection
Landsat
