Creating a land use/land cover dictionary based on multiple pairs of OSM and reference datasets


Formal Metadata

Title
Creating a land use/land cover dictionary based on multiple pairs of OSM and reference datasets
Title of Series
Number of Parts
351
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year
2022

Content Metadata

Subject Area
Genre
Abstract
Creating a land use/land cover dictionary based on multiple pairs of OSM and reference datasets

1. Background
OpenStreetMap (OSM) can supply useful information to improve land use/land cover (LULC) mapping (Arsanjani, 2013; Schultz, 2017; Zhou, 2019). A dictionary is needed to convert each OSM tag into an LULC class. However, such dictionaries have mostly been created subjectively or with only one pair of OSM and reference datasets. As a result, the existing dictionaries may not be applicable to other study areas. This study designed four measures (sample count, average area percentage, sample ratio and average maximum percentage) and used multiple pairs of OSM and reference datasets to create a dictionary. Fifty pan-European metropolitan areas were used for testing, and 1409 different OSM tags were found. We further found that: 1) Only a small proportion of OSM tags play a decisive role in LULC mapping. 2) An OSM tag may correspond to multiple different LULC classes, but which LULC classes correspond to each OSM tag, and how, can be determined. Moreover, the proposed dictionary is useful for various applications (e.g., producing LULC maps, obtaining training and/or validation samples, and assessing the quality of an OSM dataset), and the approach to creating it is applicable to other study areas and/or LULC datasets.

2. Data
OSM datasets of the 50 metropolitan areas were acquired for free from http://download.geofabrik.de/index.html in June 2020. The corresponding reference datasets (Urban Atlas, or UA) were freely available from https://land.copernicus.eu/local/urban-atlas/urban-atlas-2012/# in June 2020.

3. Methodology
The tenet of our approach is to use multiple pairs of OSM and reference datasets for creating an OSM-LULC dictionary. In each pair of datasets, an OSM tag may correspond to different LULC classes; it is therefore necessary to determine the most appropriate LULC class for each OSM tag.
We assumed that most OSM tags have been tagged by volunteers correctly (Zhou et al. 2019). Following this assumption, the most appropriate LULC class for each OSM tag is determined in two steps. First, all objects of an OSM tag are intersected with those of each LULC class. Then, the LULC class with the maximum intersecting area is viewed as the most appropriate one for this OSM tag. Four attributes and four measures are designed to describe an OSM-LULC dictionary. The attributes are Tag ID, Tag Name, Class ID and Class Name; the measures are Sample Count, Average Area Percentage, Sample Ratio and Average Maximum Percentage. They are defined as follows:
1. Tag ID denotes the ID of an OSM tag.
2. Tag Name denotes the name of an OSM tag.
3. Class ID denotes the ID of an LULC class.
4. Class Name denotes the name of an LULC class.
5. Sample Count (SC) denotes how frequently an OSM tag appears in different study areas or datasets.
6. Average Area Percentage (AAP) denotes the average of the area percentages of an OSM tag in multiple different OSM datasets.
7. Sample Ratio (SR) denotes the percentage of study areas or datasets in which an OSM tag corresponds to an LULC class.
8. Average Maximum Percentage (AMP) denotes the average of the maximum percentages across different study areas or datasets.

4. Conclusion and application
This study proposed an approach to creating an OSM-LULC dictionary. The tenet of this approach was to involve multiple pairs of OSM and reference datasets in the analysis. First, each pair of OSM and reference datasets was intersected and the most appropriate LULC class for each OSM tag was determined. Then, the four measures, i.e., sample count (SC), average area percentage (AAP), sample ratio (SR) and average maximum percentage (AMP), were designed and calculated based on multiple pairs of OSM and reference datasets.
More precisely, a total of 50 pairs of OSM and reference datasets for pan-European metropolitan areas were chosen as study areas for creating an OSM-LULC dictionary. In total, 1409 different OSM tags were found, and they were reclassified into five and 14 different LULC classes, respectively. Moreover, this dictionary was also analyzed with the four proposed measures. The results showed the following. First, most OSM tags (> 1,000) were found in fewer than five study areas (SC < 5). Moreover, only 37 of the 1409 OSM tags had an average area percentage (AAP) larger than 0.1%. This indicates that only a small proportion of OSM tags play a decisive role. Second, an OSM tag may correspond to multiple different LULC classes within a pair of OSM and reference datasets, and the most appropriate LULC class for each OSM tag may also vary among different pairs of datasets. Thus, both the SR and AMP may vary across pairs of OSM tag and LULC class. With the proposed dictionary, it is possible to understand the differences among OSM tags and among pairs of OSM tag and LULC class. This is essential not only for producing LULC maps, but also for selecting training and/or validation data from an OSM dataset and for detecting incorrect tags in an OSM dataset. Therefore, we concluded that creating an OSM-LULC dictionary based on multiple pairs of OSM and reference datasets is beneficial.
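The two-step class assignment and the four measures described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name build_dictionary and the nested-dict input layout are hypothetical, the geometric intersection is assumed to have been done already (only the resulting areas are passed in), a tag's total area is approximated by the sum of its intersecting areas, its "area percentage" is taken as its share of all intersected OSM area in that study area, and SR is computed relative to the study areas in which the tag appears.

```python
from collections import defaultdict


def build_dictionary(intersections):
    """Build OSM-LULC dictionary rows from per-study-area intersection areas.

    intersections: {study_area: {osm_tag: {lulc_class: intersecting_area}}}
    (hypothetical layout; the areas are assumed to be precomputed).
    """
    # tag -> list of (best_class, max_percentage, area_percentage),
    # one entry per study area in which the tag appears
    per_tag = defaultdict(list)
    for tags in intersections.values():
        # Assumption: total OSM area = sum of all intersecting areas.
        total_area = sum(sum(by_class.values()) for by_class in tags.values())
        for tag, by_class in tags.items():
            tag_area = sum(by_class.values())
            if tag_area <= 0:
                continue
            # Step 2 of the method: class with the maximum intersecting area.
            best_class = max(by_class, key=by_class.get)
            per_tag[tag].append(
                (best_class, by_class[best_class] / tag_area, tag_area / total_area)
            )

    rows = []
    for tag, observations in per_tag.items():
        sample_count = len(observations)  # SC
        avg_area_pct = sum(o[2] for o in observations) / sample_count  # AAP
        max_pcts_by_class = defaultdict(list)
        for best_class, max_pct, _ in observations:
            max_pcts_by_class[best_class].append(max_pct)
        for lulc_class, max_pcts in max_pcts_by_class.items():
            rows.append({
                "tag": tag,
                "class": lulc_class,
                "SC": sample_count,
                "AAP": avg_area_pct,
                "SR": len(max_pcts) / sample_count,  # share of areas with this class
                "AMP": sum(max_pcts) / len(max_pcts),
            })
    return sorted(rows, key=lambda r: (r["tag"], r["class"]))
```

A tag that is assigned different classes in different study areas produces one row per class, which is how the same OSM tag can appear with several SR and AMP values.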
Keywords
Data dictionary; Information engineering; Observational study; Mathematical analysis; Multiplication; Goodness of fit; Reference data; Covering space; Computer animation
Planning; Transportation theory (mathematics); Information; Texture mapping; Resource allocation; Covering space; Service (economics); Estimation; Telecommunication; Bit rate; Data dictionary; Observational study; Correspondence (mathematics); Forest; Integrated development environment; Angular resolution; Temporal logic; Shape (magazine); Polygon; Inflection point; Best, worst and average case; Maxima and minima; Sample (statistics); Total S.A.; Calculation; Building; Artificial neural network; Mathematical analysis; Counting; Ranking; Complete metric space; Metropolitan area network; Consistency; Object (grammar); Database transaction; Performance appraisal; Pairwise comparison; Multiplication; Mapping; Mereology; Cartesian coordinate system; Internet service provider; Wave packet; Resultant; Measurement; Thresholding (image processing); Different (Kate Ryan album); Limit (category theory); Spacetime; Reference data; System of linear equations; Software testing; Set (mathematics); Latent heat; CASE <Informatik>; Open set; Computer animation; Table
Transcript: English(auto-generated)
Good afternoon, I'm Yao Ming and I will present our work on creating a land-use land-cover dictionary based on multiple pairs of OSM and reference datasets. Land-use land-cover maps have many significant applications. OpenStreetMap can provide useful information for land-use land-cover mapping.
The classifications for OSM and land-use land-cover are different, so a correspondence between them is needed. In recent studies, correspondences are mostly established by one of two methods: subjectively, or automatically based on one pair of OSM and reference data.
Both methods may lead to a problem, that is, the correspondence based on one study area may not always be applicable to others. So, we proposed an approach that uses multiple pairs of OSM and reference data for creating a dictionary. Our study area includes 50 pan-European metropolitan areas, and the reference data is Urban Atlas.
For each study area, all the OSM tags are intersected with different land-use land-cover classes, and the class with the maximum intersecting area is viewed as the most appropriate class for this OSM tag. Besides, four measures are designed to describe the dictionary.
The first one is sample count, which denotes how frequently an OSM tag appears in different study areas. The second one is average area percentage, which denotes the average of the area percentages of an OSM tag in multiple OSM datasets. The third one is sample ratio, which denotes the percentage of study areas in which an OSM tag corresponds to a specific reference class.
The fourth one is average maximum percentage. Maximum percentage denotes the ratio of the maximum intersecting area for a pair of OSM tag and reference class to the total area for this OSM tag.
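Written out, the maximum percentage and its average might look as follows. The symbols are introduced here for illustration and are not from the talk: A_a(t, c) is the intersecting area of OSM tag t and reference class c in study area a, A_a(t) is the total area of tag t in that study area, and S_{t,c} is the set of study areas in which c has the maximum intersecting area for t. Averaging over S_{t,c} is an assumption, since the talk does not state the averaging set explicitly.

```latex
\mathrm{MP}_a(t) = \frac{\max_{c} A_a(t, c)}{A_a(t)},
\qquad
\mathrm{AMP}(t, c) = \frac{1}{|S_{t,c}|} \sum_{a \in S_{t,c}} \mathrm{MP}_a(t)
```

For example, if objects tagged forest cover 100 area units in one study area and 90 of those units fall within a single reference class, the maximum percentage for that study area is 90/100 = 0.9.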
We calculate the four measures for each pair of OSM tag and land-use land-cover class. Then, all the calculation results are counted to establish the dictionary. This is the dictionary we established. Only a part is shown due to space limitations. In the analysis, we found that more than 1000 OSM tags have a low sample count, and only 37 OSM tags have a relatively large average area percentage.
These two findings mean that only a small proportion of OSM tags can play a decisive role in the dictionary. And how can we use the dictionary? There are three possible applications.
The first one is increasing land-use land-cover mapping accuracy. For example, in scenario 1, we use all the OSM tags in the dictionary for mapping. And in scenario 2, we set screening thresholds for three measures, and then accuracy was increased for all the test areas.
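The screening in scenario 2 could be sketched like this. It is a minimal illustration only: the talk does not say which three measures were thresholded or what values were used, so the function name screen_entries, the row layout, and the threshold values in the usage example are all hypothetical placeholders.

```python
def screen_entries(rows, minimums):
    """Keep dictionary rows whose measures meet every minimum threshold.

    rows: list of dicts with measure keys such as "SC", "AAP", "SR", "AMP"
    minimums: {measure_name: minimum_value}; which measures to screen on
    and the values to use are placeholders chosen by the caller.
    """
    return [
        row for row in rows
        if all(row[measure] >= minimum for measure, minimum in minimums.items())
    ]


# Usage with made-up rows and made-up thresholds:
example_rows = [
    {"tag": "forest", "class": "Forests", "SC": 48, "AAP": 0.12, "SR": 0.95, "AMP": 0.88},
    {"tag": "rare_tag", "class": "Forests", "SC": 2, "AAP": 0.0001, "SR": 0.5, "AMP": 0.40},
]
kept = screen_entries(example_rows, {"SC": 5, "SR": 0.6, "AMP": 0.6})
print([row["tag"] for row in kept])
```

Scenario 1 in the talk corresponds to passing an empty thresholds dict, which keeps every row.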
The second application is picking up training data. For example, the OSM tag forest has a high sample ratio and a high average maximum percentage with the land-use land-cover class forests, which indicates a high consistency. In this case, the OSM objects tagged forest could be used for picking up forest training data.
The third application is OSM quality assessment. For example, a low average maximum percentage indicates a low consistency between an OSM tag and a reference class, and we can detect potentially incorrect tags among the objects carrying this OSM tag. In conclusion, this study proposed an approach to establish a dictionary based on multiple pairs of OSM and reference datasets.
And it is beneficial to create such a dictionary for essential applications. Thanks for your attention.