Semantic assessment and monitoring of crowdsourced geographic information

FOSS4G

Open Source Geospatial Foundation (OSGeo)

McNair, Hamish

Formal Metadata

Title

Semantic assessment and monitoring of crowdsourced geographic information

Title of Series

FOSS4G Seoul 2015

Number of Parts

183

Author

McNair, Hamish

License

CC Attribution - NonCommercial - ShareAlike 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this

Identifiers

10.5446/32152 (DOI)

Publisher

FOSS4G

Open Source Geospatial Foundation (OSGeo)

Release Date

2015

Language

English

Producer

FOSS4G KOREA

Production Year

2015

Production Place

Seoul, South Korea

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Whilst open source software allows for the transparent collection of crowdsourced geographic information, this material is only of value if it can be trusted. A semantic assessment of a feature’s attributes against ontologies representative of features likely to reside in this location provides an indication of how likely it is that the information submitted actually represents what is on the ground. This trust rating can then be incorporated into provenance information to give users of the dataset an indication of each feature’s likely accuracy. Further to this, querying the provenance information can identify the features with the highest or lowest trust rating at a point in time. This presentation uses crowdsourced data detailing the location of fruit trees as a case study to demonstrate these concepts. Submissions of such crowdsourced information (by way of, say, an OpenLayers frontend) allow for the collection of both coordinate and attribute data. The location data indicates the relevant ontologies, which can be developed in Protégé, that describe the fruit trees likely to be encountered. If the fruit name associated with a submitted feature is not found in this area (e.g. a coconut tree in Alaska) then, by way of this model, the feature is determined to be likely inaccurate and given a low trust rating. Note that the model does not deem the information wrong or erase it; it simply treats it as unlikely to be correct and of questionable trust. The process continues by comparing submitted attribute data, such as height, with the information describing the type of fruit tree that is contained in the relevant ontologies. After this assessment of how well the submitted feature “fits” with its location, the assigned trust rating is added to the feature’s provenance information via a semantic provenance model (akin to the W3C’s OPM).
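The assessment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the talk: a plain dictionary (`REGION_ONTOLOGIES`) stands in for the location-specific ontologies, and the region names, height ranges and numeric trust values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical stand-in for location-specific ontologies:
# region -> fruit name -> plausible tree height range in metres.
REGION_ONTOLOGIES = {
    "alaska": {"apple": (2.0, 9.0), "cherry": (3.0, 10.0)},
    "fiji": {"coconut": (10.0, 30.0), "mango": (5.0, 35.0)},
}


@dataclass
class Submission:
    region: str
    fruit: str
    height_m: Optional[float] = None


def trust_rating(sub: Submission) -> Tuple[float, str]:
    """Return an illustrative trust score in [0, 1] and the reason for it."""
    known = REGION_ONTOLOGIES.get(sub.region, {})
    if sub.fruit not in known:
        # The feature is kept, but flagged as unlikely to be correct.
        return 0.1, "fruit not expected in this region"
    if sub.height_m is None:
        return 0.5, "no height recorded"
    low, high = known[sub.fruit]
    if low <= sub.height_m <= high:
        return 0.9, "fruit and height fit the region"
    return 0.4, "height outside expected range"
```

A coconut tree submitted for Alaska receives the lowest score but is still returned with a reason rather than being discarded, which mirrors the point that the model questions a feature without erasing it.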
Use of such semantic web technologies then allows for querying to identify lower quality (less trustworthy) features and the reasons for their uncertainty. The reason may be an issue with collection, such as not enough attribute data being recorded; the time since collection, given that data quality degrades over time and older features are likely less accurate than newer ones; or a major event that could physically alter or remove the actual element, like a storm or earthquake. The tendency for crowdsourced datasets to be continually updated and amended means they are effectively dynamic when compared to more traditional datasets, which are generally fixed to a set period or point in time. This requires them to be easily updated; however, it is important that efforts are directed at identifying and strengthening the features which represent the weakest links in the dataset. This is achievable through the use of the open source software and methods detailed in this presentation.
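The kind of query described above, finding the least trusted features together with the reasons for their uncertainty, can be approximated as follows. This is a minimal sketch in plain Python rather than a query over a semantic provenance store; the record fields, the three-year age threshold and the sample data are assumptions made for illustration.

```python
from datetime import date

# Hypothetical provenance records; the field names are illustrative only.
RECORDS = [
    {"id": "tree-1", "trust": 0.9, "collected": date(2015, 6, 1), "attributes": 3},
    {"id": "tree-2", "trust": 0.9, "collected": date(2010, 6, 1), "attributes": 3},
    {"id": "tree-3", "trust": 0.5, "collected": date(2015, 6, 1), "attributes": 0},
]


def reasons_for_uncertainty(record, today, event_dates=(), max_age_days=3 * 365):
    """List why a feature may no longer be trustworthy."""
    reasons = []
    if record["attributes"] == 0:
        reasons.append("no attribute data recorded")
    if (today - record["collected"]).days > max_age_days:
        reasons.append("collected long ago")
    # A storm or earthquake after collection may have altered the real tree.
    if any(record["collected"] < d <= today for d in event_dates):
        reasons.append("major event since collection")
    return reasons


def weakest_features(records, today, n=2, event_dates=()):
    """Return up to n flagged features, lowest trust rating first."""
    flagged = []
    for record in records:
        reasons = reasons_for_uncertainty(record, today, event_dates)
        if reasons:
            flagged.append((record["id"], record["trust"], reasons))
    return sorted(flagged, key=lambda item: item[1])[:n]
```

Calling `weakest_features(RECORDS, date(2015, 9, 14))` ranks the feature with no attribute data ahead of the one that is merely old, which is the ordering a maintainer would need in order to direct effort at the weakest links first.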