OpenStreetMap as Input for Governmental Datasets: Italian Military Geographic Institute Case
Formal Metadata
Number of Parts | 266
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/66509 (DOI)
FOSS4G Prizren Kosovo 2023 (94 / 266)
Transcript: English (auto-generated)
00:08
Good afternoon, everyone. I will speak about OpenStreetMap and its use to produce official governmental data sets, and I do so also on behalf of my Italian colleagues
00:21
Alessandro Sarretta and Maurizio Napolitano, whom I want to thank for their contribution. Let me start with some introduction to set the context. We know that for centuries the public sector has been solely responsible for the collection, production, validation and update of geospatial information. But starting from the
00:44
beginning of this century things changed: citizen-generated data, crowdsourcing and volunteered geographic information projects started challenging the role of the public sector, and the most popular example is of course OpenStreetMap. Since then there has
01:01
been a huge body of literature on the quality of OpenStreetMap and on the comparison of OpenStreetMap with authoritative data, to see whether they were similar and whether they could be integrated or complement each other. We have also seen different forms of actual integration between OpenStreetMap and authoritative data from the public sector. That means public
01:24
sector bodies that have integrated OpenStreetMap into their production workflows, but also the other way around, with the OpenStreetMap community integrating public sector data through imports. More recently there has also been integration by the business sector, and I think the most popular example is the Overture Maps Foundation, which at the end of last year more or less
01:45
promised, or rather was born with the promise, to produce global, up-to-date, quality-certified data sets also based on OpenStreetMap. With this context, I am going to talk about a new data set called the National Summary Database, DBSN in the following, that has been recently produced by
02:05
one of the many Italian cartographic bodies, the Italian Military Geographic Institute (IGM). This data set was released in September. Something about the data set: it is a vector database, and it includes geospatial information that is supposed to be
02:23
used at the national level, so the scale is a medium one, 1:25,000. As we will see later, it derives mostly from regional geotopographic databases at a larger scale, such as 1:10,000, and it will be used to derive maps at smaller scales
02:40
up to 1:250,000. Currently we have this new database only for 12 of the 20 Italian regions. You can see from the maps (not sure you can read them) that we have two different releases of the database: one in September, for the south of Italy, and one in December last year. So we have it not for the
03:02
whole of Italy but for a part of it. Why is it important? Because the IGM declared that, among the different data sources used to produce this new data set, they also used OpenStreetMap. In addition to the specifications, the way we can actually trace this in the database is through the attribute table: there is a specific code that points to the actual source,
03:26
and there is such a code for each object, for each feature in the database. When we have the code 03, this means that the object derives from OpenStreetMap, so we can actually measure that. As I said before, the DBSN derives mainly from regional geotopographic databases,
03:44
also cadastral data, IGM data (that is, data that the IGM already had) and so on; there are several sources. What is interesting is that they released it under the ODbL, the Open Database License, which is the same license as OpenStreetMap data. This is just the disclaimer that you have to accept before downloading the data. This is important because the IGM is used,
04:05
or was used, to charge users for downloading their data, whereas this is free and open access. With this context, what do we want to do here? We want to understand a bit more about the actual role that OpenStreetMap played in the production of this new database.
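The per-feature source code described above is what makes the role of OpenStreetMap measurable. A minimal sketch of that idea, assuming each feature is a dictionary with a hypothetical `source` field; only the code `03` for OpenStreetMap comes from the talk, while the field name and the other codes are placeholders:

```python
from collections import Counter

# "03" is the OpenStreetMap source code mentioned in the talk.
OSM_SOURCE_CODE = "03"


def osm_share(features, source_field="source"):
    """Return (number of OSM-derived features, their percentage of the total).

    `source_field` is a hypothetical attribute name, not the DBSN one.
    """
    counts = Counter(feature[source_field] for feature in features)
    total = sum(counts.values())
    osm = counts.get(OSM_SOURCE_CODE, 0)
    percentage = 100.0 * osm / total if total else 0.0
    return osm, percentage


# Made-up features; codes "01" and "02" are placeholders for other sources.
features = [
    {"id": 1, "source": "03"},
    {"id": 2, "source": "01"},
    {"id": 3, "source": "03"},
    {"id": 4, "source": "02"},
]

osm_count, osm_percentage = osm_share(features)
print(osm_count, osm_percentage)
```

With real data the same count would be taken over the attribute table of each DBSN class rather than over an in-memory list.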
04:23
That is, how much it was used, where it was used and why it was used. We do this through a sequence of steps: first the analysis of the data models of the two data sets, then the mapping between the two to understand how much they can be compared. Then we try to assess exactly how
04:42
OSM was used for each Italian region and province, and finally we zoom into specific classes of objects: buildings, roads and railways. Let me start by introducing the data model of the DBSN. First of all, these specifications have a legal basis, because
05:01
they were included in an annex to a ministerial decree from 2011 that was signed by the Italian President of the Council of Ministers. The structure is hierarchical. First of all we have 10 layers, represented in this column; you see, for example, from the top: geodetic and photogrammetric information; roads, mobility
05:23
and transport; buildings; hydrography; and so on. Each layer corresponds to two or more themes, and each theme corresponds to two or more classes. So we have 10 layers, 30 themes and 93 classes in this data model. The OpenStreetMap data model is probably well known to many
05:42
of you. We have a geometric component with three possible primitives: nodes, which are used to represent point objects, so a pair of coordinates (latitude, longitude); ways, which in OpenStreetMap are used for both linear and areal objects; and relations, which model the relationships
06:04
between two or more nodes, ways and/or other relations. Then, in order to mean something semantically, a geometry has attributes associated with it. In OpenStreetMap they are called tags; these are key-value pairs, and we have at minimum one tag per object,
06:23
but there is no limit in OpenStreetMap to the number of tags that an object can have. The first thing we did was a mapping between the two data models, in order to semantically match the DBSN layers and themes to OpenStreetMap objects. Here you see just some
06:42
examples for the themes corresponding to two of the 10 layers: in blue you see the themes corresponding to the layer roads, mobility and transport; in red you see the themes corresponding to the layer buildings and human settlements; and of course on the right-hand side
07:02
is the list of corresponding OSM tags. It is important to say that the two data sets are completely different: one is a crowdsourced product, the other is a governmental product, so they were produced for different purposes by different people. Clearly there is not a perfect match. There are objects in the DBSN that have no correspondence in OSM, like for
07:23
example the DTM, which is modeled through TINs (triangulated irregular networks) and is not available in OpenStreetMap. That is totally okay when we compare this kind of data sets. Now, what did we do in terms of analysis? There are two, let's say, macro steps. The first one is the
07:44
assessment of the OSM role in the production of the DBSN. This happened through the download of the data, the extraction of the objects that use OSM as a source, and then the enrichment of those data sets by adding all the attributes in the DBSN; we also
08:01
translated them into English. Finally we aggregated them to provide the picture for each province and each region. The second step is the comparison of the two data sets for specific classes. I will zoom into buildings, roads and railways, with some comparisons of the area of buildings and of the length of roads and railways. Then, for buildings, I will show some
08:25
spatial analysis that we did in order to calculate how much the two data sets actually intersect and overlap each other. Let me just mention that everything is written in Python and is available at this repository, under an open source license of course, to allow reproducibility.
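The semantic mapping between DBSN themes and OSM tags, on which the comparison steps above rely, can be pictured as a lookup from theme to key-value rules. The sketch below is not the authors' code: the theme names and tag rules are illustrative assumptions, and `matching_themes` is a hypothetical helper.

```python
# Hypothetical excerpt of a theme-to-tag mapping. A value of None means
# "any value for this key". The real mapping is the one shown on the slide.
THEME_TO_OSM_TAGS = {
    "roads": [("highway", None)],
    "railways": [("railway", "rail")],
    "buildings": [("building", None)],
}


def matching_themes(tags):
    """Return the themes whose rules match an OSM object's tags."""
    themes = []
    for theme, rules in THEME_TO_OSM_TAGS.items():
        for key, value in rules:
            if key in tags and (value is None or tags[key] == value):
                themes.append(theme)
                break  # one matching rule is enough for this theme
    return themes


print(matching_themes({"building": "yes", "name": "Town hall"}))
print(matching_themes({"natural": "tree"}))
```

An object that matches no rule, like the tree in the second call, simply has no counterpart on the DBSN side, which mirrors the imperfect match between the two data models noted in the talk.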
08:44
Good, so let me start from the contribution of OpenStreetMap. This picture shows, for each of the 12 Italian regions (not sure you can see it very well), the percentage of the total number of objects, for each region,
09:03
for each layer and for each theme, that derive from OpenStreetMap. When you see a blank cell, this means there is no object that derives from OpenStreetMap. When you see 0.0, this is actually a number higher than 0 but lower than 0.1, so it is really
09:22
almost zero, but it is not. If we look at this table, we can see that out of the 10 layers, seven have at least one object derived from OpenStreetMap. In particular four of them (one, two, three and four) have a, let's say, more significant contribution from OpenStreetMap: these are roads, mobility and transport; buildings; underground utility services;
09:43
and pertinent areas. We also have very high numbers; you probably cannot see it, but the cells colored in red are above 90 percent. As I said before, there are 30 themes in total; here we only show those that have at least one object derived from OpenStreetMap.
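The table conventions just described (blank cell for no OSM-derived object, 0.0 for a share above zero but below 0.1 percent) can be reproduced with a small aggregation. This is a sketch under assumptions: the record layout and the `share_cells` helper are hypothetical, and truncation to one decimal is one reading of "lower than 0.1".

```python
from collections import defaultdict


def share_cells(records):
    """Build table cells from (region, theme, is_from_osm) records.

    Returns {(region, theme): (cell_text, total_objects)}. The cell is
    blank when no object derives from OSM; otherwise it is the share in
    percent, truncated to one decimal so that any value below 0.1 shows
    as "0.0".
    """
    totals = defaultdict(int)
    from_osm = defaultdict(int)
    for region, theme, is_osm in records:
        totals[(region, theme)] += 1
        from_osm[(region, theme)] += int(is_osm)

    cells = {}
    for key, total in totals.items():
        n = from_osm[key]
        if n == 0:
            text = ""
        else:
            tenths = (1000 * n) // total  # share in tenths of a percent
            text = f"{tenths // 10}.{tenths % 10}"
        cells[key] = (text, total)
    return cells


# Made-up records; "Region A" and "Region B" are placeholders.
records = (
    [("Molise", "other transport", True)]
    + [("Region A", "roads", True)]
    + [("Region A", "roads", False)] * 1999
    + [("Region B", "hydrography", False)]
)

for key, (text, total) in share_cells(records).items():
    print(key, repr(text), total)
```

Keeping the object count next to each cell makes it harder to over-read a percentage that rests on very few objects.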
10:01
These are only 14, which means the other 16 have no objects derived from OpenStreetMap. Some additional messages extracted from the table: the contribution of OpenStreetMap is very variable among the 12 regions. If we look at the table that was on the left, in half of the cases the contribution is lower than one percent, so on average it is quite
10:25
limited, but we have peaks of more than 90 percent. As said, 16 of the 30 themes do not include any object from OpenStreetMap; here you have the list. The last comment is also important: the percentage always depends on the total number of objects. If I go back, there is a
10:44
case here for other transport in the Molise region where we have 100 percent, but this corresponds to only one object. One object gives 100 percent, but this of course does not mean that OpenStreetMap is heavily used, so we always need to look at the data and not just at the numbers. If we look at the provinces rather than the regions, it is the same table as before,
11:05
but with another level for the provinces within each region. We find out, not surprisingly, that the pattern is the same for the provinces within each region, with some exceptions. For example, Naples here has a, let's say, significantly higher percentage
11:23
for the roads, and you see it is the only province in Campania that has OpenStreetMap objects for other transport. So there are some cases where we can actually spot a difference within the same region. Now, zooming in on buildings only: here we calculated, for each province
11:43
of the 12 regions where we have the data set, the ratio between the total area of buildings in OpenStreetMap and the total area of buildings in the authoritative data set, which is an indirect way to estimate the completeness of OpenStreetMap data. What can we see, also
12:01
visually? In three regions, Sardegna, Puglia and Toscana, we have a number that is very close to or even higher than 100 percent; this means the total area of OpenStreetMap buildings is even higher than the total area in the authoritative data set. But in general the situation is very heterogeneous. Also looking at the literature, which we know very well,
12:24
there might be several reasons for this heterogeneous picture. The completeness can depend on the demographic density: we know that in areas where more people live, OpenStreetMap is better mapped than in others. Then the attractiveness: some areas are more popular, people go and visit them, so they are better mapped also for
12:44
that reason. There might also be the presence or absence of an active local OpenStreetMap community, and of course imports: even without a community, if building data sets are imported, then the completeness increases in one second, in a way.
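The completeness indicator used for buildings is simply the total OSM building area expressed as a percentage of the total authoritative building area. A minimal sketch, assuming the two totals per province have already been computed; the province names and areas are hypothetical, not figures from the talk:

```python
def completeness_ratio(osm_area_m2, reference_area_m2):
    """Total OSM building area as a percentage of the authoritative total."""
    if reference_area_m2 <= 0:
        raise ValueError("reference area must be positive")
    return 100.0 * osm_area_m2 / reference_area_m2


# Hypothetical totals in square metres: (OSM, authoritative).
province_areas = {
    "Province A": (1_050_000.0, 1_000_000.0),
    "Province B": (350_000.0, 1_000_000.0),
}

ratios = {
    name: completeness_ratio(osm, reference)
    for name, (osm, reference) in province_areas.items()
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0f}%")
```

A value above 100 percent, as for the hypothetical Province A, means OSM contains more building area than the reference. As an area ratio it says nothing about whether the same buildings are mapped in both data sets, which is why the talk calls it an indirect measure.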
13:02
Now, this table in a way complements the figure, because here we are extracting the same numbers but at the regional level, not at the province level. These of course confirm that Toscana, Sardegna and Puglia have very high numbers, even higher than 100 percent. We can also
13:21
see that the minimum is Calabria with 35 percent, which means it is really poorly mapped in terms of buildings. The second column is the standard deviation of the ratio between the two areas for the provinces within each region. This is interesting because several
13:40
values are quite small, but we have two very big values for Campania and Lazio, which really means there is a huge, let's say, inner variability. If we look back at the table, the case of Campania is significant, because we have only Naples with a very high percentage, while all the other provinces in Campania have very low numbers.
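The within-region variability can be illustrated by taking the standard deviation of the provincial ratios in each region. The values below are hypothetical and only mimic the pattern described (one well-mapped province among poorly mapped ones). Whether the authors used the population or the sample standard deviation is not stated, so the choice of `pstdev` is an assumption.

```python
from statistics import pstdev


def within_region_spread(ratios_by_region):
    """Population standard deviation of provincial ratios, per region."""
    return {
        region: pstdev(values)
        for region, values in ratios_by_region.items()
    }


# Hypothetical completeness ratios (percent) for the provinces of two regions.
ratios_by_region = {
    "Region X": [95.0, 20.0, 25.0, 30.0, 15.0],  # one outlier province
    "Region Y": [60.0, 62.0, 58.0],              # homogeneous region
}

spread = within_region_spread(ratios_by_region)
for region, value in spread.items():
    print(f"{region}: {value:.1f}")
```

A single outlier province is enough to inflate the spread, so a large value flags a region whose average hides very uneven mapping.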
14:03
The same somehow holds in Lazio, with Rome being very well mapped but then a very high variability all around. Finally, we tried to do something more: we overlapped the two data sets and looked at the fraction of the area of OpenStreetMap buildings that
14:22
do not intersect any building in the authoritative data set. Not surprisingly the values are quite low; they are all lower than four percent, I think, with the exception of three cases. But these values are still higher than zero, so it means there is something in OpenStreetMap that
14:41
is not in the authoritative database. As for possible reasons, here too we can only speculate on what they could be. In favor of OpenStreetMap: the fact that the IGM, the authoritative public sector body, is not capturing all the right objects, or the fact that, and this is I think a very common message for all
15:06
public sector agencies, they really cannot keep pace with how OpenStreetMap, like any crowdsourced product, evolves. OpenStreetMap changes continuously, so they need to,
15:21
and they probably are not able, or may be only partially able, to keep pace with this change. Possible reasons against OpenStreetMap: there are elements that are buildings in OpenStreetMap but not in the DBSN, like greenhouses or roofs. Roofs are modeled as buildings in OpenStreetMap; they are buildings that only have a roof but do not
15:44
have a building structure. These somehow are buildings, but they are not included in the DBSN. Then there are buildings that do not exist anymore, like demolished buildings, that might still be mapped as buildings in OpenStreetMap. And there is also the fact that of course we never expect a perfect
16:02
overlap between the two data sets; there will always be some difference. So there are different reasons, and I think some food for thought on what they could be. Finally, a very quick slide on roads and railways. We basically did the same thing, but this time instead of the area we look at
16:25
the length of the roads and of the railways in the two data sets. We find out that in general the situation is, I think, better overall than for buildings. This is not surprising, especially for roads (see the figure on the left-hand side), because roads are really
16:42
the primary class of object that is mapped in OpenStreetMap. It is easy to map roads from satellite imagery and from GPS tracks; OpenStreetMap was actually born with roads. Anyway, just by visually looking at the figures we can conclude that roads are more complete than railways and that variability is higher for railways. Also, if we look at railways, we really see some gaps
17:04
in the south of Italy. That was really just a first comparison of these two data sets. Let me close with some conclusions and some points for additional discussion. I think we can say that today, given the technological advancements and given the availability
17:22
of crowdsourced data sets, mapping agencies actually also need crowdsourced data to close the gaps in the information that they have. We have seen that this happens especially for baseline data like roads and buildings; this is actually where OpenStreetMap
17:40
is used the most. OpenStreetMap can be the solution to these data gaps, but it is not yet ready, or not yet ready everywhere, we could say, for buildings and also for roads. Clearly we cannot extrapolate general conclusions, but again roads, railways and buildings are among the first classes of objects that are mapped in OpenStreetMap, so the situation for other classes, which
18:04
could be land use or POIs, would probably only be worse than this. Nevertheless, I would again like to stress that this is a very interesting case in terms of data reuse: the IGM takes data from different sources, derives a new
18:21
product and licenses it under the ODbL. This is interesting, I think, from the data sharing point of view. As for future work, we can of course wait until the last eight Italian regions are released and then try to extend or replicate the analysis, first of all in order to spot possible geographical trends and to identify other OSM objects that may have an even higher
18:47
potential for integration in the DBSN; we might have missed something by only looking at 12 regions. Clearly we can also extend the analysis to additional layers and themes, not just buildings, roads and railways. Something that would be interesting is to evaluate the correlation
19:06
between the quality of OpenStreetMap, which would need to be assessed somehow, and the use of OpenStreetMap in the DBSN. So, is there any correlation between where OpenStreetMap is used the most and the quality that OpenStreetMap has in those provinces or regions?
19:23
But the opposite workflow is now also possible because, as I said at the beginning, the DBSN can be used as a source for imports into OpenStreetMap. This is something the Italian community has started to look at, of course with interest, because it is a complete and potentially
19:44
very useful source of information. This is the paper, published in the ISPRS Archives, which came out last week; there you can also find all the other papers corresponding to the talks given in this academic track. And that is it. You can also find
20:00
the slides here. Thanks a lot.