OpenStreetMap Element Vectorisation - A tool for high resolution data insights and its usability in the land-use and land-cover domain
Formal Metadata

Title: OpenStreetMap Element Vectorisation - A tool for high resolution data insights and its usability in the land-use and land-cover domain
Title of Series: FOSS4G Firenze 2022
Number of Parts: 351
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/68934 (DOI)
Production Year: 2022
FOSS4G Firenze 2022, 308 / 351
Transcript: English (auto-generated)
00:00
So, hello and welcome everybody, thanks for being here. I really appreciate that you rushed through your lunch to be here, and I'm going to try to make the blood come back from the stomach to the head. OpenStreetMap Element Vectorisation is a new tool that we developed, and I want to introduce you to it. So first of all, where does it come from?
00:21
Currently we're running the IDEAL-VGI project at the University of Heidelberg, a joint project with TU Berlin. On our side in Heidelberg we're mostly focusing on the OpenStreetMap part, so we're analyzing OpenStreetMap data and its quality, and we're also using the TU Berlin results as quality indicators to feed back to the community.
00:42
If you're interested in that, I gave a talk on it at State of the Map, so you can check that out. The remote sensing team at TU Berlin is mostly focused on multi-label deep learning, and there is also something about that in the State of the Map talk. So okay, yet another tool for OpenStreetMap analysis, right? We have so many already,
01:03
but of course we have many because we have diverse requirements, right? We have data creators, we have curators, we have users, we have scientists, and each and every one needs their own type of analysis. We also have very specialized tools, for example "Is OSM up-to-date", which answers exactly that one question, or KeepRight, which is a linter
01:23
that asks specific questions of the data and reports the answers to the user. And of course we have diverse platforms; that's the same in every FOSS or open data project: when data is open, everybody can create their own tools, which is great and they do great stuff, but it also means they have different ways
01:40
of accessing the data and use different programming languages to access it. So our goal was to combine and provide to the user what has been divided and conquered: OpenStreetMap analysis has somehow been divided and in parts conquered, but let's bring that information back together. The main goal is to provide a multifaceted data view, so we want to combine intrinsic
02:04
and semi-intrinsic indicators on the data. We call this vectorisation, but there are other words for it: some people call it feature construction, I think Jennings Anderson calls it embedding, so there are different ways of saying it. We want to enable data analysis with this tool, mostly in the direction of quality
02:24
analysis, but as I will come to later, it's actually up to you what you use the tool for. We also want to enable machine learning, so we need quantifiable results, and we want them at the highest resolution: there are tools that work on rasters, like grid cells, but we want to work on the single OSM object, which is the highest
02:42
possible resolution. And we've drawn on many knowledge sources: more than 15 scientific studies and community projects have been analyzed and used for this project. One is actually from Peter, a study from 12 years ago, where he looked at the
03:01
representation of natural features, and that is now incorporated in our tool, so we now look at how detailed objects are mapped. But we also use knowledge that we created ourselves, in a paper from last year where we looked at how users are shaped by mapping in OpenStreetMap and what attributes users have when they
03:21
map in OpenStreetMap: they may be influenced by where they are, where they map from, where they are actually mapping, so the location they're looking at, and so on and so forth. And we wanted to provide that information to data users. So now let's get to the data aspects: what are we looking at here? We have
03:43
more than 32 indicators that our tool calculates and provides for the user to analyze later, and these indicators look into many different attributes or aspects of the data. We have the semantic category, for example the object tags; I will not list all 32, just pick some nice ones,
04:04
and that's an easy one, right, you look at the tags of the object. You look at the geometry of the object: how big is it, how long is the line, what's the area of the polygon? What's the level of detail: is it drawn very coarsely or very detailed? And where is it
04:21
actually, in what population density does it lie? That influences the data; I can't expect many buildings where there is no population, right? We also look at the virtual landscape, or digital landscape, however you want to call it: what is the mapping progress at the location of the object,
04:42
is this fully mapped, are we still continuing to map, or is there no data at all where we would expect some? We look at temporal aspects, and we look at the mappers; this is very important for me as a community or user analyst to have this view in there as well. Haklay and Sieber called this
05:01
the epistemologies of VGI: when we use the data, we not only have the data but also the user that created it, and she or he influences very much what we see, and we have to recognize that. Is she specialized in what she was doing? If this is a building, has she had any experience with buildings, or is this her first building? Is she local to the area, can she
05:22
actually see the building on the ground, or does she know nothing about it and just drew it from a remote sensing image? And of course we have external tools: as I said, we have linters, and we have OSM Notes, and they may also tell us something about the objects in an area.
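To make the idea of such a per-object vector concrete: one element ends up as a row of named, quantifiable indicators covering these aspects. The following is only an illustrative sketch in Python; the field names and values are invented here and are not the tool's actual output schema.

```python
# Hypothetical example of a vectorised OSM element: one element described
# by quantifiable indicators from several aspects (all names are made up).
example_element = {
    "osm_id": "way/123456789",        # the element being described
    "tag_category": "natural=water",  # semantic aspect: main LULC tag
    "area_m2": 54_210.0,              # geometric aspect: polygon area
    "vertex_spacing_m": 12.3,         # geometric detail (coarseness)
    "population_density": 35.0,       # location aspect: people per km^2
    "mapping_saturation": 0.8,        # "virtual landscape": local mapping progress
    "n_versions": 4,                  # temporal aspect: length of the edit history
    "creator_is_local": True,         # mapper aspect
    "creator_edits_on_tag": 250,      # mapper specialisation on this tag
    "n_open_osm_notes_nearby": 1,     # external tools aspect
}
```

Stacking many such rows gives the feature table that the later analyses, such as hypothesis testing, clustering, and supervised learning, operate on.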
05:44
Okay, this is FOSS4G, so we have to talk about implementation. The tool is mainly a Python package, but Python is actually mostly used for data collection: we query different APIs, we call into different programming languages, and we handle all of that centrally in the Python package. The main GIS work, the main computing, if it is not done on the server we're calling via an API, is
06:06
done in our backend, that is, in a PostgreSQL database with PostGIS. At this point it's time to say thank you for these great tools; PostgreSQL is just an amazing thing that can replace any GIS tool, in my opinion, or at least for this specific case. As you can see,
06:23
the workflow is mostly linear: we collect data, then we transform it, and then we calculate things on it, so there is not much parallelization currently going on, but at the end you get your output from the tool.
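As a rough illustration of this linear collect, store, compute flow (not the package's real code; the API endpoint, table name and query are placeholders), one such step could look like this:

```python
import json

import psycopg2
import requests


def run_pipeline(bbox, timestamp, dsn):
    """Minimal sketch of a linear collect -> store -> compute workflow."""
    # 1. Collect: query an external API for elements in the area of interest
    #    (URL and parameters are placeholders, not a real endpoint).
    response = requests.get(
        "https://example.org/elements",
        params={"bbox": ",".join(map(str, bbox)), "time": timestamp},
        timeout=60,
    )
    elements = response.json()["elements"]

    # 2. Transform/store: load the raw elements into the PostgreSQL/PostGIS backend.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        for el in elements:
            cur.execute(
                "INSERT INTO raw_elements (osm_id, geom) "
                "VALUES (%s, ST_GeomFromGeoJSON(%s))",
                (el["id"], json.dumps(el["geometry"])),
            )
        # 3. Compute: let PostGIS do the heavy GIS work, e.g. the area per element.
        cur.execute("SELECT osm_id, ST_Area(geom::geography) FROM raw_elements")
        return cur.fetchall()
```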
06:42
So how can you use the tool? As I said, it's a Python package, so of course you can download the source and build it yourself; we're using Poetry for that. We also provide a Docker setup, so if you're familiar with that, you can just use the Docker build procedure and it will set up everything for you. We have a CLI for the project,
07:03
which is what we are currently focusing on, so the interface on the command line, but we're also looking into an API and a front end. We have set something up and I will share it with you later, but it's currently experimental, so please don't crash the
07:20
server. Now about configuration: we need to tell the tool a few things before we can use it, and actually only two parts are required, a backend setup and a location plus a timestamp. Location and timestamp are required; the backend setup is the server side, so
07:41
if you use, for example, our front end, then of course you don't need to set up the backend, because that's what we provide for you. For the configuration you need to say where do I want to analyze, and at what time, that is, in relation to what point in time do I want to analyze it. But there are many optional things, for example external data sets; the tool is resilient towards missing data,
08:04
so if you don't have the external data sets we would like to analyze, it will just skip the respective metric, but if you have them, it will calculate that metric as well. And during processing, well, everybody here has developed software, so you know that sometimes your tool crashes and you don't want to do it all again; the tool is actually built in a way
08:23
that it will catch up from where you left off last. So if something doesn't work, say an API crashes, and you want to rerun it a day later, it will still have all the data and will continue from there. Here you see an example configuration file;
08:40
as you can see, the area of interest and the timestamp are the only entries that you need to provide, and everything else is optional, except for the backend setup, as I said.
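As an illustration of the shape of such a configuration, written here as a Python dictionary rather than in the tool's actual file format, and with hypothetical key names:

```python
# Hypothetical configuration; only the area of interest, the timestamp and
# the backend connection are required, everything else is optional.
config = {
    # required
    "area_of_interest": [8.63, 49.37, 8.73, 49.44],  # bbox: min lon, min lat, max lon, max lat
    "timestamp": "2022-01-01T00:00:00Z",             # analyze the data relative to this point in time
    "backend": {
        "host": "localhost",
        "port": 5432,
        "database": "osm_vectorisation",
        "user": "osm",
    },
    # optional: external data sets; if one is missing, the related metric is skipped
    "external_data": {
        "population_density_raster": "data/pop_density.tif",
    },
}
```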
09:02
In the future we want to work on this more. For example, we want to do more benchmarking: how fast is it actually? We did some benchmarking, but that was quite a while back; at that time it took one hour to process 1,000 elements. We have sped up some areas, but we have also introduced new metrics that are much slower, so we don't know what the current time is. And I have to say, benchmarking is really hard here, because the single OSM element has a huge influence on
09:23
the benchmark, or on the time it takes: if a single element has 50 versions and 50 people that edited it, I need to look into 50 people and all their edits, whereas if it has only one version, it's quite quick. So the runtime is highly influenced by the data, and it's quite difficult to say how fast the tool is.
09:43
We of course need to test our front end, and we have some ideas for new indicators; there is actually a branch open that I wanted to merge before this talk but didn't get to, so look out for that. Also, currently you can only process all the indicators, but we want to make them skippable, so if you're not interested in one or the other indicator, we want to give
10:02
you the power to skip them. And quality estimation: we built the tool for quality estimation, so we want to provide you with that, and it is currently implemented, so when you run the tool you will get a quality estimation of the object, that is, how good the object is, or how good the machine learning thinks it is. But that's very experimental, and
10:25
we're looking into that more. Okay, let's get into action, let's see what the tool can do, let's burn some CPUs. We used the tool for an example application on land use and land cover elements, that's the LULC you see here, and of
10:40
course we chose land use and land cover because our current project is in that area, so it's vital for us, but also because we think this type of information is very important for OpenStreetMap: it kind of makes the map look nice, it's the background of everything, and people use it a lot. There are many applications that use land use and
11:03
land cover data, not only in our project; other people use it to train their models to classify remote sensing data, and so on and so forth. We used 1,000 elements in total, out of 63 million, so it's only a very small set, but selecting these elements is actually not as straightforward as you would think. You can make a random selection,
11:24
and that's what we did, we chose random elements, but as you can see on the graph, this will highly influence what data you get: with this random selection you will get a lot of objects for tags that are very common. For example, natural=water is very
11:44
common, so you will get a lot of lakes in your data set. But there is another dimension to randomness, which is random location: get me the elements at random locations, and that will favor larger elements. So we chose to look at one dimension first, but we will look into the other dimension in the future.
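The two sampling dimensions can be sketched as follows; the helper functions are hypothetical and not the selection code we used:

```python
import random

# Dimension 1: pick element IDs uniformly at random. Frequent tags
# (e.g. natural=water) dominate simply because there are many such elements.
def sample_random_elements(element_ids, n=1000, seed=42):
    rng = random.Random(seed)
    return rng.sample(element_ids, n)

# Dimension 2: pick random locations and take whatever element covers them.
# This favours large elements, because a random point falls inside a big
# polygon more often than inside a small one.
def sample_random_locations(bbox, n=1000, seed=42):
    rng = random.Random(seed)
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        (rng.uniform(min_lon, max_lon), rng.uniform(min_lat, max_lat))
        for _ in range(n)
    ]
```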
12:05
Okay, so first of all, a simple thing: we can do hypothesis testing on the data. We now have a multi-dimensional data set, so let's test some hypotheses. Unfortunately there's an image missing here, but it doesn't matter. We had three hypotheses:
12:23
that cities are mapped first, that large objects are in less populated areas, so we won't frequently find large objects in populated areas, and that regions are first drawn coarsely, which actually contradicts hypotheses one and
12:41
two, so this is kind of a complicated relation. What we actually found is that, well, cities are not mapped first, at least if you look at how old the objects in cities are: they are in comparison quite new. So maybe they were mapped first, but they are also
13:01
continuously updated, so OpenStreetMap cities are a continuous mapping effort, and therefore we cannot analyze this by looking at how old the objects are. What we did confirm is that a lot of large objects are in less populated areas, but the correlation is weaker than we expected. On the image on the right you can actually see that it looks
13:24
like an exploded pillow, whereas we expected it to be much more correlated. And size and age, as we predicted, have a complex relation, so there is no single hypothesis that holds there.
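A check of the second hypothesis on the resulting feature table could be run roughly like this; the file and column names are assumptions:

```python
import pandas as pd
from scipy.stats import spearmanr

# One row per sampled OSM element, as produced by the tool
# (file and column names here are only illustrative).
df = pd.read_csv("features.csv")

# Hypothesis 2: object size and local population density are negatively correlated.
rho, p_value = spearmanr(df["area_m2"], df["population_density"])
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
# A clearly negative rho would support the hypothesis; in our case the
# correlation was weaker than expected (the "exploded pillow" scatter plot).
```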
13:41
Of course, this was on a global data set, and we all know that global analyses sometimes hide regional trends, so we also looked at regional patterns. We found that object age is actually quite correlated with the region the object lies in: we have very recent objects in Africa and Asia,
14:02
whereas Europe and North America have older objects. So we can hypothesize that either these objects are outdated, or these regions were simply mapped first and are now perhaps mapped better. We also have to say that 60% of the objects in this random sample come from Europe, because that is where most objects currently are in the land use
14:24
data set in OpenStreetMap, so that also introduces a bias that we have to acknowledge. But what we think is more interesting, because we have this multi-dimensional data set, is clustering. We built five clusters with k-means clustering, but we had to do some
14:42
pre-processing beforehand: we had to cut the value range, because k-means has problems with extreme values; we had to do some normalization; and, importantly, we removed all geographical clues, so any information that was geographical was taken out.
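A minimal sketch of this pre-processing and clustering with scikit-learn, assuming hypothetical column names and clipping quantiles, could look like this:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("features.csv")

# Remove all geographical clues, so the clusters are built purely from the
# element attributes (column names are hypothetical).
features = df.drop(columns=["longitude", "latitude", "country", "region"])

# Cut the value range, because k-means has problems with extreme values
# (clipping at the 1st and 99th percentile is an assumption, not our exact choice).
features = features.clip(
    lower=features.quantile(0.01), upper=features.quantile(0.99), axis=1
)

# Normalise, then build five clusters.
scaled = StandardScaler().fit_transform(features)
df["cluster"] = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(scaled)
```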
15:02
The idea was to see whether clusters generated purely from the attributes of the objects are nevertheless geo-referenced, so whether we can find geographical clusters without actually knowing where the objects are; that was our interest. I don't want to go through all the clusters, but cluster 3 really sparked our interest because it is very specific. So this is
15:21
one attribute that we computed with our tool, the coarseness of the object: how far apart are the individual vertices of the object? We can see that the objects in cluster 3 are drawn with the most detail compared to the other clusters.
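Such a coarseness indicator, the average distance between consecutive vertices of the outline, can be approximated in a few lines with shapely; this is a simplified sketch, not the tool's exact implementation, and it ignores projection issues:

```python
import math

from shapely.geometry import Polygon


def mean_vertex_spacing(polygon: Polygon) -> float:
    """Average distance between consecutive vertices of the exterior ring,
    in the units of the polygon's coordinate system."""
    coords = list(polygon.exterior.coords)  # the first point is repeated at the end
    distances = [math.dist(a, b) for a, b in zip(coords[:-1], coords[1:])]
    return sum(distances) / len(distances)


# A coarsely drawn square has a large spacing; a detailed outline of the
# same size would have a much smaller one.
coarse = Polygon([(0, 0), (100, 0), (100, 100), (0, 100)])
print(mean_vertex_spacing(coarse))
```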
15:41
You can also see that these objects are in quite active regions, measured by how many eyes have actually looked at the region the object is in: the objects in cluster 3 lie in areas where a lot of mappers have looked at or touched the area of the object. We also looked at how complete the land cover mapping is in this area,
16:03
and in contrast to what we would expect, if there are a lot of mappers it should also be complete, but in fact it is not: we have only little completeness of land cover in OpenStreetMap in these areas. So it's quite an interesting cluster, right? We have comparably old objects, old compared to
16:24
the other clusters, and we have many imported objects, so the share of imported objects is quite high. And this is all computed by our tool: if you run it on any object, you will get all this information and much more. And then we get to what these
16:45
objects actually are: we have a high share of lakes. I know the color isn't chosen very well, but the green area is the lakes, so we have a high share of lakes in cluster 3. And this was not part of the clustering,
17:01
right, we removed all geographical information, so the clustering algorithm didn't know where an object was, but it still agglomerated objects from North America into cluster 3. So this cluster is obviously a set of North American lakes that are drawn in great detail but lie in areas with
17:21
less OpenStreetMap information. So this can be seen as an archetype of OpenStreetMap data that we may look into more in the future; I think it's a very interesting cluster. And I want to finish with a call to join us in analyzing OpenStreetMap
17:42
and looking more into the data: use the tool to analyze different objects, maybe the objects you created last, or an area you're interested in, or anything else. The idea is really that the tool provides the features and you bring your labels. So if
18:02
you have any labels for single OSM objects, then use the tool, it will give you a bunch of features, and you can try machine learning.
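If you have such labels, a first experiment on top of the tool's features could be as simple as this sketch; file and column names are placeholders:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# features.csv: indicators produced by the tool, one row per OSM element.
# labels.csv:   your own labels for the same elements (placeholder file names).
features = pd.read_csv("features.csv", index_col="osm_id")
labels = pd.read_csv("labels.csv", index_col="osm_id")["label"]

X_train, X_test, y_train, y_test = train_test_split(
    features, labels.loc[features.index], test_size=0.25, random_state=0
)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```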
18:24
You can also try the front end; it may not be very performant, but try it out and you can see what the data looks like. This is a screenshot of how it looks: you get a map of the object and some attributes. And the analysis that we did here is open source as well, so you can look into how we did it. Thank you very much.