OSM Water: How well are Minnesota’s water features mapped?
Formal metadata
Number of parts | 70
License | CC Attribution 3.0 Unported: You may use, adapt, copy, distribute, and make publicly available the work or its content, in unchanged or adapted form, for any legal purpose, provided you credit the author/rights holder in the manner they specify.
Identifiers | 10.5446/58491 (DOI)
State of the Map US 2019, 47 / 70
Transcript: English (automatically generated)
00:01
Thanks for coming to the last day, first presentation. My name is Matt Manley, and I'm a GIS data analyst at Critigen. And I'm Bentley Breithaupt, a GIS developer with Critigen. And today, we want to share with you our project that we've been working on, OSM water, how well are Minnesota's water features mapped.
00:22
So to start, we wanted to provide a little context on why we're focusing on water features specifically. So basically, water features, along with other natural features, provide important geographic context in OSM. They're also an important natural resource, so knowing where they are can be an important factor.
00:41
And they also can act as a barrier or facilitator of human movement. So by that we mean, in OSM, you can imagine water features being pretty closely linked to things like ferry routes, or piers, or bridges. Finally, they represent a pretty interesting case study concerning the role of bulk imports in OSM.
01:01
So, we'll get back to bulk imports in a second and kind of dive into that, but this chart essentially shows OSM feature counts in Minnesota over time. So we're showing highway tags versus waterway tags versus natural equals water tags to show kind of the change in feature creation rate over time.
01:21
And there's two things that are kind of interesting about this. The first is there's a pretty distinct difference between the relative rates of feature creation over time between road features and water features. And the second is kind of how important bulk imports can be to contributing to map completeness. You can see the TIGER import between 2007, 2008,
01:42
and the NHD import, which we'll kind of go into a little bit more, but that's between 2009, 2011. So to kind of get into bulk imports a little bit, the OSM Wiki defines bulk imports to be the process of uploading external data to OSM. And this acts as a supplement to user-created data.
02:03
It usually involves a complex merging process, and some pretty popular examples in the U.S. include the TIGER data and Bing Maps. You can see how that data makes its way into OSM. So in kind of doing our research, we realized that bulk imports have been
02:22
a pretty controversial topic in OSM for a long time. This is largely due to the nature of bulk imports. They kind of go against the core model of OSM, which is that individual users manually add verifiable data to the map. There are, however, some pros to using bulk imports.
02:41
Due to the amount of data that you can import, obviously it can increase map completeness. There also can be a validation feedback loop between the external source and the OSM community. And what that means is that if users in OSM make edits to that external data that's been uploaded, there can be a feedback loop and that external source could potentially get those edits and make some improvements to their own data.
03:04
There are, however, some cons that people have pointed out. Some people think that this might discourage users from making additional edits. If they go to a map and see that it primarily looks finished, they may not have the incentive to keep adding to it. It can also be a pretty complicated process. It takes a lot of community input.
03:22
And if the data source is not super accurate and it gets kind of uploaded carelessly, you could introduce a lot of errors. However, despite all this back and forth, there's been a lot of bulk imports to OSM and one of them is the NHD dataset, which is the National Hydrography Dataset managed by the USGS National Geospatial Program.
03:44
And essentially what that represents is the water drainage network of the United States. And it was bulk uploaded to OSM in multiple steps across most of the United States. It definitely involved a complex import process to correctly map all these NHD feature types
04:03
to the corresponding OSM tags, and names were brought in when available as well. So for our research questions, we started broadly with: what is the state of water data in OSM? That's a pretty broad question, so we narrowed it down a little bit to, specifically, what role does NHD play in the representation
04:22
of water features? And then since we're in Minnesota, we decided to use a case study in Minnesota and look at how bulk imported NHD data has changed since import. We broke our study into four pieces that we'll be presenting here. Starting off with just how did we define water
04:41
for this project? Followed by where did we get our data from and what did we do to prepare that? Then looking at how we identified changes in the NHD dataset and classifications of those changes and visualizations to show you. Start off with what are we calling water here?
05:02
We wanted to do something very OSM-based, so we literally started with the OSM base map. We just looked at it and asked, what's blue on here? We found out that that is based on Carto style sheets that map tags associated with features to the styling that you then see on the map.
05:20
We went with the water style sheet and selected a number of tags from there to create what we call our inland water definition. This is the definition that we used here. We used four OSM tags, waterway, natural, landuse and wetland, and a variety of values from each of those.
05:40
We thought that inland water was a good fit for our study in that NHD being a drainage water set very much fit with that and then being in Minnesota, not a lot of oceans or seas or anything of that sort but many, many beautiful lakes and rivers.
06:02
And we thought that this was also a fairly self-explanatory definition in that it is just anything that is traversing over or surrounded by land. Generally, it is fresh water with some little exceptions like tidal flats.
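As a rough illustration of the inland water definition described above, the sketch below checks whether a feature's tags fall inside it. The four tag keys come from the talk; the value sets are only the values mentioned elsewhere in the talk plus a few assumed examples, not the presenters' actual list.

```python
# Keys are from the talk. The value sets are illustrative assumptions,
# not the presenters' full definition.
INLAND_WATER = {
    "waterway": {"river", "riverbank", "stream", "ditch"},
    "natural": {"water"},
    "landuse": {"reservoir", "basin"},  # assumed values
    "wetland": {"tidalflat"},           # assumed value
}


def is_inland_water(tags: dict) -> bool:
    """Return True if any tag on the feature matches the definition."""
    return any(tags.get(key) in values for key, values in INLAND_WATER.items())


print(is_inland_water({"natural": "water", "name": "Example Lake"}))
print(is_inland_water({"highway": "residential"}))
```

Keeping the definition as a plain mapping makes it easy to adjust the value lists without touching the matching logic.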
06:23
After we had our definition of water, we needed to get NHD data out of the OSM data set. We did this by creating what we're calling time slices. These are snapshots in the historical OSM data set at specific points in time.
06:41
We went with three different time slices, one for 2009 before there were really any water bulk imports, one for 2011 after most of the NHD data had been imported for the first time and then a 2019 data set to just use as our current data. We generated these time slices
07:00
by starting with a full OSM history file in PBF format. We processed all this data with something called Osmium Tool. It's an open source OSM file manipulation program that works great with history files as opposed to just normal OSM data files.
07:21
We started by doing a soft clip to Minnesota, meaning just anything that intersects or overlaps with the area of Minnesota. We then extracted those three time slices that I was talking about, 2009, 2011, 2019. And then finally, we filtered each of those time slices so that they only included items that had tags from our water definition
07:42
as well as a few other things just to keep their full geometry, a few extra nodes, that sort of thing. At this stage, we had three PBF files, one for each time slice, each with all of the water data for Minnesota at that time.
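The three steps just described (clip, time slice, tag filter) could look roughly like the sketch below, which only builds the Osmium Tool command lines rather than running them. The file names, the timestamp, the `complete_ways` strategy as the "soft clip", and the filter expressions are all assumptions for illustration; the talk does not give the exact commands.

```python
def build_time_slice_commands(history_file, boundary_file, timestamp, label):
    """Build the osmium command lines for one time slice.

    All file names below are hypothetical placeholders.
    """
    clipped = "minnesota.osh.pbf"
    snapshot = f"minnesota_{label}.osm.pbf"
    water = f"minnesota_water_{label}.osm.pbf"
    return [
        # 1. Clip the history file to the state boundary, keeping whole ways
        #    that cross it (assumed to correspond to the "soft clip").
        ["osmium", "extract", "--with-history", "-s", "complete_ways",
         "-p", boundary_file, history_file, "-o", clipped],
        # 2. Extract the data as it looked at one point in time.
        ["osmium", "time-filter", clipped, timestamp, "-o", snapshot],
        # 3. Keep only objects matching the water tags. By default,
        #    tags-filter also keeps referenced nodes so geometry stays complete.
        ["osmium", "tags-filter", snapshot,
         "nwr/waterway", "nwr/natural=water", "nwr/landuse=reservoir",
         "nwr/wetland", "-o", water],
    ]


for command in build_time_slice_commands(
        "history.osh.pbf", "minnesota.geojson", "2011-01-01T00:00:00Z", "2011"):
    print(" ".join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` on a machine with Osmium Tool installed, once per time slice.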
08:02
So we can look at how our time slices compare. So this graph shows our three different years, 2009, 2011, and 2019, showing all features versus NHD features for Inland Water. You can see that pretty much throughout, NHD represents a huge portion of the data
08:21
once it's uploaded in 2011, 2019. So there's very little change between 2011 and 2019, which is kind of what we were looking for. We wanted to capture the bulk import between 2009, 2011, and then analyze what's happening between those last two time steps. There's also about 300,000 features that were added, mostly sourced from NHD.
08:41
This second chart is pretty similar except it breaks things down a little by geometry type, which we kind of took into account in this analysis. This chart's using a logarithmic scale, so just keep that in mind. You really wouldn't be able to see much on 2009 if we didn't do that. But again, it's obvious that there's a huge jump in features between 2009, 2011.
09:03
This kind of illustrates the different proportions of tags per time slice. So 2009, you can barely see, but you can see natural equals water is pretty much like the only tag that's showing up there. But then we have a big jump in the proportions and relative sizes in 2011.
09:21
And then again, in 2019, there's not much of a difference. One interesting note that's kind of hard to see on here is that there are actually some declines in tags, notably waterway equals river, waterway equals riverbank, and waterway equals stream. And we think that's probably due to changing tagging practices within the community
09:40
or some tags that got imported from NHD that maybe should be called something else, people decide later on. All right, so up until this point, we've been kind of just looking at counts and things like that, but now we're gonna kind of walk through our analysis of geometric change in NHD features between 2011 and 2019.
10:00
And this will tell us how bulk imported data changes over time with the help of the OSM community. So this complicated chart pretty much just shows our general methodology for this analysis. So we started with our two time slices and we compared OSM IDs existing within those two time slices.
10:20
And if we could match those, it existed in both times, we checked whether the geometry changed. If it did not, we classified it as a no change feature. If it did change, we classify it as a change in geometry. Stepping back a few steps, if the OSM ID did not exist in both years, then we ask whether there is another OSM ID
10:45
that essentially replaces it. If there is one that geometrically intersects it, we call that a replacement. If there's nothing that's replacing it, we check what year that OSM ID exists in. If it only exists in 2011 and not in 2019, then we call it a deletion.
11:01
If it only exists in 2019, we call it an addition. In order to kind of get at all this, we uploaded our PBFs to Postgres and used some PostGIS functions to query out the change. Basically, we started by querying for change versus no change.
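The decision flow described above can be sketched in a few lines. This is a simplified stand-in, not the presenters' PostGIS implementation: geometries are plain coordinate tuples, and the hypothetical `shares_two_points` helper substitutes for a real "intersects but does not merely touch" test.

```python
def shares_two_points(a, b):
    """Crude stand-in for a spatial overlap test (assumption for this sketch)."""
    return len(set(a) & set(b)) >= 2


def classify(old, new, overlaps=shares_two_points):
    """Classify features from two time slices, keyed by OSM ID."""
    result = {}
    # IDs present in both slices: compare geometries directly.
    for osm_id in old.keys() & new.keys():
        same = old[osm_id] == new[osm_id]
        result[osm_id] = "no_change" if same else "geometry_change"
    only_old = old.keys() - new.keys()
    only_new = new.keys() - old.keys()
    # IDs that disappeared: replaced by an overlapping new ID, or deleted.
    for osm_id in only_old:
        replaced = any(overlaps(old[osm_id], new[n]) for n in only_new)
        result[osm_id] = "replacement" if replaced else "deletion"
    # IDs that appeared: replacing an overlapping old ID, or a true addition.
    for osm_id in only_new:
        replaces = any(overlaps(new[osm_id], old[o]) for o in only_old)
        result[osm_id] = "replacement" if replaces else "addition"
    return result


slice_2011 = {1: ((0, 0), (1, 0)), 2: ((0, 0), (1, 1)),
              3: ((5, 5), (6, 6), (7, 7)), 4: ((9, 9), (10, 10))}
slice_2019 = {1: ((0, 0), (1, 0)), 2: ((0, 0), (1, 2)),
              30: ((5, 5), (6, 6), (7, 8)), 50: ((20, 20), (21, 21))}
print(classify(slice_2011, slice_2019))
```

In this toy data, ID 3 disappears and ID 30 appears over mostly the same coordinates, so both are labeled as a replacement rather than as a deletion plus an addition.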
11:20
And the query on the right illustrates how we found polygonal features that did not change between the two years. You can see it's pretty much doing a join on OSM ID and then using ST_Equals, a PostGIS function, to check whether the geometries were identical. After we identified change versus no change, we broke the change up into our four unique use cases
11:40
of changes in geometry, replacements, additions and subtractions. All right, so given that methodology, this chart shows our classification of polygonal features and how they changed between 2011 and 2019. You can see that the vast majority of features stayed the same, about 89%.
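The "no change" query mentioned earlier (a join on OSM ID plus a geometry equality check) can be approximated with the standard library alone. The sketch below uses an in-memory SQLite database and compares geometries as WKT text, which is stricter than PostGIS `ST_Equals` (that function tests spatial equality, so vertex order does not matter). The table and column names are hypothetical.

```python
import sqlite3


def unchanged_ids(rows_2011, rows_2019):
    """Return OSM IDs present in both slices with identical geometry text."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE water_2011 (osm_id INTEGER PRIMARY KEY, wkt TEXT)")
    con.execute("CREATE TABLE water_2019 (osm_id INTEGER PRIMARY KEY, wkt TEXT)")
    con.executemany("INSERT INTO water_2011 VALUES (?, ?)", rows_2011)
    con.executemany("INSERT INTO water_2019 VALUES (?, ?)", rows_2019)
    # In PostGIS the WHERE clause would be ST_Equals(a.geom, b.geom);
    # text equality is only a stand-in for this sketch.
    cursor = con.execute(
        "SELECT a.osm_id FROM water_2011 AS a "
        "JOIN water_2019 AS b ON a.osm_id = b.osm_id "
        "WHERE a.wkt = b.wkt ORDER BY a.osm_id"
    )
    ids = [row[0] for row in cursor]
    con.close()
    return ids


rows_2011 = [(1, "POLYGON((0 0,1 0,1 1,0 0))"), (2, "POLYGON((0 0,2 0,2 2,0 0))"),
             (3, "POLYGON((5 5,6 5,6 6,5 5))")]
rows_2019 = [(1, "POLYGON((0 0,1 0,1 1,0 0))"), (2, "POLYGON((0 0,2 0,2 3,0 0))"),
             (4, "POLYGON((8 8,9 8,9 9,8 8))")]
print(unchanged_ids(rows_2011, rows_2019))
```

Flipping the `WHERE` condition to an inequality would give the "change in geometry" class for IDs that exist in both slices.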
12:02
But 11% did change and that change was comprised of predominantly changes in geometry. This is the same graph for just linear features. You can see the proportion of change increased a little bit, almost 18%. And again, it's made mostly of changes in geometries,
12:22
but a decent amount of additions as well. We kinda wanted to give you an idea of what this change looks like. So this is a map of Minnesota showing features that were classified as changing or not changing, just polygonal features. And change is shown in dark blue,
12:40
no change is shown in gray. One thing to keep in mind is, while the northern part of the map looks pretty impressive, this is a feature classification. So if these features are giant multi-polygon relations and they're changing just a little bit, they're gonna flash blue on the map. So it's not the most exciting thing going on up there, but it does kind of speak to the proportion of change
13:01
going on within the state. All right, so for the next few slides, we're gonna kind of show a little time series analysis. So the first image we'll show for each set will be in 2011, what the features look like. And then the second will be our 2019 features. And then the third will show the features
13:22
that were classified as change for each use case. So we'll start with changes in geometry. So here are our features in 2011. Here are our features in 2019. And finally, this image shows the features that were classified as having changes in geometry.
13:43
Next, we'll show replacements. This is our 2011 image, 2019 image, and the features that were classified as replacements. So these are essentially features that have new OSM IDs and are geographically very similar to the features that are no longer there.
14:05
So here is deletions. We have our 2011 image, our 2019 image, and there in red is our feature that was deleted. And finally, additions, 2011, 2019,
14:24
and our additions are shown in green on this slide. So in order to kind of look at this in a slightly more interesting way, we wanted to create a bit of a slider approach to looking at change. So we created this website, which you can totally visit,
14:42
and we picked out essentially six use cases that showed different types of change between 2011 and 2019. They're just static image sliders, but we used JuxtaposeJS for that. And this video just kind of walks through
15:00
a few of our use cases that we wanted to point out. So if you go to our landing page, you click through, you get to our six use cases, and then you can kind of play around with looking at how change happened between 2011 and 2019. So in this image, you have 2011 on the left and 2019 on the right.
15:22
We also looked at linear features. So this second case here is our linear feature change. And you can see how the linear features were realigned with imagery between 2011 and 2019.
15:43
So to kind of summarize what we found from this analysis, we really did realize how big of a role NHD plays in inland water features in Minnesota, specifically. NHD represents 84% of polygonal data
16:01
and 98% of linear data. Also, about 15% of NHD features experienced some sort of change between 2011 and 2019. So that's changes in geometry, replacements, additions, and deletions. And then the remaining 85% of NHD features remained unchanged over that time period.
16:24
So with all this, we wanted to kind of like conclude kind of what this means. And in terms of the role that bulk imports can play, we really, this illustrates how they can really contribute to map completeness.
16:41
However, it also speaks to how mindful, careful imports are a good thing. If NHD data was much less accurate, you could imagine there being a lot more edits needed by the community to really bring that up to the OSM standard. But combining these rich data sets like NHD with an active OSM community
17:00
creates an excellent opportunity to enhance data quality. So if you combine a really already good data set with an OSM community, if there is true change in the landscape, that will be reflected by the OSM community changing things. Also, tracking features over time can be complicated. Due to the nature of the way things get imported and then replaced in some cases or deleted or added,
17:23
you can't just like track one OSM ID over time and expect to just capture everything. So our future research, we kind of wanted to dig more into this kind of bulk import question, can they affect editing rates? And to do that, we'd have to look at some areas
17:40
that did not see like an NHD import but also have kind of a rich data set of water. We'd also like to compare updated NHD data with OSM. There has been like this NHDPlus update that would definitely have some additions to the data set that probably have not been integrated with OSM.
18:00
So if we can kind of compare the updated NHD data with current OSM data, that would kind of be an interesting question to look at. Also, we really didn't look at tags in this analysis. So a feature could have its tags changed, it could have its geometry changed, or it could have both. So it kind of increases the complexity a little bit there.
18:21
And then finally, building an interactive viewer instead of using these static images would be a great way to kind of explore the data more thoroughly. So we'd like to just acknowledge our team. Without their help, we wouldn't be here. And at this time, if you've got any questions,
18:40
we'll leave you with some ways to stay connected with us and feel free to reach out. Thanks. Any questions, comments? Jokes. So you mentioned changes that you think
19:05
there might have been tag updates, but you know, your querying wasn't looking at it. I'm not super familiar with kind of the hierarchy of natural water tags, but do you think, another question might be in the attribute mapping
19:23
of the source NHD data, was everything just natural water or were there other subclasses? I know you showed ditch and some other things, but I guess I'm wondering if part of the cleanup of that import would be to try to improve the classification or NHD had about as detailed
19:42
a classification of those features as you could get. So I think NHD is like hyper detailed. If anything, it's got like, I don't know how many, but it would seem like almost 100 different feature types. And all of those things like get mapped to some set of more than just natural equals water.
20:03
So it is complicated. I think it also changed over time. There's like a version one, version two, version three of like all the tag mapping that happened. And I'm not sure, it's very difficult to tell like what tag mapping was used for like what features, if that makes sense, like it's sometimes not clear
20:20
that like when NHD was imported, like what part of the import was that? What year was it that it actually got imported? If it changed in any way, your timestamp is gonna be different. So it's like impossible to tell. So there might not be full documentation on how the attribute mapping occurred? It was a little complicated. The Wiki is kind of like...
20:40
Well, so then my follow up question is, you said it's hyper detailed. Is it possible that you had to generalize a lot of those features because the tagging conventions just don't, at this point, represent that kind of granularity? Yeah, I think the number of tags that features got mapped to is much fewer than the number of feature types that exist in NHD.
21:02
So it kind of gets put into smaller or larger buckets essentially. Yeah, anybody else? Another question. I've been doing a lot of changing of ditches, putting tunnels, culverts under trails, that sort of thing. Does that all show up in what you're doing or?
21:21
So if the feature's ID stayed the same and its geometry also stayed the same, we would not be showing that as change. However, that's kind of one of those future directions we wanted to go in is exactly that. So if you were to go in and just edit the tagging of it, we would see that in our change analysis of tags. But unless that feature also experienced
21:41
some sort of like geometry change or ID change. In order to put a culvert under a path, you have to probably create two points and then change the type. So it'd probably show up. Yeah, yeah, very possible. And then water crosses water and should be connected, that type of thing? Yeah, we picked that up too. Little boundary changes in the Mississippi River would show up? Yeah, exactly.
22:01
Okay. Yeah. So thinking about the example that was just raised, if maybe a node was added, sorry, not added, moved,
22:22
like I don't think, that doesn't trigger a version update in the way, I don't think, that contains that node. So I'm wondering if there might be a challenge in, if someone was just cleaning up the shape of a feature. But again, you were using some Postgres stuff so that might not be applicable.
22:41
And the API data model, I'm wondering how it might detect that. Yeah, I think, yeah, we didn't, because we weren't using change sets as such, instead we were taking full time slices of the data. So exactly as the data was at those points of time. And then as you said, using Postgres,
23:01
it would have picked up any geometry change that existed, whether it's on a node or the way itself. Hi.
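The point in this exchange can be shown with a small sketch. In the OSM data model a way stores an ordered list of node references, while the coordinates live on the nodes, so moving a node changes the way's shape without creating a new version of the way. Comparing resolved geometries from two snapshots catches that; comparing way versions would not. The data below is made up for illustration.

```python
def way_geometry(node_refs, node_coords):
    """Resolve a way's node references into a tuple of coordinates."""
    return tuple(node_coords[ref] for ref in node_refs)


def geometry_changed(refs_old, coords_old, refs_new, coords_new):
    """Compare the resolved geometry of one way across two snapshots."""
    return way_geometry(refs_old, coords_old) != way_geometry(refs_new, coords_new)


# Hypothetical way: same node list and same way version in both snapshots.
way_refs = [101, 102, 103]
nodes_2011 = {101: (0.0, 0.0), 102: (1.0, 0.0), 103: (2.0, 0.0)}
# Node 102 was dragged to match imagery; the way itself was never edited.
nodes_2019 = {101: (0.0, 0.0), 102: (1.0, 0.5), 103: (2.0, 0.0)}

print(geometry_changed(way_refs, nodes_2011, way_refs, nodes_2019))
```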
23:21
So I had a question about whether or not it's possible to like close the loop, looking at the current state of the National Hydrography Dataset. For any type of corrections that might have come in OSM, if those types of corrections could be fed back into that national database, or back in the other direction of, if that has been updated with things that OSM has missed,
23:42
is there any way to keep the, something like a rolling import, or any type of way to have back and forth communication between those datasets? Yeah, I think one thing that we found while we were kind of doing research on NHD itself, is that there is like an online markup tool that they have, where like users can go
24:01
and like report inaccuracies. But as far as I know, there's no like true feedback loop between things that get changed in OSM, versus, like that getting back in NHD. I might be wrong, but it seems like it's kind of outside of the OSM community. But that's a great, like, you know, it's a great resource to have, right? I mean, we're showing just one state
24:21
and a huge amount of change. And, you know, a lot of that change, as you can see with our sliders, is like important change. So if that's not making its way back to NHD, that's kind of a loss, right? Goes the other way too though, right? If NHD is doing a lot of development on their end, obviously a rolling import would be great, because then you're getting that refreshed data. But you've got to kind of like do that
24:41
while maintaining any changes that exist in the OSM data that aren't in NHD. So it's a complicated process, right? Right, right. Thank you. Just wondering about the determination of replacements.
25:04
Was there like an intersection threshold that you use, or anything like that? Or is it, I didn't notice how many there were, if it was enough, you could manually inspect them or anything like that. So replacements are like a pretty difficult thing to find, we found. Because, yes, it is exactly that. It's, you essentially look at OSM IDs
25:22
that exist in one year, versus OSM IDs that exist in the other year, and you look to see if the OSM ID from 2011 that no longer exists was replaced with an OSM ID in 2019, by checking whether it intersects, but not just touches. Because we wanted to find things
25:41
that weren't just part of the network, but were like exactly like, they had to intersect in some way that wasn't just like a node connecting to another node. So as long as some of the area or part of the line intersects, that was probably a replacement. And that was the extent of it, or did you do any further manual inspection of anything? When it was- We did some manual inspection.
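The "intersects, but not just touches" rule can be illustrated with axis-aligned boxes standing in for real geometries. This is a deliberate simplification and an assumption of this sketch; in PostGIS the equivalent condition would combine `ST_Intersects` with a negated `ST_Touches` on the actual geometries.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box with positive area, used as a toy geometry."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Box, b: Box) -> bool:
    """True if the closed boxes share at least one point."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def touches(a: Box, b: Box) -> bool:
    """True if the boxes meet only along an edge or at a corner."""
    interiors_overlap = (a.xmin < b.xmax and b.xmin < a.xmax
                         and a.ymin < b.ymax and b.ymin < a.ymax)
    return intersects(a, b) and not interiors_overlap


def is_replacement_candidate(old: Box, new: Box) -> bool:
    """Old and new features must overlap, not merely share a boundary."""
    return intersects(old, new) and not touches(old, new)


lake_2011 = Box(0, 0, 2, 2)
print(is_replacement_candidate(lake_2011, Box(1, 1, 3, 3)))  # overlapping
print(is_replacement_candidate(lake_2011, Box(2, 0, 4, 2)))  # shares an edge only
```

Excluding pure touches keeps neighboring features in a connected network from being mistaken for replacements of each other.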
26:02
I will say that it's probably our most mixed use case. There's a lot of weird things going on with the way relations are modeled. If a way has its geometry changed, but also kind of gets replaced in a way, it's like, what do you call that, right?
26:22
Is it a replacement? Is it an addition? Knowing what is showing what is kind of difficult. So we kind of attempted to get at it, but there's definitely room for improvement in that department for sure. Cool, thanks.
26:40
Okay, does anybody have any other questions? Thanks for listening. Thank you.