OSM Water-How well are Minnesota’s water features mapped - TIB AV-Portal

Related material

OpenStreetMap US

Manley, Matthew Breithaupt, Bentley Sweeney, Alex

Formal metadata

Title

OSM Water-How well are Minnesota’s water features mapped

Series title

State of the Map US 2019

Number of parts

70

Author

Manley, Matthew

Breithaupt, Bentley

License

CC Attribution 3.0 Unported:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.

Identifiers

10.5446/58491 (DOI)

Publisher

OpenStreetMap US

Release year

Language

Content metadata

Subject area

Information and documentation

Genre

Abstract

Water-related features are an often-overlooked part of the OpenStreetMap model. These features help create the natural context for the map and add important reference information. In Minnesota, a large proportion of these features are derived from bulk imports of data from the United States Geological Survey (USGS) National Hydrography Dataset (NHD). Our project seeks to understand the degree to which OSM water data in Minnesota has “moved beyond” NHD through edits or the addition of new features derived from additional sources. This project will identify differences between OSM water features and NHD water features in the hopes of identifying how data evolves from the bulk import phase to the current state of the map. Our presentation will include a discussion of the methodology we used for data preparation and analysis, as well as our proposal for improving the quality of OSM water data in Minnesota in the future. This research will be relevant for those interested in identifying gaps in OSM data, mapping natural features, and mapping in Minnesota more generally.

State of the Map US 2019: 47 / 70

Transcript: English (automatically generated)

00:01

Thanks for coming to the last day, first presentation. My name is Matt Manley, and I'm a GIS data analyst at Critigen. And I'm Bentley Breithaupt, a GIS developer with Critigen. And today, we want to share with you our project that we've been working on, OSM water, how well are Minnesota's water features mapped.

00:22

So to start, we wanted to provide a little context on why we're focusing on water features specifically. So basically, water features, along with other natural features, provide important geographic context in OSM. They're also an important natural resource, so knowing where they are can be an important factor.

00:41

And they also can act as a barrier or facilitator of human movement. So by that we mean, in OSM, you can imagine water features being pretty closely linked to things like ferry routes, or piers, or bridges. Finally, they represent a pretty interesting case study concerning the role of bulk imports in OSM.

01:01

So, we'll get back to bulk imports in a second and kind of dive into that, but this chart essentially shows OSM feature counts in Minnesota over time. So we're showing highway tags versus waterway tags versus natural equals water tags to show kind of the change in feature creation rate over time.

01:21

And there's two things that are kind of interesting about this. The first is there's a pretty distinct difference between the relative rates of feature creation over time between road features and water features. And the second is kind of how important bulk imports can be to contributing to map completeness. You can see the TIGER import between 2007, 2008,

01:42

and the NHD import, which we'll kind of go into a little bit more, but that's between 2009, 2011. So to kind of get into bulk imports a little bit, the OSM Wiki defines bulk imports to be the process of uploading external data to OSM. And this acts as a supplement to user-created data.

02:03

It usually involves a complex merging process and some pretty popular examples in the U.S. include the TIGER data and big maps. You can see how that data makes its way into OSM. So in kind of doing our research, we realized that bulk imports have been

02:22

a pretty controversial topic in OSM for a long time. This is largely due to the nature of bulk imports. They kind of go against the core model of OSM, which is that individual users manually add verifiable data to the map. There are, however, some pros to using bulk imports.

02:41

Due to the amount of data that you can import, obviously it can increase map completeness. There also can be a validation feedback loop between the external source and the OSM community. And what that means is that if users in OSM make edits to that external data that's been uploaded, there can be a feedback loop and that external source could potentially get those edits and make some improvements to their own data.

03:04

There are, however, some cons that people have pointed out. Some people think that this might discourage users from making additional edits. If they go to a map and see that it primarily looks finished, they may not have the incentive to keep adding to it. It can also be a pretty complicated process. It takes a lot of community input.

03:22

And if the data source is not super accurate and it gets kind of uploaded carelessly, you could introduce a lot of errors. However, despite all this back and forth, there's been a lot of bulk imports to OSM and one of them is the NHD dataset, which is the National Hydrography Dataset managed by the USGS National Geospatial Program.

03:44

And essentially what that represents is the water drainage network of the United States. And it was bulk uploaded to OSM in multiple steps in the United States, in most of the United States. Definitely involved a complex import process to correctly map all these NHD data feature types

04:03

to the corresponding OSM tags and names were brought in when available as well. So our research questions are essentially, we started broadly with what is the state of water data in OSM and that's a pretty broad question. So we narrowed it down a little bit to specifically what role does NHD play in the representation

04:22

of water features? And then since we're in Minnesota, we decided to use a case study in Minnesota and look at how bulk-imported NHD data has changed since import. We broke our study into four pieces that we'll be presenting here. Starting off with just how did we define water

04:41

for this project? Followed by where did we get our data from and what did we do to prepare that? Then looking at how we identified changes in the NHD dataset and classifications of those changes and visualizations to show you. Start off with what are we calling water here?

05:02

We wanted to do something very OSM based, so we literally started with the OSM basemap. We just looked at it and it was like, what's blue on here? We found out that that is based on Carto style sheets that map tags associated with features to the styling that you then see on the map.

05:20

We went with the water style sheet and selected a number of tags from there to create what we call our inland water definition. This is the definition that we used here. We used four OSM tags, waterway, natural, landuse and wetland, and a variety of values from each of those.
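As a minimal sketch of how such a tag-based definition could be expressed, the snippet below checks an OSM tag dictionary against the four keys named in the talk. The value lists are illustrative assumptions, not the speakers' actual selection, and `is_inland_water` is a hypothetical helper name.

```python
# Hypothetical inland-water definition. The four keys come from the talk;
# the value sets are illustrative assumptions, not the speakers' exact list.
INLAND_WATER_TAGS = {
    "waterway": {"river", "riverbank", "stream", "canal", "ditch", "drain"},
    "natural": {"water", "wetland"},
    "landuse": {"reservoir", "basin"},
    "wetland": {"marsh", "swamp", "bog", "tidalflat"},
}


def is_inland_water(tags):
    """Return True if any tag on the feature matches the definition."""
    for key, allowed_values in INLAND_WATER_TAGS.items():
        if tags.get(key) in allowed_values:
            return True
    return False


print(is_inland_water({"natural": "water", "name": "Lake Itasca"}))
print(is_inland_water({"highway": "residential"}))
```

Keeping the definition as data rather than as a chain of conditions makes it easy to reuse the same list for filtering and for reporting counts per tag.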

05:40

We thought that inland water was a good fit for our study in that NHD being a drainage water set very much fit with that and then being in Minnesota, not a lot of oceans or seas or anything of that sort but many, many beautiful lakes and rivers.

06:02

And we thought that this was also a fairly self-explanatory definition in that it is just anything that is traversing over or surrounded by land. Generally, it is fresh water with some little exceptions like tide flats.

06:23

After we had our definition of water, we needed to get NHD data out of the OSM data set. We did this by creating what we're calling time slices. These are snapshots in the historical OSM data set at specific points in time.

06:41

We went with three different time slices, one for 2009 before there were really any water bulk imports, one for 2011 after most of the NHD data had been imported for the first time and then a 2019 data set to just use as our current data. We generated these time slices

07:00

by starting with a full OSM history file in PBF format. We processed all this data with something called Osmium Tool. It's an open source OSM file manipulation program that works great with history files as opposed to just normal OSM data files.

07:21

We started by doing a soft clip to Minnesota, meaning just anything that intersects or overlaps with the area of Minnesota. We then extracted those three time slices that I was talking about, 2009, 2011, 2019. And then finally, we filtered each of those time slices so that they only included items that had tags from our water definition

07:42

and as well as a few other things just to keep their full geometry, a few extra nodes, that sort of thing. At this stage, we had three PBF files, one for each time slice, each with all of the water data for Minnesota at that time.
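The three steps described above (clip, time slice, tag filter) could be sketched roughly as the following sequence of Osmium Tool invocations, built here as argument lists without running them. The file names, snapshot dates, and filter expressions are all assumptions for illustration; the talk does not give them, and the extract strategy is deliberately left at its default, so check the Osmium manual for how each strategy handles history files.

```python
import shlex

# Hypothetical inputs: none of these names or dates are given in the talk.
HISTORY_FILE = "history.osh.pbf"      # full-history input file
BOUNDARY_FILE = "minnesota.geojson"   # state boundary polygon
SLICE_TIMES = {
    "2009": "2009-01-01T00:00:00Z",
    "2011": "2011-12-31T00:00:00Z",
    "2019": "2019-06-01T00:00:00Z",
}
# Illustrative tags-filter expressions, not the speakers' exact tag list.
WATER_FILTERS = [
    "nwr/waterway",
    "nwr/natural=water,wetland",
    "nwr/landuse=reservoir,basin",
    "nwr/wetland",
]


def build_commands():
    """Return the osmium command lines for the clip, slice, and filter steps."""
    clipped = "mn.osh.pbf"
    commands = [
        # Step 1: clip the history file to the state boundary.
        ["osmium", "extract", "--with-history", "-p", BOUNDARY_FILE,
         HISTORY_FILE, "-o", clipped],
    ]
    for year, timestamp in SLICE_TIMES.items():
        snapshot = f"mn-{year}.osm.pbf"
        water = f"mn-{year}-water.osm.pbf"
        # Step 2: reduce the history to the data as it was at one point in time.
        commands.append(["osmium", "time-filter", clipped, timestamp, "-o", snapshot])
        # Step 3: keep only water-tagged objects; tags-filter also keeps the
        # objects they reference (for example the nodes of matching ways).
        commands.append(["osmium", "tags-filter", snapshot, *WATER_FILTERS, "-o", water])
    return commands


for command in build_commands():
    print(shlex.join(command))
```

Printing the commands first makes it easy to review them before handing each list to `subprocess.run`.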

08:02

So we can look at how our time slices compare. So this graph shows our three different years, 2009, 2011, and 2019, showing all features versus NHD features for Inland Water. You can see that pretty much throughout, NHD represents a huge portion of the data

08:21

once it's uploaded in 2011, 2019. So there's very little change between 2011 and 2019, which is kind of what we were looking for. We wanted to capture the bulk import between 2009, 2011, and then analyze what's happening between those last two time steps. There's also about 300,000 features that were added, mostly sourced from NHD.

08:41

This second chart is pretty similar except it breaks things down a little by geometry type, which we kind of took into account in this analysis. This chart's using a logarithmic scale, so just keep that in mind. You really wouldn't be able to see much on 2009 if we didn't do that. But again, it's obvious that there's a huge jump in features between 2009, 2011.

09:03

This kind of illustrates the different proportions of tags per time slice. So 2009, you can barely see, but you can see natural equals water is pretty much like the only tag that's showing up there. But then we have a big jump in the proportions and relative sizes in 2011.

09:21

And then again, in 2019, there's not much of a difference. One interesting note that's kind of hard to see on here is that there's actually some declines in tags, notably waterway equals river, waterway equals riverbank, and waterway equals stream. And we think that's probably due to some changing like tagging practices within the community

09:40

or some tags that got imported from NHD that maybe should be called something else, people decide later on. All right, so up until this point, we've been kind of just looking at counts and things like that, but now we're gonna kind of walk through our analysis of geometric change in NHD features from 2011 and 2019.

10:00

And this will tell us how bulk imported data changes over time with the help of the OSM community. So this complicated chart pretty much just shows our general methodology for this analysis. So we started with our two time slices and we compared OSM IDs existing within those two time slices.

10:20

And if we could match those, it existed in both times, we checked whether the geometry changed. If it did not, we classified it as a no change feature. If it did change, we classify it as a change in geometry. Stepping back a few steps, if the OSM ID did not exist in both years, then we ask whether there is another OSM ID

10:45

that essentially replaces it. It's like geometrically intersecting and we call that a replacement. If there's nothing that's replacing it, we check what year that OSM ID exists in. If it only exists in 2011 and not in 2019, then we call it a deletion.

11:01

If it only exists in 2019, we call it an addition. In order to kind of get at all this, we uploaded our PBFs to Postgres and used some PostGIS functions to query out the change. Basically, we started by querying for change versus no change.
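The decision tree just described can be sketched in plain Python. This is an illustration under simplifying assumptions, not the speakers' implementation: geometries are tuples of coordinate pairs, exact tuple equality stands in for a proper geometric equality test, and `shares_more_than_a_point` is a hypothetical stand-in for "intersects but does not merely touch".

```python
def shares_more_than_a_point(a, b):
    """Stand-in overlap test: two geometries share at least two vertices."""
    return len(set(a) & set(b)) >= 2


def classify(old, new, overlaps=shares_more_than_a_point):
    """Classify features between two snapshots.

    old, new: dicts mapping OSM ID -> geometry (tuple of (x, y) pairs).
    Returns a dict mapping OSM ID -> one of the five classes from the talk.
    """
    result = {}
    only_old = set(old) - set(new)
    only_new = set(new) - set(old)
    # IDs present in both snapshots: compare geometries directly.
    for osm_id in set(old) & set(new):
        same = old[osm_id] == new[osm_id]
        result[osm_id] = "no change" if same else "change in geometry"
    # IDs that disappeared: replaced by a new ID, or deleted outright.
    for osm_id in only_old:
        replaced = any(overlaps(old[osm_id], new[other]) for other in only_new)
        result[osm_id] = "replacement" if replaced else "deletion"
    # IDs that appeared: replacing an old ID, or genuinely new.
    for osm_id in only_new:
        replaces = any(overlaps(new[osm_id], old[other]) for other in only_old)
        result[osm_id] = "replacement" if replaces else "addition"
    return result


old_2011 = {
    1: ((0, 0), (1, 0)),
    2: ((0, 1), (1, 1)),
    3: ((5, 5), (6, 5), (7, 5)),
    4: ((9, 9), (9, 8)),
}
new_2019 = {
    1: ((0, 0), (1, 0)),
    2: ((0, 1), (1, 2)),
    5: ((5, 5), (6, 5), (7, 6)),
    6: ((20, 20), (21, 20)),
}
for osm_id, label in sorted(classify(old_2011, new_2019).items()):
    print(osm_id, label)
```

Passing the overlap test in as a parameter keeps the branching logic separate from the geometry library, so the same function could be driven by a real spatial predicate.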

11:20

And the query on the right illustrates how we found polygonal features that did not change between the two years. You can see it's pretty much doing a join on OSM ID and then using ST_Equals, a PostGIS function, to check whether the geometries were identical. After we identified change versus no change, we broke the change up into our four unique use cases

11:40

of changes in geometry, replacements, additions and deletions. All right, so given that methodology, this chart shows our classification of polygonal features and how they changed between 2011 and 2019. You can see that the vast majority of features stayed the same, about 89%.
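The query shown on the slide is not reproduced in the transcript. A rough sketch of a join on OSM ID combined with `ST_Equals` might look like the following; the table names (`water_polygons_2011`, `water_polygons_2019`) and column names (`osm_id`, `geom`) are hypothetical, and `fetch_ids` is an illustrative helper that works with any Python DB-API connection.

```python
# Hypothetical table and column names; ST_Equals is the PostGIS function
# named in the talk.
NO_CHANGE_SQL = """
SELECT a.osm_id
FROM water_polygons_2011 AS a
JOIN water_polygons_2019 AS b ON a.osm_id = b.osm_id
WHERE ST_Equals(a.geom, b.geom)
ORDER BY a.osm_id;
"""

# Same join, but keeping the features whose geometry differs.
CHANGED_GEOMETRY_SQL = """
SELECT a.osm_id
FROM water_polygons_2011 AS a
JOIN water_polygons_2019 AS b ON a.osm_id = b.osm_id
WHERE NOT ST_Equals(a.geom, b.geom)
ORDER BY a.osm_id;
"""


def fetch_ids(connection, sql):
    """Run a query on a DB-API 2.0 connection and return the first column."""
    cursor = connection.cursor()
    cursor.execute(sql)
    return [row[0] for row in cursor.fetchall()]
```

With a PostGIS-enabled database, `fetch_ids(connection, NO_CHANGE_SQL)` would return the IDs counted as unchanged, and the second query would return the changes in geometry.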

12:02

But 11% did change and that change was comprised of predominantly changes in geometry. This is the same graph for just linear features. You can see the proportion of change increased a little bit, almost 18%. And again, it's made mostly of changes in geometries,

12:22

but a decent amount of additions as well. We kinda wanted to give you an idea of what this change looks like. So this is a map of Minnesota showing features that were classified as changing or not changing, just polygonal features. And change is shown in dark blue,

12:40

no change is shown in gray. One thing to keep in mind is, while the northern part of the map looks pretty impressive, this is a feature classification. So if these features are giant multi-polygon relations and they're changing just a little bit, they're gonna flash blue on the map. So it's not the most exciting thing going on up there, but it does kind of speak to the proportion of change

13:01

going on within the state. All right, so for the next few slides, we're gonna kind of show a little time series analysis. So the first image we'll show for each set will be in 2011, what the features look like. And then the second will be our 2019 features. And then the third will show the features

13:22

that were classified as change for each use case. So we'll start with changes in geometry. So here are our features in 2011. Here are our features in 2019. And finally, this image shows the features that were classified as having changes in geometry.

13:43

Next, we'll show replacements. This is our 2011 image, 2019 image, and the features that were classified as replacements. So these are essentially features that have new OSM IDs and are geographically very similar to the features that are no longer there.

14:05

So here is deletions. We have our 2011 image, our 2019 image, and there in red is our feature that was deleted. And finally, additions, 2011, 2019,

14:24

and our additions are shown in green on this slide. So in order to kind of look at this in a slightly more interesting way, we wanted to create a bit of a slider approach to looking at change. So we created this website, which you can totally visit,

14:42

and we picked out essentially six use cases that showed different types of change between 2011 and 2019. They're just static image sliders, but we used Juxtapose.js for that. And this video just kind of walks through

15:00

a few of our use cases that we wanted to point out. So if you go to our landing page, you click through, you get to our six use cases, and then you can kind of play around with looking at how change happened between 2011 and 2019. So in this image, you have 2011 on the left and 2019 on the right.

15:22

We also looked at linear features. So this second case here is our linear feature change. And you can see how the linear features were realigned with imagery between 2011 and 2019.

15:43

So to kind of summarize what we found from this analysis, we really did realize how big of a role NHD plays in inland water features in Minnesota, specifically. NHD represents 84% of polygonal data

16:01

and 98% of linear data. Also, about 15% of NHD features experienced some sort of change between 2011 and 2019. So that's changes in geometry, replacements, additions, and deletions. And the remaining 85% of NHD features remained unchanged over that time period.

16:24

So with all this, we wanted to conclude what this means. In terms of the role that bulk imports can play, this really illustrates how they can contribute to map completeness.

16:41

However, it also speaks to how mindful, careful imports are a good thing. If NHD data was much less accurate, you could imagine there being a lot more edits needed by the community to really bring that up to the OSM standard. But combining these rich data sets like NHD with an active OSM community

17:00

creates an excellent opportunity to enhance data quality. So if you combine a really already good data set with an OSM community, if there is true change in the landscape, that will be reflected by the OSM community changing things. Also, tracking features over time can be complicated. Due to the nature of the way things get imported and then replaced in some cases or deleted or added,

17:23

you can't just like track one OSM ID over time and expect to just capture everything. So our future research, we kind of wanted to dig more into this kind of bulk import question, can they affect editing rates? And to do that, we'd have to look at some areas

17:40

that did not see like an NHD import but also have kind of a rich data set of water. We'd also like to compare updated NHD data with OSM. There has been like this NHDPlus update that would definitely have some additions to the data set that probably have not been integrated with OSM.

18:00

So if we can kind of compare the updated NHD data with current OSM data, that would kind of be an interesting question to look at. Also, we really didn't look at tags in this analysis. So a feature could have its tags changed, it could have its geometry changed, or it could have both. So it kind of increases the complexity a little bit there.

18:21

And then finally, building an interactive viewer instead of using these static images would be a great way to kind of explore the data more thoroughly. So we'd like to just acknowledge our team. Without their help, we wouldn't be here. And at this time, if you've got any questions,

18:40

we'll leave you with some ways to stay connected with us and feel free to reach out. Thanks. Any questions, comments? Jokes. So you mentioned changes that you think

19:05

there might have been tag updates, but you know, your querying wasn't looking at it. I'm not super familiar with kind of the hierarchy of natural water tags, but do you think, another question might be in the attribute mapping

19:23

of the source NHD data, was everything just natural water or were there other subclasses? I know you showed ditch and some other things, but I guess I'm wondering if part of the cleanup of that import would be to try to improve the classification or NHD had about as detailed

19:42

a classification of those features as you could get. So I think NHD is like hyper detailed. If anything, it's got like, I don't know how many, but it would seem like almost 100 different feature types. And all of those things like get mapped to some set of more than just natural equals water.

20:03

So it is complicated. I think it also changed over time. There's like a version one, version two, version three of like all the tag mapping that happened. And I'm not sure, it's very difficult to tell like what tag mapping was used for like what features, if that makes sense, like it's sometimes not clear

20:20

that like when NHD was imported, like what part of the import was that? What year was it that it actually got imported? If it changed in any way, your timestamp is gonna be different. So it's like impossible to tell. So there might not be full documentation on how the attribute mapping occurred? It was a little complicated. The Wiki is kind of like...

20:40

Well, so then my follow up question is, you said it's hyper detailed. Is it possible that you had to generalize a lot of those features because the tagging conventions just don't, at this point, represent that kind of granularity? Yeah, I think the number of tags that features got mapped to is much fewer than the number of feature types that exist in NHD.

21:02

So it kind of gets put into smaller or larger buckets essentially. Yeah, anybody else? Another question. I've been doing a lot of changing of ditches, putting tunnels, culverts under trails, that sort of thing. Does that all show up in what you're doing or?

21:21

So if the feature's ID stayed the same and its geometry also stayed the same, we would not be showing that as change. However, that's kind of one of those future directions we wanted to go in is exactly that. So if you were to go in and just edit the tagging of it, we would see that in our change analysis of tags. But unless that feature also experienced

21:41

some sort of like geometry change or ID change. In order to put a culvert under a path, you have to probably create two points and then change the type. So it'd probably show up. Yeah, yeah, very possible. And then water crosses water and should be connected, that type of thing? Yeah, we picked that up too. Little boundary changes in the Mississippi River would show up? Yeah, exactly.

22:01

Okay. Yeah. So thinking about the example that was just raised, if maybe a node was added, sorry, not added, moved,

22:22

like I don't think, that doesn't trigger a version update in the way, I don't think, that contains that node. So I'm wondering if there might be a challenge in, if someone was just cleaning up the shape of a feature. But again, you were using some Postgres stuff so that might not be applicable.

22:41

And the API data model, I'm wondering how it might detect that. Yeah, I think, yeah, we didn't, because we weren't using change sets as such, instead we were taking full time slices of the data. So exactly as the data was at those points of time. And then as you said, using Postgres,

23:01

it would have picked up any geometry change that existed, whether it's on a node or the way itself. Hi.
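The point raised in this exchange, that moving a node changes a way's shape without creating a new version of the way, can be illustrated with a small sketch. The history records below are made up and heavily simplified (real OSM history carries more fields); the idea is only that resolving node positions at each snapshot time exposes the change even though the way's version stays at 1.

```python
from bisect import bisect_right

# Hypothetical, simplified history: each node has (date, lon, lat) versions
# in chronological order. The way has one version that only lists node IDs.
NODE_HISTORY = {
    101: [("2010-05-01", -93.10, 45.00)],
    102: [("2010-05-01", -93.00, 45.00), ("2015-08-20", -93.00, 45.02)],
}
WAY = {"id": 7, "version": 1, "nodes": [101, 102]}


def node_position(node_id, when):
    """Return the node's (lon, lat) from its last version at or before `when`."""
    versions = NODE_HISTORY[node_id]
    dates = [version[0] for version in versions]
    index = bisect_right(dates, when) - 1  # ISO dates sort lexicographically
    if index < 0:
        raise ValueError(f"node {node_id} did not exist at {when}")
    _, lon, lat = versions[index]
    return (lon, lat)


def way_geometry(way, when):
    """Resolve the way's node references into coordinates for one snapshot."""
    return [node_position(node_id, when) for node_id in way["nodes"]]


geom_2011 = way_geometry(WAY, "2011-12-31")
geom_2019 = way_geometry(WAY, "2019-06-01")
print("way version:", WAY["version"])
print("geometry changed:", geom_2011 != geom_2019)
```

Comparing resolved geometries from full snapshots, rather than comparing version numbers, is what lets this kind of edit show up as a change in geometry.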

23:21

So I had a question about whether or not it's possible to like close the loop, looking at the current state of the National Hydrography Dataset. For any type of corrections that might have come in OSM, if those types of corrections could be fed back into that national database, or back in the other direction of, if that has been updated with things that OSM has missed,

23:42

is there any way to keep the, something like a rolling import, or any type of way to have back and forth communication between those datasets? Yeah, I think one thing that we found while we were kind of doing research on NHD itself, is that there is like an online markup tool that they have, where like users can go

24:01

and like report inaccuracies. But as far as I know, there's no like true feedback loop between things that get changed in OSM, versus, like that getting back in NHD. I might be wrong, but it seems like it's kind of outside of the OSM community. But that's a great, like, you know, it's a great resource to have, right? I mean, we're showing just one state

24:21

and a huge amount of change. And, you know, a lot of that change, as you can see with our sliders, is like important change. So if that's not making its way back to NHD, that's kind of a loss, right? Goes the other way too though, right? If NHD is doing a lot of development on their end, obviously a rolling import would be great, because then you're getting that refreshed data. But you've got to kind of like do that

24:41

while maintaining any changes that exist in the OSM data that aren't in NHD. So it's a complicated process, right? Right, right. Thank you. Just wondering about the determination of replacements.

25:04

Was there like an intersection threshold that you use, or anything like that? Or is it, I didn't notice how many there were, if it was enough, you could manually inspect them or anything like that. So replacements are like a pretty difficult thing to find, we found. Because, yes, it is exactly that. It's, you essentially look at OSM IDs

25:22

that exist in one year, versus OSM IDs that exist in the other year, and you look to see if the OSM ID that existed in 2011 but no longer exists was replaced with an OSM ID in 2019, by checking whether it intersects, but not just touches. Because we wanted to find things

25:41

that weren't just part of the network, but were like exactly like, they had to intersect in some way that wasn't just like a node connecting to another node. So as long as some of the area or part of the line intersects, that was probably a replacement. And that was the extent of it, or did you do any further manual inspection of anything? When it was- We did some manual inspection.

26:02

I will say that it's probably our most mixed use case. There's a lot of weird things going on with the way relations are modeled. If a way has its geometry changed, but also kind of gets replaced in a way, it's like, what do you call that, right?

26:22

Is it a replacement? Is it an addition? Knowing what is showing what is kind of difficult. So we kind of attempted to get at it, but there's definitely room for improvement in that department for sure. Cool, thanks.
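The replacement test described in this answer (an ID that exists in only one year, paired with an ID that exists only in the other year, where the two intersect but do not merely touch) could be sketched as a PostGIS query along these lines. The table and column names are hypothetical, the exact query the speakers ran is not given in the talk, and `replacement_sql` is an illustrative helper.

```python
def replacement_sql(old_table, new_table):
    """Build a query pairing old-only features with new-only features
    that intersect them by more than a touch.

    Table names are interpolated directly, so pass trusted identifiers only.
    """
    return f"""
SELECT o.osm_id AS old_id, n.osm_id AS new_id
FROM {old_table} AS o
JOIN {new_table} AS n
  ON ST_Intersects(o.geom, n.geom)
 AND NOT ST_Touches(o.geom, n.geom)
WHERE NOT EXISTS (SELECT 1 FROM {new_table} AS x WHERE x.osm_id = o.osm_id)
  AND NOT EXISTS (SELECT 1 FROM {old_table} AS y WHERE y.osm_id = n.osm_id)
ORDER BY o.osm_id, n.osm_id;
"""


print(replacement_sql("water_lines_2011", "water_lines_2019"))
```

As the speakers note, this is the fuzziest of the classes, so a query like this is best treated as producing candidates for manual inspection rather than a final answer.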

26:40

Okay, anybody has any other questions? Thanks for listening. Thank you.