Curated Major Map Features Library
Formal Metadata
Number of Parts | 41
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/58228 (DOI)
State of the Map US 2022
10 / 41
Transcript: English (auto-generated)
00:04
All right, everybody, my name is Jake Wasserman from Meta. Together with Yingxing and Charmaine, I'll be talking about a project that we started over the last six months to a year called the Curated Major Map Features Library. Just a quick overview: I'll give an introduction and the motivation.
00:22
Then Yingxing will take over and talk about what we mean by major map features and which features are included here. And then Charmaine will talk about how this thing actually gets used, how we use it for QA, and give us an example. So basically, the Curated Major Map Features Library is a way of eliminating vandalism
00:42
from prominent OSM features and preventing vandalism and errors from showing up in production, making sure that the features are safe to display. It consists of three components that we'll go into in some detail. The first is a data set of OSM elements (nodes, ways, and relations) with manually reviewed tags
01:02
and other properties that we expect to match the planet at all times. Second, there's a series of pipelines that compare any OSM planet file to the library and alert us to any discrepancies that exist. And then finally, there's a process which can take those differences or discrepancies
01:21
and surface them to manual reviewers, who can decide if these are vandalism, errors, or something that needs to change. This runs pretty much every day against the latest OSM. It's part of our Daylight Map Distribution process. And the idea here is to include generally stable features within OSM: things like major cities, polygons of large water features, landmarks, and so on.
01:42
So just to jump into an example, the library is essentially a JSON file that lists these very specific elements and tags that, like I said, we expect to match the map. If you look on the right, this is exactly what it looks like for New York City. We're highlighting this famous vandalism
02:01
incident that happened in OSM, where New York City was renamed. I remember when this happened, directors reached out to our team and were like, how do we make sure this never happens on our map? And we were like, oh my god. So this is the project that we came up with to solve that. If you look up here on the right, what this entry says is basically that at all times in OSM,
02:21
I expect there to be a node with that ID, and it is expected to have these tags. The population, as we'll talk about a little bit, can vary. There's a location in there, and the node has to be roughly at that location. But if anyone goes into OSM at any time and changes these things, it will essentially alert us. And the idea is that if it gets deleted,
02:41
it will notify us, and likewise if any tags are modified or deleted or if the feature moves. And there are some other checks there. I often describe this as unit testing the map for major map features. And it turns out this is a great way to prevent these major forms of vandalism. So our goal here is to include major prominent features, things where we can avoid embarrassing headlines
03:01
and directors reaching out to our team and maybe threatening to fire us or something. We want mostly stable features. Obviously somebody could go in, delete the New York City node, and recreate it with a new ID, and that would cause problems here. But in general, these features are pretty stable. I know the New York City node has existed for 10 years,
03:20
so I think it's pretty safe to say it's not going to change. It's a small number of total features, something around tens of thousands. The idea here is to have a very high signal-to-noise ratio. So at any given time, even though maybe we've got 20,000 things, there's really only on the order of a handful of things that get flagged. And generally when it calls these out, they're a big deal that we really need to look at.
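The "unit testing the map" idea can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary layout, the element IDs, and the function name `find_missing` are all hypothetical and are not taken from the actual library or pipeline.

```python
# Hypothetical library entries keyed by (osm_type, osm_id); the IDs are made up.
library = {
    ("node", 1001): {"place": "city", "name": "New York"},
    ("node", 1002): {"place": "city", "name": "London"},
}

# Hypothetical snapshot of the same elements taken from a planet file.
# ("node", 1002) has been deleted in this snapshot.
planet = {
    ("node", 1001): {"place": "city", "name": "New York"},
}


def find_missing(library, planet):
    """Return the library keys that no longer exist in the planet snapshot."""
    return sorted(key for key in library if key not in planet)


print(find_missing(library, planet))  # [('node', 1002)]
```

Every key returned here would be a failure for a human reviewer to look at, which is why a small, stable library keeps the number of alerts low.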
03:43
So what this really does is significantly reduce the QA effort that goes on before we can finally launch a map and say it's safe for production. Generally it lets us rest a lot easier before a map release. So with that, I'll turn it over to you. All right, I'm going to talk about
04:02
the typical sorts of features included in the library. Currently the library includes three major feature types: cities, water features, and local landmark features. Cities mostly refers to global megacities, that is, cities with the OSM tag place equals city,
04:22
or town, and with a population greater than two million. For the US, we also include cities that have a population of less than two million, for example, Fremont in California. Water features are mostly inland water features, just like Lake Biwa here.
04:40
These are also global water features. For landmark features, we currently only have US landmarks. They could be tourism spots or locally significant landmarks like Alcatraz Island or the Painted Ladies in San Francisco.
05:02
So for each feature type, the data comes with the name labels, important attributes, and the geometries. For example, a major city here comes with the geometry type point, and has names in multiple languages, a place tag, a population tag,
05:21
an admin level tag, and a capital tag, which are the important attributes for cities. For water features, the geometry type is a bounding box polygon. They come with names in different languages, a natural tag, a water tag, and other definitive tagging.
05:42
And for US landmarks, it varies in terms of geometry type and tagging. Some landmarks come with the geometry type point, but for other landmarks it's a bounding box polygon. And of course, they come with names in multiple languages.
06:01
For the tagging, we tried to collect the key information about each landmark feature, so it varies. It could include address, building, amenity, or other tags that provide key information about the landmark. So for language,
06:21
for each feature included in this library, we not only have the default name and the English name, we also have non-English names. So far, we have selected the languages most frequently used in Meta map services to include in the library. And we work with our localization team very closely
06:40
to verify and review the name in each language to ensure the translation is correct and proper. So now let's dive into a couple of examples. The first one is the megacity London. As Jake mentioned, this library is a JSON file,
07:01
so each feature is stored as a JSON object. The JSON object has metadata like the OSM type, the OSM ID, the tags, and the geometry in WKT format, which is a point for London here. We also attach a category to indicate what feature type it is.
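A library entry of the kind described here might look roughly like the JSON below. This is a sketch only: the field names (`osm_type`, `osm_id`, `category`, `geometry`, `tags`), the ID, the population value, and the approximate coordinates are assumptions for illustration, not the library's real schema or data.

```python
import json
import re

# Hypothetical entry: field names and all values are illustrative assumptions.
entry_json = """
{
  "osm_type": "node",
  "osm_id": 123456,
  "category": "megacity",
  "geometry": "POINT (-0.1276 51.5072)",
  "tags": {
    "place": "city",
    "capital": "yes",
    "admin_level": "2",
    "population": "8900000",
    "name": "London",
    "name:en": "London"
  }
}
"""

# Matches a 2D WKT point such as "POINT (lon lat)".
WKT_POINT = re.compile(
    r"\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*"
)


def parse_wkt_point(wkt):
    """Return (lon, lat) from a WKT POINT string, or raise ValueError."""
    match = WKT_POINT.fullmatch(wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


entry = json.loads(entry_json)
lon, lat = parse_wkt_point(entry["geometry"])
print(entry["osm_type"], entry["category"], lon, lat)
```

A real pipeline would more likely use a geometry library to read WKT; the regular expression here only keeps the sketch free of dependencies.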
07:22
As for the tags, as you can see, this is a megacity, so it includes the capital tag, the admin level tag, the place tag, and the population tag, and names are also present in different languages. For certain languages, we also include various forms of the name.
07:44
For example, take the name:zh tag here. On OSM it is currently in traditional Chinese characters, but our library includes both the traditional and the simplified forms. The reason is that both values of this name tag
08:00
are commonly used and appropriate. We also noticed a pattern on OSM: for any language that has multiple written forms of a name, the value tends to be changed frequently. So that's why we include both forms here,
08:22
and when we run the integrity check, either form will pass. Here is another example of a name tag with multiple forms of the value. This one is the megacity San Francisco. For the Vietnamese name tag here,
08:41
we're including both the English spelling and the Vietnamese spelling, even though OSM currently presents it in Vietnamese characters. The reason we do that is that the Vietnamese form has been used on OSM, but our localization team confirmed that the English version of it
09:01
is most commonly used by local users to refer to the city of San Francisco. So we're including both forms here, and when we run the integrity check, it will pass. The other thing I want to point out, which I think Jake also mentioned earlier, is population. You can see that currently on OSM, the population number is different
09:21
from what we have in the library. That's because we apply a 10% threshold, which means that if the difference between the number in our library and the number in OSM is within 10%, it passes the integrity check, and it only fails once the difference goes above that 10% range.
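The two relaxations described here, several accepted written forms for one name tag and a 10% band for population, can be sketched as follows. The function names are hypothetical, and measuring the 10% relative to the library value (rather than the OSM value) is an assumption; the talk does not say which number is the base.

```python
def name_is_accepted(accepted_values, osm_value):
    """Pass when the OSM name matches any reviewed form kept in the library."""
    return osm_value in accepted_values


def population_within_threshold(library_value, osm_value, threshold=0.10):
    """Pass when the OSM population is within `threshold` of the library value.

    Assumption: the percentage is taken relative to the library value.
    """
    expected = float(library_value)
    actual = float(osm_value)
    return abs(actual - expected) <= threshold * expected


# Both the traditional and the simplified form are stored for one tag.
accepted_zh = ["倫敦", "伦敦"]
print(name_is_accepted(accepted_zh, "伦敦"))             # True
print(population_within_threshold("870000", "815000"))   # True
print(population_within_threshold("870000", "1000000"))  # False
```

Storing a list of accepted values means a switch between two reviewed forms in OSM does not raise an alert, while any other value still does.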
09:43
So here's an example of an inland water feature. This is a big lake in Canada. Similar to the megacity, we include metadata like the OSM type, OSM ID, tags, and the geometry in WKT format. It's the bounding box polygon here for the lake,
10:01
and we attach the category to indicate that this is an inland water feature. And this is an example of a landmark feature, San Francisco City Hall. Similar to the water feature and the megacity, we have the OSM type, ID, tags, the geometry, and the category.
10:22
The tags vary depending on the landmark. So I also attached another example, the Painted Ladies landmark. Comparing the two: for San Francisco City Hall, these are the key pieces of information to represent the landmark, so we include them in the library, whereas for the Painted Ladies,
10:42
these are the key pieces of information to represent it as a landmark. And you can see that the WKT geometries are different here: it's a point for the Painted Ladies, while it's a polygon for San Francisco City Hall. So this is what's included in the library.
11:00
I will pass it to Charmaine to go over how we use the library for QA. Thanks, Yingxing. So yeah, to ensure that the prominent map features are correct and safe to display, we created a vandalism detection pipeline. First, we compare an OSM planet file with the library,
11:21
and then we run these three checks: the tag check, the geometry check, and the unchecked names check, which I will talk about in a few slides. It then outputs success and failure results. Once we have those results, we review the failures manually. If any discrepancies are vandalism or incorrect, we fix them in live OSM.
11:41
But if there are good new updates from OSM, then we update the library. So now we'll go into the details of what the checks are. The first check is the tag check. This check basically compares the tags we have in the library with the tags on the features in the latest OSM.
12:00
So for example, general court failed the tag check: it detected that the place equals city tag was removed in the OSM dataset. This tag removal suppresses the label rendering on our maps, so it is important for us to get the alert and fix it in OSM ASAP.
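A tag check along these lines can be sketched as a dictionary comparison. The function name `tag_check` and the shape of the failure records are hypothetical, and this simple version treats every tag as an exact match, so it leaves out the name-variant and population relaxations described earlier in the talk.

```python
def tag_check(library_tags, osm_tags):
    """Compare reviewed library tags with the tags currently on the OSM feature.

    Returns a list of (key, problem, expected, actual) tuples; an empty list
    means the feature passes.
    """
    failures = []
    for key, expected in library_tags.items():
        if key not in osm_tags:
            failures.append((key, "removed", expected, None))
        elif osm_tags[key] != expected:
            failures.append((key, "changed", expected, osm_tags[key]))
    return failures


# Illustrative feature whose place=city tag was removed in OSM.
library_tags = {"place": "city", "name": "Example City"}
osm_tags = {"name": "Example City"}
print(tag_check(library_tags, osm_tags))  # [('place', 'removed', 'city', None)]
```

Only tags that are in the library are inspected, so extra tags added in OSM do not fail this check.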
12:20
The next check we have is the geometry check. It flags when there's a significant difference between the curated library and OSM. There are two types of geometry checks, each with a threshold, so that the geometries can change day to day but significant changes will raise a flag. For polygons, we have a bounding box check
12:42
that has an absolute tolerance of one mile and a relative tolerance of 10%. In the picture on the left, as you can see, Lake Orel in OSM grew larger, and therefore it was flagged. We checked out the failure and saw that it was a good change, so we updated our library with the new bounding box.
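One way to read the bounding box check is sketched below. Several things here are assumptions: the talk does not say how the absolute and relative tolerances combine, so this version lets each edge move by the larger of one mile or 10% of the box's extent along that axis, and it converts degrees to miles with a rough spherical approximation. The function name and box layout are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, used for a rough conversion


def bbox_check(lib_bbox, osm_bbox, abs_tol_miles=1.0, rel_tol=0.10):
    """Return True when every edge of the OSM box stays within tolerance.

    Boxes are (min_lon, min_lat, max_lon, max_lat) in degrees.
    """
    mid_lat = math.radians((lib_bbox[1] + lib_bbox[3]) / 2)
    miles_per_deg_lat = math.radians(1) * EARTH_RADIUS_MILES
    miles_per_deg_lon = miles_per_deg_lat * math.cos(mid_lat)

    # Size of the library box, used as the base for the relative tolerance.
    width = (lib_bbox[2] - lib_bbox[0]) * miles_per_deg_lon
    height = (lib_bbox[3] - lib_bbox[1]) * miles_per_deg_lat
    lon_limit = max(abs_tol_miles, rel_tol * width)
    lat_limit = max(abs_tol_miles, rel_tol * height)

    # Edges 0 and 2 move east-west, edges 1 and 3 move north-south.
    scales = (miles_per_deg_lon, miles_per_deg_lat) * 2
    limits = (lon_limit, lat_limit) * 2
    return all(
        abs(osm - lib) * scale <= limit
        for lib, osm, scale, limit in zip(lib_bbox, osm_bbox, scales, limits)
    )


library_box = (0.0, 0.0, 1.0, 1.0)
print(bbox_check(library_box, (0.0, 0.0, 1.05, 1.0)))  # True: edge moved ~3.5 miles
print(bbox_check(library_box, (0.0, 0.0, 1.5, 1.0)))   # False: edge moved ~35 miles
```

Taking the larger of the two tolerances keeps small lakes from failing on tiny edits while still scaling the allowance up for very large water bodies.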
13:03
For points, we have a geometric distance check, which detects whether the point has moved beyond a tolerance of one mile. On the right, here's an example of the HETCHE place label, which failed the geometry distance check
13:20
because it moved 37 miles away. We analyzed this error and saw that the label is in a valid location, so we updated our library, which means that we also need to update our algorithm so that we can improve our way of checking these. The last check that we have here
13:40
is the unchecked names check. Sorry for the name, but it will make sense, I promise. A prominent feature gets flagged when a name in a new language is added in OSM that is currently not in the library. These newly added names will show up on our maps, so we want a flagging system to detect them. Here's an example of the Izmir place label,
14:02
where we only have this set of languages in the library, on the left. And then on OSM, more languages have been added. In this case, we would send it to our localization team to have them review it. Because it's in a foreign language, we can't just validate it on our own,
14:23
so we need our localization team to review it and then add it to our curated library. So the pipeline runs every day, and it checks tags, names, and geometries, like Jake mentioned. When something changes in OSM, we get alerted
14:42
to that. So here's an example of a failed result that we saw a few months ago, where version 54 of the Wuhan place label was flagged because the English and Chinese names were removed. As you can see in the screenshot on the top, on our map the label disappeared
15:01
because of the name removal. So we needed to check this in OSM ASAP, fix it, and then apply the fix on our maps. So yeah, in summary, the curated features library captures validated tags and attributes of prominent map features.
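The unchecked names check described a moment ago can be sketched as a set difference over name keys. The function names are hypothetical, and treating every key that is `name` or starts with `name:` as a language name is a simplifying assumption.

```python
def name_keys(tags):
    """Collect keys that look like name tags: 'name' or 'name:<language>'."""
    return {key for key in tags if key == "name" or key.startswith("name:")}


def unchecked_names(library_tags, osm_tags):
    """Return name keys present in OSM that were never reviewed in the library."""
    return sorted(name_keys(osm_tags) - name_keys(library_tags))


# Illustrative feature: two languages were added in OSM after the review.
library_tags = {"name": "Example City", "name:en": "Example City"}
osm_tags = {
    "name": "Example City",
    "name:en": "Example City",
    "name:fr": "Ville Exemple",
    "name:ja": "サンプル市",
    "place": "city",
}
print(unchecked_names(library_tags, osm_tags))  # ['name:fr', 'name:ja']
```

Each key returned would go to localization reviewers before being added to the library, since the new values are not yet trusted for display.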
15:21
We use it to compare against the latest OSM and detect any discrepancies through the tag and geometry checks, and we also use it to prevent vandalism of major cities, water features, and U.S. landmarks. And then the next thing for us to do is, like I said, improve the thresholds of the checks,
15:42
and then continue to expand the library in terms of feature types and languages.
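For reference, the point distance check mentioned earlier in the talk (a one-mile tolerance for point features) could be sketched with a great-circle distance. The use of the haversine formula and the function names are assumptions; the talk does not say how the distance is computed.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def point_distance_check(lib_point, osm_point, tolerance_miles=1.0):
    """Pass when the OSM point is within `tolerance_miles` of the library point."""
    return haversine_miles(*lib_point, *osm_point) <= tolerance_miles


library_point = (0.0, 0.0)
print(point_distance_check(library_point, (0.0, 0.01)))  # True: about 0.7 miles
print(point_distance_check(library_point, (0.0, 0.5)))   # False: about 35 miles
```

The tolerance is a plain parameter, so tuning the threshold per feature category would only require passing a different value.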