Mapping Brands with the Name Suggestion Index
Formal Metadata
Number of Parts | 70
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/58505 (DOI)
State of the Map US 2019 | 35 / 70
Transcript: English(auto-generated)
00:01
My name is Bryan Housel, and today I'm going to share the story of this weird project called the Name Suggestion Index. So let's go. Before we start: we sometimes use acronyms when we talk about this project, and I just want to make sure everybody is on the same page, because I really hate jargon and confusing acronyms.
00:20
They're pretty unfriendly, but we can't avoid them sometimes. So, OSM is OpenStreetMap. It's the reason for this conference. I figure you all know this, unless you just got lost and wandered in here. Anyway, it's a free map that anybody can edit. And if I say POI, I'm talking about a point of interest. A POI can be anything that people want to put on a map like OSM or visit in real life,
00:44
like a restaurant, school, church, gas station, really anything. We talk about POIs a lot in OpenStreetMap: how to get people to add them, and whether they're mapped well or not. Finally, NSI is just this Name Suggestion Index. We call it NSI sometimes.
01:02
It's not a weird government agency or whatever. It's just the Name Suggestion Index; if you see me call it NSI, that's all it means. So, show of hands: who has actually heard of this Name Suggestion Index project before? Yeah, a few people. But how many of you had heard about it before last year, when I started obnoxiously tweeting about it? Probably nobody, right?
01:23
It turns out the Name Suggestion Index has been around since 2013. That's longer than I've been around OpenStreetMap, which is kind of wild. It was created as a sort of side project that spun out of the iD editor and, as you might infer from the name, it's a list of commonly named things, like POIs, that people want to add to OpenStreetMap.
01:43
For every entry, the index contains a name and whatever OpenStreetMap tags go along with it. But why do this, right? Well, iD is really aimed at beginner mappers. We want everybody to feel like they can contribute meaningfully to OpenStreetMap. So the original developers of iD,
02:01
back in 2013, realized that beginners aren't going to know what all the tags are. Nobody starts off knowing the difference between amenity=fast_food, amenity=restaurant, and amenity=cafe, and even we seasoned OSMers sometimes disagree on this stuff. So they decided that a beginner would prefer to just type McDonald's or Starbucks and have those tags filled in for them.
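To make the idea of "type a name, get the tags" concrete, here is a minimal sketch of such a lookup. The dictionary layout and the function name suggest_tags are assumptions for illustration only; they are not the index's actual file format or iD's preset code.

```python
# Hypothetical index entries: a display name mapped to the OSM tags
# that a preset would fill in for the mapper.
NAME_SUGGESTIONS = {
    "McDonald's": {"amenity": "fast_food", "name": "McDonald's"},
    "Starbucks": {"amenity": "cafe", "name": "Starbucks"},
}


def suggest_tags(typed_name):
    """Return a copy of the tags for a known name, or None if it is not in the index."""
    wanted = typed_name.strip().casefold()
    for name, tags in NAME_SUGGESTIONS.items():
        # Compare case-insensitively so "starbucks" still finds "Starbucks".
        if name.casefold() == wanted:
            return dict(tags)
    return None
```

A beginner who types "starbucks" would get amenity=cafe filled in without having to know the tag, while a name that is not in the index simply yields no suggestion.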
02:22
So they collected all the information in an index and turned the results into iD presets, and it makes the mapping experience a lot friendlier for beginners. We can nudge beginners towards using the correct tags for things. So how did we gather these names? Again, the focus of the Name Suggestion Index is really on mapping POIs, points of interest.
02:41
So we don't want common street names like Main Street or things like that. The original developers decided to narrow their search, so they only collected the names of features tagged with amenity, leisure, shop, or tourism. That covers most of the POIs, though not all of them, but it's a good subset. And they arbitrarily picked a threshold of 50 occurrences to consider a name for inclusion in the index.
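The collection step described here can be sketched roughly as follows. This assumes the features have already been read out of a planet file into plain tag dictionaries; the function name count_poi_names and the (key, value, name) grouping are illustrative choices, not the project's actual script.

```python
from collections import Counter

# The four keys the talk names as the POI subset.
POI_KEYS = ("amenity", "leisure", "shop", "tourism")


def count_poi_names(features, threshold=50):
    """Count named POIs per (key, value, name) and keep those seen at least `threshold` times."""
    counts = Counter()
    for tags in features:
        name = tags.get("name")
        if not name:
            continue  # unnamed features cannot contribute a name
        for key in POI_KEYS:
            if key in tags:
                counts[(key, tags[key], name)] += 1
                break  # count each feature only once
    return {entry: n for entry, n in counts.items() if n >= threshold}
```

Streets are skipped because they carry none of the four keys, and rare names fall below the threshold, which leaves mostly brand-like names.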
03:07
There's nothing really special about that number 50. It just means they didn't want to gather all the rare names that you're never going to see in normal mapping. They wanted brands of things. And it turns out that collecting common names and tags of POIs in OpenStreetMap is kind of a tricky problem.
03:23
The world is big and messy, and humans are very inconsistent. If you look at the real POI data that's been added to OpenStreetMap, you see that things are often not mapped consistently. Everybody does it a little bit differently. There are variations in tagging, spelling, punctuation, and capitalization. But we mostly believe that it's okay to just pick the most common occurrence.
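One way to "pick the most common occurrence" among capitalization variants is sketched below. Grouping by a case-folded key is an assumption made for this example; the talk does not say how the original scripts grouped variants.

```python
from collections import Counter, defaultdict


def canonical_names(observed):
    """Group names that differ only by case and keep the most frequent spelling of each."""
    groups = defaultdict(Counter)
    for name in observed:
        groups[name.casefold()][name] += 1
    # most_common(1) returns the spelling with the highest count in each group.
    return {key: variants.most_common(1)[0][0] for key, variants in groups.items()}
```

Spelling and punctuation variants (for example a missing apostrophe) would need a looser key than case folding alone.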
03:43
Right, because the majority of mappers probably know what they're doing. So we just picked the most frequently occurring thing. But there is a twist. When we talk about gathering the most commonly named things in OpenStreetMap, the name tag itself is pretty messy, too. A lot of people use the name tag, I don't want to say incorrectly, but maybe not according to how the OSM wiki says you're supposed to use the name
04:06
tag, right? Maybe they just put a description of the place there because they want something to show up on a map they want to render. Or maybe there's a bunch of cafes that really are just named Cafe, I don't know. Or maybe there are places around the world where our Western cultural assumptions about naming just don't quite hold up.
04:22
So we can't just gather all the names without doing some filtering to remove generic and unwanted things, like all these cafes named Cafe. So it's 2013. They processed an OpenStreetMap planet file. They collected all the frequently occurring names of POIs. They filtered the names into keep and discard lists, and they released the keep list as the Name Suggestion Index.
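The keep/discard split can be illustrated with a very small filter. The word list GENERIC_WORDS is made up for this example, and exact case-insensitive matching is an assumption; how the original filtering was implemented is not described in the talk.

```python
# Hypothetical generic words; a real filter would be much longer.
GENERIC_WORDS = {"cafe", "nails", "restaurant", "school"}


def split_keep_discard(names):
    """Sort names into a keep list (likely brands) and a discard list (generic words)."""
    keep, discard = [], []
    for name in names:
        if name.casefold() in GENERIC_WORDS:
            discard.append(name)
        else:
            keep.append(name)
    return keep, discard
```

Only the keep list would then be published as suggestions.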
04:45
You can see the kinds of things that ended up in the keep list and the kinds of things that ended up in the discard list. So, you know, it's pretty good. There were around 5,000 names in the keep list, and these got turned into the name suggestion presets. So high fives all around, right? Success. And then we pretty much forgot about it.
05:04
So this is the funny thing: this project got launched, and then for about five years... You know, sometimes this happens in open source, where a thing is just good enough. It's not amazing, and it's not so bad that we have to replace it, but it just sort of limps along for years and years.
05:21
The project had some minor issues, all right. There was a time in 2015 when there were a few duplicated names that we missed because of variations in how they were capitalized. And there were a few generic names that we hadn't filtered out, so there were some odd suggestions in the iD editor presets. In 2016, there was still this nail salon that was just named Nails, right? That's not real.
05:44
It just happened to be a bug in the Name Suggestion Index. So overall we just ignored this thing, right? We never really reprocessed the planet file, and I actually think the scripts to do that were broken. We'd fix these little errors as people reported them, but the project just coasted along without anybody really maintaining it for a couple of years.
06:05
In the meantime, a new way of tagging some POIs was starting to catch on, so people were asking us to support brand:wikidata. So what does that mean? Well, a Starbucks with brand tags might look like this, right? You still have your amenity=cafe.
06:23
You still have name=Starbucks, but these tags in orange are what tie each individual Starbucks POI to the overall Starbucks brand. The brand:wikipedia tag, as you might imagine, functions as a link to the Wikipedia page for Starbucks. And then this brand:wikidata is another tag.
06:42
It's a really special one, because it functions sort of like a unique identifier in Wikidata. These identifiers all start with Q and have some numbers, like Q37158. It's a letter Q, so you can tell that's a Wikidata ID. So we can follow that Wikipedia link to learn about Starbucks.
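Since the slide itself is not in the transcript, here is a sketch of what such a tag set might look like, together with a check for the "Q plus digits" shape of the identifier. The brand=Starbucks and brand:wikipedia=en:Starbucks values are assumptions about what the slide showed; only amenity, name, and the Q37158 identifier are stated in the talk.

```python
import re

# Tags for one Starbucks POI; the three brand* tags are the ones that
# tie the POI to the overall brand.
STARBUCKS_TAGS = {
    "amenity": "cafe",
    "name": "Starbucks",
    "brand": "Starbucks",               # assumed value
    "brand:wikidata": "Q37158",
    "brand:wikipedia": "en:Starbucks",  # assumed value
}

# A Wikidata item ID is the letter Q followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def has_wellformed_brand_qid(tags):
    """Return True if the brand:wikidata tag is present and shaped like a Wikidata item ID."""
    return QID_PATTERN.fullmatch(tags.get("brand:wikidata", "")) is not None
```

This only checks the format of the identifier; confirming that the item really describes the brand would require looking it up in Wikidata.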
07:02
Every Wikipedia page in every language has this Wikidata item link. It's down at the bottom, where I put the arrow, and you can follow that. And this is Wikidata. Wikidata is kind of a treasure trove of linked data. It's everything you would ever want to know about the concept of Starbucks, right? Important dates, people, links to pages,
07:22
images, everything. Starbucks has a bunch of Wikipedia pages in all the languages of the world, but they all link back to this single record, Q37158. So that's very convenient for us, right? If we could tag all the Starbucks in OSM with this brand:wikipedia tag or this brand:wikidata tag,
07:41
then it opens up a lot of possibilities for working with POIs that we don't have with our messy names. I hope this is not a controversial statement, but Wikidata really is better than the name tag for finding things. Like, how many Starbucks POIs are in OSM? We don't really know, because they all have different names. They're tagged inconsistently.
08:02
But if they all had this brand:wikidata tag, then yes, we could actually answer that question. We could compare it to Starbucks's published numbers and see how complete OSM is. And then consider internationalization. Starbucks does not use the same name everywhere in the world, right? Sometimes it says Starbucks, but sometimes it's in other languages.
08:24
With this brand:wikidata tag we can actually identify all of them. We could even use Overpass to query and find them all. So we could answer those kinds of questions about how well we're doing with our POI coverage. We can even link to other projects, like StreetCred or check-in checker or AllThePlaces.
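The kind of Overpass query mentioned a moment ago could be built like this. It is a sketch that only produces the Overpass QL text; the function name and default timeout are illustrative, and sending the query to an Overpass server is left out.

```python
import re


def overpass_brand_query(qid, timeout=60):
    """Build an Overpass QL query for every node, way, or relation with the given brand:wikidata."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return (
        f"[out:json][timeout:{timeout}];\n"
        f'nwr["brand:wikidata"="{qid}"];\n'
        "out center;"
    )
```

Calling overpass_brand_query("Q37158") gives a query that matches Starbucks POIs regardless of how their name tag is spelled or which language it is in.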
08:41
And we can incentivize people to add missing POIs to OSM or to fill in data. We can say, hey, you're at a Starbucks, are these the opening hours? Because we know it has that ID. And we can even differentiate between different brands with the same name, right? This is a thing that comes up sometimes in POI mapping. There's a USA Target and an Australian Target, and they're completely different.
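A small sketch of why the identifier helps with same-named brands: grouping by brand:wikidata separates POIs that grouping by name would merge. The identifiers below are deliberately fake placeholders, not the real Wikidata IDs of either Target.

```python
from collections import defaultdict


def group_by_brand(features):
    """Group POI tag dictionaries by their brand:wikidata value (None if the tag is missing)."""
    groups = defaultdict(list)
    for tags in features:
        groups[tags.get("brand:wikidata")].append(tags)
    return dict(groups)


# Placeholder identifiers for illustration only.
pois = [
    {"name": "Target", "brand:wikidata": "Q-PLACEHOLDER-US"},
    {"name": "Target", "brand:wikidata": "Q-PLACEHOLDER-AU"},
    {"name": "Target", "brand:wikidata": "Q-PLACEHOLDER-US"},
]
by_brand = group_by_brand(pois)
```

Here all three POIs share one name, but by_brand holds two separate groups, one per brand.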
09:04
There are two different supermarkets in the United States called Price Chopper, depending on whether you're in, like, New York or Kansas. And then we've got two different Aldis. This is just a thing that happens sometimes. There are a lot of different banks that go by the name Commerce Bank. So with brand:wikidata we can actually tell them all apart, so that, you know,
09:21
we're not just going by the name of a thing, we're going by something that's a little more unique. And finally, sometimes brands close down, or they merge into other brands, or they change their names. If we're tagging them with this brand:wikidata, then we know how to find them, and we can encourage mappers to update these records, either with an approved mass edit or, like,
09:41
a MapRoulette campaign. So I keep coming back to this sad GitHub contribution graph. A year ago, in September 2018, I finally decided to resurrect this project. With all the issues that were going on with it, we were just like, all right, we have to get it working again. What I was facing was that it was broken, and we hadn't processed a planet file in many years. So I fixed that first.
10:04
But I also knew it would be a ton of work to do all this research and add these brand:wikidata tags to 5,000 or so entries, and to really check them, right? To actually filter out all the generic words, which we never really did very well the first time. And even then, the index was just missing a lot of stuff.
10:23
Right. Remember, this was an index seeded with POI data from 2013. So how well were the POIs mapped in 2013? Kind of hit or miss. But I had a secret weapon, right? And that secret weapon was Hacktoberfest 2018. So, if you don't know Hacktoberfest,
10:41
this is a thing that GitHub and a few other companies sponsor every year. If you sign up and make five pull requests, then you get some free stuff, like shirts, stickers, just, you know, some swag. So if you are an open source maintainer and you can get your project set up well enough, so that you have a good contributing guide and you have some issues, and the issues are tagged hacktoberfest,
11:03
people will just show up and do your work for you. It sounds silly, but this is really what happens, right? So I wrote this script to open about 2,000 GitHub tickets, and I kind of spammed Hacktoberfest a little bit. All the issues look exactly like this one right here, right?
11:22
It says we need help deciding what to do with the supermarket named Weiss. If it's a real brand, it gives instructions: go do your research and edit this JSON file. And if it's not a real brand, then we want to filter it out, so go do your research and edit this other JSON file. And then here's how you rebuild the project and,
11:40
you know, send us a pull request. I talked it up on Twitter, and people showed up and just did all the work for us. This never happens, but it actually did work. I'm still kind of amazed to this day. There was this one day where I think I merged something like 70 pull requests from people, and some of you are here at this conference. I recognize people here. You helped out with this, right? You did the pull requests.
12:03
You got your shirt. So I thank all of you so much for actually helping match all of our brands with Wikidata tags. And I also did another thing that seems crazy, but maintainers, you should really try this out. Everyone who showed up and made a few decent pull requests, past the point where you could tell that they knew what they were doing,
12:23
I just gave them admin access to the project. I just said, okay, you're an admin now. And a lot of these volunteers have really stepped up to keep the project going. Tim Smith, for example, he's here someplace, is still regularly processing the planet files. And Matthias Konesny and Tater, he's on here too.
12:41
They've written other scripts to actually check how valid the Wikidata tags are. They've caught all kinds of errors. And there have been so many other people who've contributed to this project, like some of the people on this list. It's just been amazing seeing something go from near dead to a thriving, self-sustaining project. It's a good success story.
13:03
By early December, the Name Suggestion Index project was in good enough shape that we were able to include an updated version in iD. So this is iD version 2.12. I highlighted the little section there. At this point, the name suggestion presets also began setting these brand:wikidata tags as users were using them.
13:21
And you can see that the usage of the tag then climbs. It's going up kind of exponentially. So people are actually setting these brand tags, which is really great. So one of my goals for this talk is to make everybody here as passionate as I am about mapping POIs
13:41
and helping out. So this is the project on GitHub. It's called name-suggestion-index, which I think is funny. It's not a very good name. But anyway, you can star it, you can follow it, and get involved. And I built this little site which lets you browse the Name Suggestion Index, so you don't necessarily have to be a programmer working with JSON files.
14:02
You can visit this site. We should really put it up on a good domain name or something, but for right now you have to go to GitHub and click there. And it looks like this. This is the viewer. All of the categories are listed, along with how many brands we've detected and how many of these brands actually have a Wikidata
14:21
link. And if you dig into one of the categories, like here, then you'll see all the data we've collected. On the left side it says the name of the brand, and there are even these Overpass Turbo links, so you can click there and see where in the world the name is being used. And we have a count of how many of these things are in OpenStreetMap.
14:44
And then we have the tags. Where it says OpenStreetMap tags, that's the list of tags that we think are the best ones for mappers to be using. And if it's been linked with the Wikidata tag, we can also show you all this information from Wikidata over here. It has the ID, which is that Q number thing, and the official website, and then all these little social media links
15:05
are also things that we can fetch. And if it has a social media account, like Facebook, Twitter, or some other thing, we can actually fetch their logo, and we show the logos too. So if you've been using iD since maybe May or so, in version 2.15, we're now consuming the Name Suggestion Index with the logos also.
15:25
So you'll see, as we click on certain ones which have been linked with this extra tag, the logo pops up in the corner, which is kind of neat. I think it gives mappers feedback that, oh yeah, this thing is really set correctly, and they're doing a good job.
15:41
So I'm at about 16 minutes now, and that was all the slides I had. I could either take questions, or I could go on and we could do some queries. It's kind of up to you; what do you want to do? Do you have questions? Let's open it up for questions first, and then if we run out of those, we could look through how this thing works, if people want to see it. So,
16:05
yes? Yeah, so the question is: does it work in JOSM? When we distribute, we also distribute JOSM presets. So that's a big XML file.
16:21
I don't think it's listed on their presets wiki, but if you wanted to use them, you could just download the file from GitHub and use it in JOSM. There are a few people who do use it that way. Anything else? No? All right, cool. So here's what I'm going to do. This is going to be, um... I'm actually good.
16:42
This is going to be completely reckless, right? We're going to go onto the real, scary internet and do some actual searching and stuff. We're going to go here. So I've actually got this page open. Let me open it really big.
17:03
There. Oh, wow, that looks great. Yeah, you can see there's all kinds of stuff in OpenStreetMap. And this is kind of fascinating. I look at this project and I always want to understand brands a little bit better, like what's going on with them, you know?
17:21
There are all these shopping centers, and the standard shopping center these days has a supermarket, and a Walmart or Target, and a Home Depot or Lowe's. There's this sort of formula for how retail is working in America right now. And someone who likes to tell stories through mapping could
17:44
figure out why it is that way, and kind of watch retail sort of dying at the same time. So, I don't know, I'm always fascinated by this stuff. But what I want to do, actually, is show you exactly the kinds of things we do to make these links. If we go to ice cream, right?
18:03
Ice cream is in here, this one. Right. So this is a good example. The first thing here is a brand called 33 Penguins, and it's a Russian ice cream brand. Why don't we look that up? Because this is kind of wild.
18:22
Share this with everybody. This might work or it might not. No, I don't think it really works in presenter mode.
18:44
So anyway, that's not going to work, but we can talk about brands some more if we go back to that other thing. What I was going to show you, though, is that you can do some research. Amorino is another ice cream brand we can click on. Maybe this will work. Nope.
19:03
None of this stuff works, probably because we're presenting. Yeah. Oh, am I done? Oh, cool. So, yeah. So this one actually did come up, which is cool.
19:22
Anyway, the point I was really trying to make is that you can go through this site, look at all the brands that we have mapped, look at the Wikidata items we've linked them to, and determine whether there are some edits that can be made there. You can go to the brand's official website, and you can match up their social media accounts, like their Facebook, their Twitter, you know,
19:40
all this other stuff. And then as the logos populate in here, they'll also populate into the iD editor, and then mappers will be happy. So thank you, everybody, and enjoy your break.