Bridging Worlds: Integration of Wikidata and OpenStreetMap
Formal Metadata
Number of Parts | 156
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/68433 (DOI)
FOSS4G Europe 2024 Tartu (16 / 156)
Transcript: English (auto-generated)
00:02
I'm going to be talking about Wikidata and OpenStreetMap. So just about me: I'm a hobbyist. I don't work in geospatial, I'm not here representing an employer, and I've been doing OpenStreetMap and Wikipedia for a long time. So I think we all know what OpenStreetMap is, so I'm not going to talk too much about
00:25
OpenStreetMap. But then Wikidata, I think maybe people here might not be familiar with Wikidata. So Wikidata is a collaboratively built knowledge graph. So what am I talking about? If you go and look at Wikipedia, I come from Bristol in the UK, so this is the main railway
00:43
station in Bristol on Wikipedia, and there's a few bits and pieces here. You've got an option to change the language of Wikipedia articles, so this article is available in 14 languages, and if you open that, you can scroll through and you can see the different languages. So that information that links the different languages has to be stored somewhere.
01:04
And the other piece to look at on this article is the info box. There's a box on the side of most Wikipedia articles with a lot of information, so this article's got that. And that information has to be stored somewhere. So if you go up to the tools menu in Wikipedia and you click the tools menu, open it, then
01:21
down at the bottom of the tools menu, you'll find a link to the Wikidata item. So jump over, this is Wikidata. So Wikidata's run by the same people as Wikipedia, the Wikimedia Foundation. Every article on Wikipedia has a Wikidata item, but there's actually more Wikidata
01:40
items. English is the biggest language Wikipedia, with six million articles, but there's a hundred million Wikidata items because people have been adding more things with imports and so on. So if we have a look at this, on the Wikidata item we've got the links to the article in all the different languages.
02:03
So before Wikidata existed, this information used to be stored within Wikipedia. Someone would write a new article about the same topic in a different language, and you'd have to go and edit all of the other languages to add the language links in. So much better to move this information to Wikidata.
02:21
And then the other pieces, apart from languages, are statements. So here you've got the statement saying that this is an instance of a railway station, you've got an image, the country, United Kingdom, and all of these things are linked through to other items. They're not just free text fields, so you can go through and find out more information
02:44
about the items it links to. But the thing that we're excited about is the coordinates, because we like maps. So things that are real world things will have coordinates in Wikidata. And there's some more information you can see there about the item.
03:03
The other piece that is very useful is the identifiers. So Wikidata likes to collect identifiers, you know, ways to represent things in different places. I like to think of Wikidata as a bit like the Rosetta Stone of data, because you've got different databases on the internet and you can look up an item using an identifier,
03:24
find the Wikidata item, and then you can go off and find the same thing in a different online database because you've got the identifier from Wikidata. So all very useful stuff. And then the thing to note on Wikidata is the identifiers. Everything on Wikidata is identified by a number starting with Q.
03:46
So this item has a Q number and so does everything else. And that doesn't change. That's like a stable identifier. So that was the nice front end where you can look around things and you can edit.
04:05
But then when it comes to querying, we get a bit more complicated. So Wikidata has a query system using a language called SPARQL. This is the Wikidata query service. And you can write some complex queries. So I've got here a SPARQL query to find churches in Tartu.
04:24
And you're not supposed to understand this. It's a bit like writing SQL but not quite the same. It's a W3C standard for the semantic web. So I've got, this is saying, find churches in Tartu City. Give me the coordinates and give me any images that you've got.
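The query on the slide is not reproduced in the transcript, so the following is only a sketch of what a query of that shape might look like, wrapped in Python because that is the language the tool is written in. The item IDs `Q16970` (church building) and `Q13972` (Tartu) are assumptions and should be checked on Wikidata before use; the function name `churches_query` is hypothetical. The properties used are instance of (P31), subclass of (P279), located in the administrative territorial entity (P131), coordinate location (P625) and image (P18).

```python
# Assumed Wikidata item IDs; they are not given in the talk, verify before use.
CHURCH_BUILDING = "Q16970"  # assumed: "church building"
TARTU = "Q13972"            # assumed: the city of Tartu


def churches_query(place_qid: str = TARTU, type_qid: str = CHURCH_BUILDING) -> str:
    """Build a SPARQL query for items of a given type located in a given place."""
    return f"""
SELECT ?item ?itemLabel ?coords ?image WHERE {{
  ?item wdt:P31/wdt:P279* wd:{type_qid} ;  # instance of the type, or of a subclass
        wdt:P131 wd:{place_qid} ;          # located in the administrative territory
        wdt:P625 ?coords .                 # coordinate location
  OPTIONAL {{ ?item wdt:P18 ?image . }}    # image, if there is one
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
""".strip()


if __name__ == "__main__":
    # Paste the printed text into the Wikidata query service to try it.
    print(churches_query())
```

The image is fetched with `OPTIONAL` so that churches without a photo still appear in the results.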
04:46
So I've got 17 results here. And you get a list of when the churches were built and where they are. But you can also change the view. So you can say, show me an image grid. And you get the pictures of the churches. Or the Wikidata query service has a map view, and we like maps.
05:04
So there's a map with some dots of where the churches are in Tartu. So Wikidata's a great system. But what we really like is to be able to link together. Oh, no, I've got, sorry, I've forgotten this slide. Yeah, so we'd like to be able to link together OpenStreetMap and Wikidata
05:23
and be able to look up things in OpenStreetMap and find them on Wikidata and the reverse. But just one last thing about SPARQL. This is quite a complex SPARQL query, just a bit of fun. And this is impact craters named after choreographers.
05:41
So Wikidata knows about all the impact craters in the universe. And it knows about what they're named after. And here's a list of all the impact craters named after choreographers. So I went to State of the Map EU in 2014 in Karlsruhe in Germany.
06:05
Similar to this event, the main State of the Map, the OpenStreetMap conference, was going on in South America. So they did a conference in Europe. It was a nice town, like Tartu's nice. An exciting conference, good people, good talks.
06:22
And there was a hackathon at the end. They were like, let's have a hackathon, work on some ideas. And I looked at the list of ideas. And someone said, I've got this idea for writing a system to find links between Wikidata and OpenStreetMap. And I said, yeah, that's a great idea. Let's meet up and work on this project.
06:41
And this was the guy, Andy Mabbott, who's a Wikipedian. And he wasn't even there. He posted the idea and was like, yeah, I think someone should work on this. So I didn't get to meet him at the conference. But I thought, this is a good idea. So I started building it. So this is my software I've built.
07:01
You go to osm.wikidata.link, put in the name of a place, and hit Search. And the system will think for a bit. And then it will try and find matches for you between Wikidata and OpenStreetMap, things that are in both systems that match up. So this is Tartu. The blue dots you can see are things from Wikidata.
07:27
And then if I scroll down, you can see it's got a list of all the matches it's found. If I zoom in, so this is a cemetery in Tartu.
07:42
The red pin is where the coordinates are from Wikidata. And the blue outline, the polygon, is from OpenStreetMap. So these are probably the same thing. I've got a tick box that I can tick to say, yes, these two things match. And then just to help you, I'm showing you bits and pieces.
08:03
The photos are coming from Wikidata. I've got the name of the thing coming from Wikidata and the description from Wikidata. And I show you the first paragraph from Wikipedia just to help you out. And then down here you can see that it says it's got an exact match in terms of the location.
08:23
Like, you know, they're overlapping. And it's saying that the names match. So, you know, probably it's the same thing. So the idea is you go through each of these results and you check them. You know, you're making sure the machine hasn't made a mistake. Oh, yeah, this is item type cemetery.
08:42
So it's matching on the fact that it's a cemetery in both systems. And then if we scroll down to the bottom of the page, there's a button at the bottom that says add Wikidata tags to OpenStreetMap. And you click that link and it takes you to a confirmation page.
09:00
So this just shows you the same information again but in a different style. So say we're going to save this information to OpenStreetMap. So it'll make a change set on OpenStreetMap with these edits. It shows you the list of the edits it's going to make. So it's found 91 matches that it's ready to upload.
09:20
And it automatically generates a change comment, you know, like a description of the change that's going into OpenStreetMap. And you can edit that if you want. And then when you're happy, you hit save and it uploads the changes, the tags to OpenStreetMap. So the criteria for matching is it's going to be the same entity type.
09:42
Like if it's, you know, a railway station in one system, it's going to be a railway station in the other system. It's going to have the same coordinates, like, you know, got to be close together. And then it either has to have the same name or the same street address or an identifier. So I'll talk about some of those things. The example I was using was the cemetery in Tartu.
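The criteria just listed can be sketched as a single predicate. This is an illustration under stated assumptions, not the tool's actual code: the record layout (dicts with `type`, `lat`, `lon`, `name`, `address`, `ids`) and the function names are hypothetical, and the 100 metre default is the figure the speaker gives for two points later in the Q&A.

```python
import math


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points (haversine formula)."""
    earth_radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def is_candidate(wd: dict, osm: dict, max_distance_m: float = 100.0) -> bool:
    """Same entity type, close together, and one of: same name, same address, shared identifier."""
    same_type = wd["type"] == osm["type"]
    close = distance_m(wd["lat"], wd["lon"], osm["lat"], osm["lon"]) <= max_distance_m
    same_name = bool(wd.get("name")) and wd.get("name") == osm.get("name")
    same_address = bool(wd.get("address")) and wd.get("address") == osm.get("address")
    shared_id = bool(set(wd.get("ids", ())) & set(osm.get("ids", ())))
    return same_type and close and (same_name or same_address or shared_id)
```

A real matcher would compare normalized names rather than exact strings, and would measure distance to a polygon's outline rather than to a single point; both are left out here to keep the sketch short.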
10:03
And if we look closely on here, it says it's an instance of a cemetery. So I can click on cemetery and this is the Wikidata item for cemeteries. And if we scroll down on here, then there's a property in Wikidata for OpenStreetMap tag or key.
10:22
So this is trying to say how a cemetery would be represented on OpenStreetMap. Now, there's two ways of representing a cemetery on OpenStreetMap. But Wikidata knows that the main one is landuse=cemetery. So that's how we can match up and make sure that the thing is the same thing.
10:42
I've got an example here of some of my name matching. I'm doing normalization for the name matching. So I don't look for an exact string match. I lowercase things and I strip out punctuation. And then I also try reordering the names. I'm not using any kind of large language models or other AI yet.
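A minimal sketch of the normalization described here: lowercase, strip punctuation, and ignore word order. The function names are hypothetical and the tool's real rules are certainly more extensive (the speaker mentions special handling for church names, which is not attempted here). Note that `string.punctuation` only covers ASCII punctuation.

```python
import string

# Translation table that deletes ASCII punctuation characters.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(name: str) -> str:
    """Lowercase, remove ASCII punctuation and collapse runs of whitespace."""
    return " ".join(name.lower().translate(_STRIP_PUNCTUATION).split())


def names_match(a: str, b: str) -> bool:
    """True if the two names contain the same normalized words, in any order."""
    words_a = normalize(a).split()
    words_b = normalize(b).split()
    if not words_a or not words_b:
        return False  # an empty name is never evidence of a match
    return sorted(words_a) == sorted(words_b)
```

Comparing sorted word lists is one simple way to handle reordering; it treats "Church of Example" and "Example Church of" as equal, which may be too permissive for some names.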
11:05
Like I've tried to see how far I can get with the name matching without using AI. I might add AI at some point, but this is working for me at the moment. And then I've got some extra code for handling church names because there's a lot of churches
11:22
and there's different ways of writing churches. So, you know, special case that and I pick up a lot more matches. And then the matching identifier. So railway stations often have a code that represents them, things like airports.
11:41
And these codes are in both systems. So I can match on any of these identifiers. But, you know, an important question is why bother? Why add these links? And there's various reasons. Wikidata tends to have the names of things in more languages than OpenStreetMap.
12:02
So, you know, useful for if you're using the data in OpenStreetMap, you can get the labels in more languages from Wikidata. And linking into Wikipedia articles. Like people add the links to Wikipedia from OpenStreetMap. But, again, you've got the problem with multiple languages, which article do you link to.
12:25
So if you just link to the Wikidata item, then you can get from that to, you know, whichever language article you want on Wikipedia. Wikimedia Commons is the Wikimedia Foundation's place for storing images. So there's lots of photos available.
12:42
If you need photos of things on OpenStreetMap, you can pull them in from Wikimedia Commons. And like I said with the identifiers, you could think of Wikidata as like a Rosetta Stone for identifiers. So it's great to be able to provide those identifiers to OpenStreetMap. And then in the other direction, Wikidata and Wikipedia benefit from getting the polygons from OpenStreetMap.
13:07
When you look up things on Wikidata, you can see the outlines. And that data is being pulled in from OpenStreetMap. So both sides benefit from these links existing.
13:22
So the software is built using Python. I'm using pretty standard Python libraries. And then the front end obviously is written in JavaScript. I'm using Leaflet and Vue.js and Bootstrap for the front end. And there's a bunch of APIs I get to use.
13:41
So the OpenStreetMap geocoding API is Nominatim. I use that a lot. The OpenStreetMap Overpass API I use to pull data from OpenStreetMap. And then the Wikidata MediaWiki API and the query service. The SPARQL that I was showing you earlier.
14:01
I run SPARQL queries to pull things from Wikidata. And I have some problems with the Overpass API and the Wikidata query service. They both time out. I do quite big queries. I ask for big bounding boxes for information. And sometimes they say this is too much information you're asking for.
14:21
And then I've got software which splits the bounding box into four. And then requests again. And then that usually works. Sometimes one of those fails, splits into four again. So it can be a bit annoying. But that's all handled kind of automatically. So those are the APIs I'm using. And the source code is open source.
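The split-and-retry approach can be sketched as a small recursive function. Everything named here is hypothetical: `fetch` stands in for whatever call hits the Overpass API or the Wikidata query service, `TooMuchData` stands in for whatever timeout or size error that call raises, and the depth limit is an added safeguard that the talk does not mention.

```python
from typing import Callable, List, Tuple

BBox = Tuple[float, float, float, float]  # (south, west, north, east)


class TooMuchData(Exception):
    """Hypothetical error raised when a service refuses or times out on a large request."""


def split_bbox(bbox: BBox) -> List[BBox]:
    """Split a bounding box into four equal quarters."""
    south, west, north, east = bbox
    mid_lat = (south + north) / 2
    mid_lon = (west + east) / 2
    return [
        (south, west, mid_lat, mid_lon),
        (south, mid_lon, mid_lat, east),
        (mid_lat, west, north, mid_lon),
        (mid_lat, mid_lon, north, east),
    ]


def fetch_all(bbox: BBox, fetch: Callable[[BBox], list], max_depth: int = 4) -> list:
    """Try the whole box; on failure, quarter it and retry each piece recursively."""
    try:
        return fetch(bbox)
    except TooMuchData:
        if max_depth == 0:
            raise  # give up rather than recurse forever
        results = []
        for quarter in split_bbox(bbox):
            results.extend(fetch_all(quarter, fetch, max_depth - 1))
        return results
```

Features that straddle a quarter boundary may come back from more than one request, so a caller would likely need to deduplicate the combined results by ID.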
14:43
And it's available on GitHub. Yeah. So there's the website again. So I have got some difficulties. And the difficulty is, one of them is different licenses. So Wikidata is licensed CC0.
15:03
You know, public domain, do what you want with it. Whereas OpenStreetMap has got its own license which is the open database license. So because the licenses are incompatible, you can't copy data from OpenStreetMap to Wikidata. Because it would violate the license. And so, I mean, that's okay. I'm not doing that.
15:23
I'm just adding the links. But it's something to be aware of. But it's even more tricky than different licenses in that it's different intellectual property jurisdictions. So OpenStreetMap asserts database rights which is just a European concept in intellectual property.
15:42
And Wikidata uses U.S. intellectual property rules where facts are not protected in U.S. law. So, you know, there's a feeling that Wikidata is a collection of facts that you can't really protect any of that. Whereas OpenStreetMap takes a different point of view and says that the facts are protected by database rights.
16:03
Because it's taken a lot of effort to collect this information. And so there should be some intellectual property protection. And there's a feeling among some in the OpenStreetMap community that Wikidata is a derived work of Google Maps. People have been going and finding the coordinates of things and putting these coordinates into Wikidata.
16:22
And maybe it's got, you know, 100,000 coordinates that have been copied from Google Maps. So if we were to try and copy any of those coordinates into OpenStreetMap, then, you know, we'd be making a derived work of Google Maps. So luckily we're not doing that. We're just adding the links so it's all fine.
16:42
But it's something to think about for people reusing the data. People who are pulling the data from both systems and combining it, you know, maybe there's problems there. And another problem we've got is stable identifiers. For a long time, OpenStreetMap liked to say that the identifiers for things were not stable.
17:03
There's a wiki page on the OpenStreetMap Wiki about permanent IDs saying we should invent a stable identifier. The idea was maybe you'd map a railway station as a point and then later someone would come along and map it as a polygon around the building. And when that happens, the ID changes.
17:22
But the reality now is that most of the world is mapped in OpenStreetMap and the identifiers aren't changing. So they're pretty much stable identifiers and you can use them and you can link into them. And so because for a long time there weren't stable identifiers, the links didn't exist in the other direction.
17:43
You couldn't link from Wikidata into OpenStreetMap. They've changed that now and you can do those links. And then within Wikidata, Wikidata is supposed to have stable identifiers but it turns out that they're not stable. There's loads of duplicates in Wikidata. People have loaded data from different sources.
18:01
The same church has appeared in the system twice. You end up merging the duplicate data and one of the identifiers gets replaced with a redirect. So there's loads of redirects now that are pointed to from OpenStreetMap and those need to be fixed. So just a few things that need working on. Sorry, this slide is what I was saying about the OpenStreetMap IDs that have been added recently to Wikidata.
18:28
And I need to change my tool. My tool at the moment only adds the links in one direction. It adds the links from OpenStreetMap pointing to Wikidata. But I need to change it to add the links in the other direction as well. And it's going to make the user interface more complicated.
18:42
I'm going to have to make people log in to both OpenStreetMap and Wikidata to edit both of them. But it's on my to-do list and I'll figure it out. So people are using the tool. There's lots of Wikidata tags now in OpenStreetMap. There's over 3 million Wikidata tags.
19:02
The links have been added. These are some of the people who are using my tool. And here's some stats. I've got almost 500 users. And about 25% of the Wikidata tags that are in OpenStreetMap were added using this software.
19:22
So trying to fix some of these things. I've got a new version that I'm working on. This will load faster and will show you a big map. You zoom in. You see some pins. You can click on pins and start linking things straight away. Here's an example of what the new version is going to look like.
19:44
Again, big map and more data. So yeah, that is my thing. Any questions? Thanks so much, Edward, for your presentation.
20:02
Any questions from the audience? Only one question about the coordinates. So Wikidata has only latitude and longitude, nothing else, right?
20:23
Yep. Okay. And there is also the match of like a geographical match between the features. Yes. So if the point is like within, like the church is within a relation or... Yeah, or if it's two points and they're within 100 meters of each other, then it'll match.
20:42
It's just if things are close together. It doesn't have to be a point within a polygon. It can be two points or it could be the point, you know, close to the polygon. If it's within 10 meters of the polygon, then that'll match. Okay. And if I want to query OpenStreetMap with Overpass, could it be better to query by that kind of SQL?
21:10
You can't really use SPARQL to query OpenStreetMap. There was a service for a while where you could use SPARQL, but I think it's not running at the moment. You know, it's probably best to stick with Overpass.
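For readers who want to follow that advice, here is a sketch of how an Overpass QL query for objects carrying a given `wikidata` tag could be built. The function name and the default timeout are assumptions; the query uses the standard `nwr` (node, way, relation) filter and `out center` to get one coordinate per object.

```python
import re


def overpass_query_for(qid: str, timeout_s: int = 25) -> str:
    """Build an Overpass QL query for OSM objects tagged wikidata=<qid>."""
    # Reject anything that is not a plain Wikidata item ID before
    # interpolating it into the query text.
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return f'[out:json][timeout:{timeout_s}];nwr["wikidata"="{qid}"];out center;'
```

The resulting string can be sent as the `data` parameter to an Overpass API endpoint; the network call is omitted so that the sketch stays self-contained.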
21:23
Overpass. Yeah, yeah. With all the limits of Overpass. Yeah, thank you. Thanks. We have three more minutes for questions.
21:41
Okay. Thank you, Edward, for the talk. Wikidata in OpenStreetMap has a really long history, and I remember at the start, not many people in OpenStreetMap liked that connection. But I feel like it's changing over the years. So what's the state of community relations right now? I think it's okay.
22:01
Like, I'm not watching too closely. When I started this project, I didn't build the user interface. I was just matching things automatically and putting them in OpenStreetMap, and people were quite upset with that because I had some false positives. So, you know, it's trying to be careful by getting people to check things, and I think the OpenStreetMap community is okay with that.
22:28
Like, I don't know is really down, so I'm not sure how the OpenStreetMap community feels. You know, just everyone is being sensible and, like I say, not copying data between the systems
22:43
in ways that people don't want, making new things in OpenStreetMap based on Wikidata or, like, oh, it's in OpenStreetMap, let's just copy all of this thing into Wikidata. People aren't doing that, so, you know, I think the communities are getting along okay.
23:02
Thank you. Okay, we have time for one last question before we move to the next presentation. If not, we can finish early and then have a little break. All right, thank you, everyone. Thank you, everyone.