Linking OpenStreetMap and Wikidata

Video in TIB AV-Portal: Linking OpenStreetMap and Wikidata

Formal Metadata

Title
Linking OpenStreetMap and Wikidata
Subtitle
A semi-automated, user-assisted editing tool
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
2019
Language
English

Content Metadata

Abstract
Wikidata and OpenStreetMap are collaborative open data projects that contain structured data for real world places and things. Adding links between the projects makes the data more useful, but doing this by hand is laborious. I've written a software tool that automates much of the process. Editors of OpenStreetMap can use my software to search for a place or region, generating a list of candidate matches from Wikidata, which can then be checked and saved to OpenStreetMap. Linking the two projects isn't without controversy. They use different licenses which raises questions about what information from one project can be copied to the other. I will talk about the benefits of linking, the process of finding matches, the community response - including the controversy - and how people can get involved.
Okay, hello everyone. I'm Edward. I'm an OpenStreetMapper, I've been mapping for 15 years, and I'm part of the Wikipedia community as well, mostly on the English-language Wikipedia. Today I'm going to talk about a tool I've built for machine-assisted matching of OpenStreetMap and Wikidata items. We all know OpenStreetMap, I'm sure. Wikidata is a big database full of things, including geographical objects with coordinates, so it would be nice if we could match them together and then add links to OpenStreetMap that take you to Wikidata. I'm going to show the software I've built, and I'll talk more about what Wikidata is after that.
Here's a screenshot. This is the centre of Brussels, around the Grand Place. I've done a search, and the system has run (it takes a few minutes) and found some suggestions for things that match between the two databases, where you might want to add links. If I scroll down the page I get a login button; I hit it, it takes me to OpenStreetMap, and I log in with my OpenStreetMap credentials using OpenStreetMap OAuth. I come back to the same page, and if I scroll down, these are the suggested matches. The first section of a match is the data that comes from Wikidata and Wikipedia: you've got the label (how it's known in Wikidata), then the item types, and then the extract, which is taken from Wikipedia. In this case there's only a French Wikipedia article, so the extract is in French. Underneath that we've got the suggested matching OpenStreetMap item: the system thinks these two are the same thing. Over here we've got the map, where the blue pins are the Wikidata item coordinates. The "show on map" option zooms in on one of these matches so you can see it. I'm going to use the Maison du Roi
in the Grand Place as an example. This is zoomed in: you can see on the map that the red pin is the selected Wikidata item and there's a blue polygon border around the OpenStreetMap object, and the system is saying this is an exact match. If we like a match, we tick the box next to the item to say that it's good, so we can go through all of these and check that they're valid. Once we're happy, we click the Save button to add the links to OpenStreetMap. You get a confirmation page showing the same list of matches again, as if the system is asking whether you're really sure you want to save these. Then there's the changeset comment: it makes one up by default, but you can edit it. You hit save, and it saves these links into OpenStreetMap.
And wow, people are using the tool: 140 users have used it, and you can see there are six and a half thousand changesets and almost 180,000 Wikidata links added to OpenStreetMap. The system uses these matching criteria
for deciding whether something is a match between the two systems. Here's an example, a pub, with its type and coordinates. The system looks at the name and checks whether the names are the same. It does some normalization on the name first: it lowercases it and removes words like "and" and other bits and pieces. If it can't match on the name, it tries matching on the street address. I've got some more examples. This is Paddington Station in London: here the system looks at the identifiers, in this case the station code, which is in both Wikidata and OpenStreetMap. In fact lots of identifiers are compared; each one has a key under which it appears in OpenStreetMap and a property in Wikidata, and the system can use them to match on. Here's another example, a lighthouse: all lighthouses have a standard reference number that the system can match on. So I'm going to
talk about this theatre in the centre of Brussels. You can see the pin where the theatre is. The system knows from Wikidata that it's a theatre, and it knows that OpenStreetMap uses amenity=theatre to represent theatres, but how do we get that mapping between the two? If we look at Wikidata, here's the theatre's item, and it says "instance of: theatre"; Wikidata has a type system, so we know this is a theatre. Then if we look at the page for theatre, you can see there's a Wikidata property for the OSM tag. So Wikidata knows OpenStreetMap tags, and that's how the system can figure out how a theatre is represented on OpenStreetMap.
The important thing if you want to work with Wikidata is that you need to use the Wikidata Query Service; my system uses it in the background. There's a user interface that you can try out, and the queries are written in SPARQL, which is a semantic query language. This example query finds theatres in Brussels. SPARQL is kind of complicated, and you don't have to know it to use the tool, but if you want to work with Wikidata you should figure out SPARQL. I can also use SPARQL to look at the OSM tag key within Wikidata: this is a search for amenities within Wikidata, and it has found a list of the various types of things that are amenities, with the OSM key that goes with each. These are the kinds of searches the system does underneath, with much more complex SPARQL queries. SPARQL also supports bounding-box searches, which is important for geospatial data; this is the bit of the code used for searching in a bounding box. Now I'll talk a bit about how the matcher
runs. You search for a place, it gives you some search results, and you pick one of them. The search results come from the Nominatim API, which also gives us the polygon for the place. Once we've got the polygon we can work out the bounding box and ask Wikidata for items within that bounding box. We also grab the first few paragraphs of Wikipedia text in every language, so that we've got the excerpt to show on the comparison page, and also to get the street addresses, which often appear in the excerpt. This is built with WebSockets: the user sits looking at this map for a minute or two while it's processing, and it shows the status as it updates. The next step is that it searches the OpenStreetMap Overpass API to find matching items within the bounding box, then loads all of that data into PostGIS to do the comparison, and then runs the matching process. If anyone's interested, this is the stack I've used to build it: it's all written in Python with Flask and SQLAlchemy, and I'm using Leaflet and Bootstrap on the front end.
Now I'm going to talk about some of the other features in the software. One of the problems I had was which language to use for showing labels. There isn't a standard, easy way to find the preferred language for a particular country, and even the country level isn't useful, because sometimes it varies by region. So the system tries to guess what language to show the labels in: in central Brussels it has decided to use Dutch first, because that has the most labels, then French, then English. If we don't like it, we can change it: there's an edit button, and you can drag and drop to reorder, so
maybe I can drag them and switch them around if I want to change it. I'll show you some more features. The system detects that the centre of Brussels is quite small, complains, and says I might want to choose a larger area, and it gives some suggestions for bigger areas to search on. Equally, if I search for Belgium, it'll work, but it's big for the system. If I click on Belgium, you get to the page where it runs and tries to find the matches, but the area is too big, so the system splits it up into chunks. If you try to do the whole of a country at once, you'll get a timeout from the Wikidata Query Service and from the OpenStreetMap Overpass service, so I split the area into chunks, do them one at a time, and then recombine the results. Even with the chunking I sometimes hit timeouts, so the system detects when it hits a timeout, splits that chunk into four, and retries. That's one approach for doing large areas. The problem is that the list of matches will have something like ten thousand items that you've got to go through and check, and there's no bookmarking where you can do half and come back later. If you leave the browser window open it will work, but it would be better to have a different approach, which is the browse interface. If I click this browse link, I get a list of the sub-regions within Belgium, and I can zoom in on these: if I click on Brussels-Capital Region, I get all the municipalities in it.
[Interruption from the organizers: people sitting on the stairs or standing are blocking one of the emergency exits and are asked to take one of the free seats if any are still available, or to leave. The livestream is on.]
Sorry, okay, thanks. So that's most of the features I want to talk about in the software. I'll just talk about Wikidata a bit for people who aren't familiar
with Wikidata. It's a database of structured data run by the Wikimedia Foundation, the same people as Wikipedia, and it's been around since 2012. Why do we want to do this? That's the other question. I'm going to use the Grand Place as my example. Here's the Grand Place on Wikipedia, and here's a link that takes you to Wikidata: this is the Wikidata item that represents the Grand Place. We get lots of links to Wikipedia; there are articles about the Grand Place in 50 languages, which is useful. This is the main chunk of a Wikidata page: a list of statements, which are a bit like tags in OpenStreetMap, a key and a value. And this is the key thing for referring to a Wikidata item: every item has a unique identifier that starts with a Q followed by a number, and that appears in the URL as well. Wikidata identifiers are permanent and stable; they won't change over time when something gets renamed, so they're a useful way of linking into a catalogue. This is what the Grand Place looks like on OpenStreetMap: you can see it's got the Wikidata link in there as a tag.
So again, what do we get from Wikidata? We get a link to Wikimedia Commons if you want photos; the Grand Place has over 200 photos. We get more labels: you can have the name in more languages than appear in OpenStreetMap. And we get some external identifiers: Wikidata has the Freebase ID, the GeoNames ID, or the World Heritage Site ID. All very useful: just by having the Wikidata link, we're linked into these external catalogues. To recap, we get labels in more languages, links to Wikipedia, links to Wikimedia Commons, and identifiers for other data catalogues. So this is a good thing, but there are people
adding the tags to OpenStreetMap by hand, and it's time-consuming, so I thought it would be good to automate it. But there are also some difficulties in linking the two systems. The licenses are different: Wikidata is CC0, which is like public domain, whereas OpenStreetMap uses the Open Database License, so you can't copy any data from OpenStreetMap into Wikidata because of the difference in licensing. It gets even worse than that: they fall under different intellectual property jurisdictions. OpenStreetMap relies on database rights and European law, while the Wikimedia Foundation relies on US intellectual property law, which says that things like coordinates are facts and aren't protected by intellectual property. So people within the OpenStreetMap community are suspicious of where the coordinates in Wikidata come from; they question whether a lot of them were copied from Google Maps. If people look up where something is on Google Maps and put the coordinates in Wikidata, does that make Wikidata a derived work of Google Maps? I think these problems don't really affect this tool, because I'm not copying any data between the systems: I use the coordinates to find the matches, but the only thing I add is the link.
My first attempt at this was a fully automated system where I was just uploading tags without checking first. That was against the rules, people were unimpressed, and the bot account doing it got blocked. It's better to have a user interface where people can check things, and local people can check things in their own area, so it's not just me trying to do the whole world. So machine-assisted editing is good.
What about adding links in the other direction? It would be nice to put links in Wikidata that point at OpenStreetMap, but that's difficult, because OpenStreetMap doesn't have stable identifiers. This is the URL for the Grand Place, and you can see it's got an ID in there, but that ID isn't guaranteed to stay the same: someone is free to come to OpenStreetMap and redraw the Grand Place, maybe in finer detail, and the ID will change. There have been discussions within OpenStreetMap about adding permanent IDs that don't change, going on for years, and it still doesn't have them. The IDs are fairly permanent (this one probably won't change) but not permanent enough for us to start putting them in Wikidata, so at the moment the links only go in one direction. Here's another screenshot of the tool, and that's mostly it.
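The talk describes the first matching criterion only loosely: names are normalized (lowercased, with some bits and pieces removed) and compared, with the street address as a fallback. Below is a minimal Python sketch of that idea, not the tool's actual code. The exact normalization steps (stripping accents and punctuation, treating "&" as "and" and then dropping "and"), the comparison of `addr:street`/`addr:housenumber` against the Wikipedia excerpt, and all function names are assumptions made for illustration.

```python
import re
import unicodedata

# Hypothetical: words dropped during normalization. The talk only says the
# name is lowercased and "bits and pieces" are removed.
DROP_WORDS = {"and"}


def normalize_name(name: str) -> str:
    """Lowercase a name, strip accents and punctuation, and drop filler words."""
    # NFKD splits "é" into "e" plus a combining accent, which is then discarded.
    decomposed = unicodedata.normalize("NFKD", name)
    text = "".join(c for c in decomposed if not unicodedata.combining(c)).lower()
    text = text.replace("&", " and ")
    # Any run of non-word characters becomes a single space.
    words = re.sub(r"\W+", " ", text).split()
    return " ".join(w for w in words if w not in DROP_WORDS)


def osm_names(tags: dict) -> list:
    """Collect name and name:<lang> values from an OSM object's tags."""
    return [v for k, v in tags.items() if k == "name" or k.startswith("name:")]


def address_matches(wikidata_text: str, tags: dict) -> bool:
    """True if the OSM street name and house number both appear in the text."""
    street = tags.get("addr:street")
    number = tags.get("addr:housenumber")
    if not street or not number:
        return False
    # Pad with spaces so only whole words match ("1" does not match "10").
    text = f" {normalize_name(wikidata_text)} "
    street_ok = f" {normalize_name(street)} " in text
    number_ok = f" {normalize_name(number)} " in text
    return street_ok and number_ok


def match_reason(wikidata_labels, wikidata_text: str, tags: dict):
    """Return "name" or "address" for a candidate match, or None."""
    wd = {normalize_name(n) for n in wikidata_labels} - {""}
    osm = {normalize_name(n) for n in osm_names(tags)} - {""}
    if wd & osm:
        return "name"
    # Fall back to the street address only when no name matches.
    if address_matches(wikidata_text, tags):
        return "address"
    return None
```

With this normalization, "Fish & Chips" and "Fish and Chips" both become "fish chips", so they count as the same name.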
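Identifier matching pairs an OpenStreetMap key with a Wikidata property. In the live demo, for example, a pub matched on its website. Here is a minimal sketch, assuming the item's claims are already available as a dict from property ID to string values. `P856` is Wikidata's "official website" property. `IDENTIFIER_KEYS`, `normalize_url` and `identifier_matches` are hypothetical names, and the tool's real list of key/property pairs (station codes, lighthouse references and so on) is not reproduced here.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to host + path so scheme, "www." and trailing "/" are ignored."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lowercases the hostname
    if host.startswith("www."):
        host = host[len("www."):]
    return host + parts.path.rstrip("/")


# Hypothetical mapping from OSM key to (Wikidata property, normalizer).
# P856 is Wikidata's "official website" property; other identifiers would be
# added as further entries with their own normalizers.
IDENTIFIER_KEYS = {
    "website": ("P856", normalize_url),
}


def identifier_matches(wikidata_claims: dict, tags: dict) -> list:
    """Return the OSM keys whose values agree with the item's claims."""
    matched = []
    for osm_key, (prop, normalize) in IDENTIFIER_KEYS.items():
        osm_value = tags.get(osm_key)
        if not osm_value:
            continue
        wd_values = {normalize(v) for v in wikidata_claims.get(prop, [])}
        if normalize(osm_value) in wd_values:
            matched.append(osm_key)
    return matched
```

Supporting another identifier would mean adding one more key/property entry. This fits the speaker's later remark that the mapping between keys and properties currently lives in code.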
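The theatre example relies on Wikidata's "OpenStreetMap tag or key" property (P1282). Its values are strings such as `Tag:amenity=theatre` or `Key:building`. The small sketch below shows how such values could be parsed and checked against an OSM object's tags. It assumes the P1282 values for the item's classes have already been fetched, and the function names are hypothetical.

```python
def parse_osm_tag_value(value: str):
    """Turn "Tag:amenity=theatre" into ("amenity", "theatre") and "Key:building" into ("building", None)."""
    if value.startswith("Tag:") and "=" in value:
        key, _, tag_value = value[len("Tag:"):].partition("=")
        return key, tag_value
    if value.startswith("Key:"):
        return value[len("Key:"):], None
    return None  # not a tag or key reference


def type_matches(tags: dict, p1282_values) -> bool:
    """True if the OSM tags contain any tag (or key) listed for the item's classes."""
    for raw in p1282_values:
        parsed = parse_osm_tag_value(raw)
        if parsed is None:
            continue
        key, value = parsed
        # A bare key matches any value; a full tag must match exactly.
        if key in tags and (value is None or tags[key] == value):
            return True
    return False
```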
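The Wikidata Query Service part of the pipeline can be sketched with two SPARQL queries wrapped in Python strings. The first lists classes whose OSM tag is an `amenity=*` tag. The second uses the query service's `wikibase:box` feature to find items with coordinates (P625) inside a bounding box. This illustrates the technique under assumptions; these are not the queries the tool sends. The label languages, the User-Agent string and the helper names are placeholders, and `run_query` needs network access.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Classes whose "OpenStreetMap tag or key" (P1282) value is an amenity=* tag.
AMENITY_TAGS_QUERY = """
SELECT ?item ?itemLabel ?tag WHERE {
  ?item wdt:P1282 ?tag .
  FILTER(STRSTARTS(?tag, "Tag:amenity="))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def bbox_query(west: float, south: float, east: float, north: float,
               languages: str = "nl,fr,en") -> str:
    """Build a SPARQL query for items with coordinates (P625) inside a bounding box."""
    # WKT points are written as Point(longitude latitude).
    return f"""
SELECT ?item ?itemLabel ?location WHERE {{
  SERVICE wikibase:box {{
    ?item wdt:P625 ?location .
    bd:serviceParam wikibase:cornerSouthWest "Point({west} {south})"^^geo:wktLiteral .
    bd:serviceParam wikibase:cornerNorthEast "Point({east} {north})"^^geo:wktLiteral .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{languages}". }}
}}
"""


def run_query(query: str) -> dict:
    """Send a query to the Wikidata Query Service and return the JSON results."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # WDQS asks clients to identify themselves with a descriptive User-Agent.
    request = urllib.request.Request(url, headers={"User-Agent": "osm-wikidata-sketch/0.1 (example)"})
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

For a large area, the same query would be sent once per chunk rather than once for the whole country.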
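After the user picks a Nominatim result, the bounding box of its polygon drives the Overpass query. The minimal sketch below assumes the polygon arrives as a GeoJSON `Polygon` or `MultiPolygon` geometry. Restricting the Overpass query to objects with a `name` tag is an assumption made to keep the example short, since the talk does not say which filters the tool uses. Loading the results into PostGIS is not shown.

```python
def iter_points(coords):
    """Yield [lon, lat] pairs from nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from iter_points(part)


def geojson_bbox(geometry: dict):
    """Return (west, south, east, north) for a GeoJSON Polygon or MultiPolygon."""
    points = list(iter_points(geometry["coordinates"]))
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return min(lons), min(lats), max(lons), max(lats)


def overpass_query(bbox, timeout: int = 300) -> str:
    """Overpass QL for named nodes, ways and relations inside the box."""
    west, south, east, north = bbox
    # Overpass bounding boxes are ordered (south, west, north, east).
    box = f"{south},{west},{north},{east}"
    return (
        f"[out:json][timeout:{timeout}];\n"
        f'(node["name"]({box});way["name"]({box});rel["name"]({box}););\n'
        "out center;"
    )
```

Note the two coordinate orders: GeoJSON and WKT put longitude first, while Overpass bounding boxes start with the southern latitude.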
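The label-language guess described for central Brussels puts Dutch first because it has the most labels, then French, then English. This amounts to counting label languages across the candidate items. The sketch below follows that reading. The item structure (`{"labels": {lang: text}}`), the English fallback and the function names are assumptions.

```python
from collections import Counter


def rank_languages(items, fallback=("en",)):
    """Order languages by how many items have a label in them, most first."""
    counts = Counter()
    for item in items:
        counts.update(item.get("labels", {}).keys())
    ranked = [lang for lang, _ in counts.most_common()]
    # Make sure the fallback languages appear somewhere in the list.
    ranked += [lang for lang in fallback if lang not in ranked]
    return ranked


def choose_label(labels: dict, ranked):
    """Pick the label in the first ranked language the item has."""
    for lang in ranked:
        if lang in labels:
            return labels[lang]
    # No preferred language available: use any label, or None if there are none.
    return next(iter(labels.values()), None)
```

The drag-and-drop reordering in the user interface would simply replace the computed ranking with the order the user chooses.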
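The talk's strategy for large areas has four steps: split the area into chunks, run them one at a time, split any chunk that times out into four and retry, then recombine. This can be sketched as a recursive function. Here `fetch` stands in for a Wikidata or Overpass request. It is assumed to raise Python's built-in `TimeoutError` on a timeout and to return hashable item IDs. The 2 x 2 starting grid, the depth limit and the de-duplication step are illustrative choices, not details from the talk.

```python
def split_in_four(bbox):
    """Split (west, south, east, north) into four equal quarters."""
    west, south, east, north = bbox
    mid_lon, mid_lat = (west + east) / 2, (south + north) / 2
    return [
        (west, south, mid_lon, mid_lat),
        (mid_lon, south, east, mid_lat),
        (west, mid_lat, mid_lon, north),
        (mid_lon, mid_lat, east, north),
    ]


def grid(bbox, n: int):
    """Split a box into an n x n grid of chunks."""
    west, south, east, north = bbox
    dx, dy = (east - west) / n, (north - south) / n
    return [
        (west + i * dx, south + j * dy, west + (i + 1) * dx, south + (j + 1) * dy)
        for j in range(n)
        for i in range(n)
    ]


def fetch_chunk(bbox, fetch, depth=0, max_depth=4):
    """Call fetch(bbox); on TimeoutError split the box into four and retry each part."""
    try:
        return list(fetch(bbox))
    except TimeoutError:
        if depth >= max_depth:
            raise  # give up rather than splitting forever
        results = []
        for quarter in split_in_four(bbox):
            results.extend(fetch_chunk(quarter, fetch, depth + 1, max_depth))
        return results


def fetch_area(bbox, fetch, n=2):
    """Fetch a large area chunk by chunk and recombine, dropping duplicate IDs."""
    seen, combined = set(), []
    for chunk in grid(bbox, n):
        for item in fetch_chunk(chunk, fetch):
            # Items on a shared chunk edge can be returned twice.
            if item not in seen:
                seen.add(item)
                combined.append(item)
    return combined
```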
I'll do a live demo and see if this works. This is the page I was just describing. I've got English as the preferred option, but it's still not all in English. Here we've got the name of a pub called the King of Spain, and it has come up in English because I selected English. I can click "show tags" and it shows the tags that represent this object; building=yes is highlighted because it's got "building" over here, so that's the matching type. This one actually matches on an identifier: none of the names match perfectly, they're a bit all over the place, but it's got this website address from Wikidata, which matches this website address from OpenStreetMap, so it has managed to match it. I can tick it, and then you get to see the pub highlighted. I've checked all of these. If I scroll down to the bottom, we've got the Brussels Stock Exchange, and the system knows from the Wikipedia categories that the Brussels Stock Exchange is defunct: it's in the defunct stock exchanges category, so maybe this isn't a good match, because maybe the stock exchange doesn't exist any more. If I click "show on map", you can see it has highlighted the building; that's the match it found. I've got two pins here, both for the Brussels Stock Exchange: "Bourse de Bruxelles" and "Brussels Stock Exchange". What's going on is that there are two items in Wikidata that represent the stock exchange, one for the building and one for the institution. They both have coordinates and they've both matched, so the system doesn't know which one to use, and it gives you an error saying that the OSM candidate matches multiple items. If I scroll to the bottom and click add, I get the confirmation page, and you've got this warning suggesting you talk to your local mapping community, but I'm just going to hit save. It's using WebSockets, and it's going through and saving, so this is editing OpenStreetMap. It has edited OpenStreetMap, and then I can click "view your changeset", and you can see
that, if I scroll down, I've edited all of these things and added wikidata tags. So that is my talk. Are there any questions? [Applause]
My software doesn't consider the wikipedia tag. I think quite a lot of the wikipedia tags are wrong. There are a lot more Wikidata items than there are Wikipedia articles, so the wikidata tag can be a lot more precise. For example, a seaside resort might have a beach that is mentioned in the resort's article, so people link the beach to the seaside resort's article, but there might be a Wikidata item that represents just the beach, so you could make a more precise link that way. Does anything use it? The OpenStreetMap web interface understands it and links through, and the editor on the website, iD, understands Wikidata and will query Wikidata and pull the title from it. I've actually got an example, if I can figure out how to find it. Here we go. So you're asking about the wikidata tag: in this example, the Maison du Roi is a building in the Grand Place, but if you look at the wikipedia tag, it points to the Dutch article on the Grand Place, so this wikipedia tag is wrong. Now I'm going to add the correct wikidata tag; maybe the software should be removing or correcting this wikipedia tag at the same time. I don't know. Yes, it should, but I haven't written any code to handle wikipedia tags at all, and I need to do that.
Is there a way to check the accuracy of existing tagging? So I've
got something for that. Yes, that's a good point. Here I've got "already tagged"; just let it load. It'll show you a list of things that are already tagged and say whether my suggestion matches what's there already. Here we go. That one's not a great example, but these are all matching, and it's a bit unsure about the Central Station, so there's something there to do that.
I just wanted to ask about the status of funding; last year I remember... I'm not being paid, this is just for fun, and I don't have any kind of official connection to Wikidata.
If you just search for your local area and have a look, try that, or you can browse: you might pick your country and then zoom in a bit. One of the pieces I'm missing is keeping track of progress: I should be able to say that Brussels is 100 percent done, or the browse screen should have percentages next to each sub-region. At the moment there isn't a good way to figure that out. Yes, that's a good idea. The other option would be to show you one match at a time and ask whether it's good or bad; you hit save, and I could send something back to the server and store it. So there are ways of doing it. I've just been avoiding the problem by working on smaller areas rather than trying to do massive areas at once. The other thing is how the changeset looks to other people who come and look at OpenStreetMap: if you try to do a whole country, it'll be very overwhelming for someone to look at your work, so that's another good reason for using smaller areas.
I might suggest to everyone, if you want to see an example of an unstable identifier, node 1: it's got a pretty interesting history.
It doesn't work very well with linear features. The tool that I'm using doesn't seem to load rivers, and I don't think it handles streams and canals very well either; canals are often represented as a series of ways. When
I built this, I was very keen to have a one-to-one mapping between OpenStreetMap and Wikidata, and there isn't a one-to-one mapping. OpenStreetMap tends to represent a road as a series of ways, and you know they're the same road because they've got the same reference or the same name. I get into difficulties with bridges and tunnels, because in OpenStreetMap they tend to be represented as two ways, one in each direction, for a road bridge or a rail bridge. If I want to add the wikidata tag, I need to add it to both ways, and I don't support that: the system will say it found two matching things. For bridges there's another tag, man_made=bridge, which is supposed to be a polygon drawn around the bridge, and I've got some special-case code that detects that, uses that one, and ignores the others. Tunnels don't have anything like that: a tube or tunnel is always represented as two lines in OpenStreetMap, so I need to change my code or convince OpenStreetMap that there should be an object that uniquely represents the tunnel.
It works with hiking routes: a hiking route is a linear feature and a relation, and it matches those up. Relations, yes; it does all three types of objects.
I don't go near that, I don't touch those; that's for other people, maybe some time.
Branches are complicated. I have a problem with banks: when you try to do a city where a bank has an office, I often match a nearby branch, things like that, which it doesn't handle. And libraries get confusing because of the main library and the branches. There are a lot of libraries in Wikidata, since someone has been loading all the libraries into Wikidata, but it's tricky.
Adding a link between an OSM key and a Wikidata property? Oh, you mean for the identifiers. Yes, there's a config file, well, there's a bit of code that is just a mapping between them. It's not difficult at all. Maybe one day it'll move to the database and you can just click a button to do it, but at the moment you edit the code.
Yes, because they're too trivial to be protected under US intellectual property law: if you tell me the coordinates of something, you can't claim any kind of intellectual property on that. For a single coordinate it's the same in the EU, but collecting a database of coordinates is a lot of work, so the EU says that's protected, whereas in the US it's not.
People can use the tool and file bugs: I've got bug tracking on GitHub. If anyone wants to contribute, I'm the only developer at the moment. The code is kind of tricky to install: there is an Ansible playbook for installing it, but there are a lot of moving parts. Good question. As I say, the code is out there, so someone could take it over. I don't really have an answer to how to make it more sustainable. [Applause]
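In the demo, two Wikidata items for the Brussels Stock Exchange (the building and the institution) matched the same OpenStreetMap building, and the tool reported an error. Detecting that case only requires grouping the matches by OSM object. The minimal sketch below assumes matches are (QID, OSM ID) pairs; the function name and the IDs used with it are made up.

```python
from collections import defaultdict


def ambiguous_osm_objects(matches):
    """Given (wikidata_qid, osm_id) pairs, return OSM objects matched by more than one item."""
    by_osm = defaultdict(set)
    for qid, osm_id in matches:
        by_osm[osm_id].add(qid)
    # Only OSM objects claimed by two or more Wikidata items are ambiguous.
    return {osm_id: sorted(qids) for osm_id, qids in by_osm.items() if len(qids) > 1}
```

Anything this returns would be shown to the user for a decision rather than saved as-is.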
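The bridge special case from the Q&A prefers a `man_made=bridge` outline over the two directional ways. It could look like the following. This is a sketch of the described behaviour, not the tool's code, and the candidates are assumed to be (OSM ID, tags) pairs for a single Wikidata item.

```python
def choose_osm_object(candidates):
    """Pick one OSM object for a Wikidata item, or None if it is still ambiguous.

    candidates: list of (osm_id, tags) pairs that all matched the same item.
    """
    if len(candidates) == 1:
        return candidates[0][0]
    # Bridges: a man_made=bridge outline stands for the whole bridge, so prefer
    # it over the separate ways for each direction of the road or railway.
    outlines = [osm_id for osm_id, tags in candidates if tags.get("man_made") == "bridge"]
    if len(outlines) == 1:
        return outlines[0]
    # Tunnels (and anything else) have no such object, so leave it to the user.
    return None
```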
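The demo and Q&A show two kinds of warning. One is a Wikipedia category marking something as defunct; the other is an existing `wikidata` tag that disagrees with the suggestion. Both can be expressed as simple checks. In the hedged sketch below, matching the substring "defunct" in category names is an assumption, as are the warning texts; these are not the tool's actual rules.

```python
def candidate_warnings(categories, osm_tags: dict, suggested_qid: str) -> list:
    """Collect warnings to show next to a suggested match before it is saved."""
    warnings = []
    # Assumption: a Wikipedia category containing "defunct" marks a closed place.
    if any("defunct" in category.lower() for category in categories):
        warnings.append("Wikipedia categories suggest this may no longer exist")
    # An existing wikidata tag that differs from the suggestion needs a human look.
    existing = osm_tags.get("wikidata")
    if existing and existing != suggested_qid:
        warnings.append(f"OSM object already has wikidata={existing}")
    return warnings
```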