Opening Address Data around the World

Video in TIB AV-Portal: Opening Address Data around the World

Formal Metadata

Title: Opening Address Data around the World
License: CC Attribution - NonCommercial - ShareAlike 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Production Place: Seoul, South Korea

Content Metadata

With over 110 million points, OpenAddresses has grown to be the largest open database of address data in the world. Governments, developers and businesses are realizing that address data belongs in a commons, where it can be easily maintained and used by all, and where it can drive economic growth. These early efforts are now powering some of the world's best commercial geocoding systems, as well as crucial infrastructure like emergency services. But there's more work to do. We need to reform outdated laws, expand coverage to new cultural contexts, untangle shortsighted licenses, and invent new modes of collaboration between the public and government. We'll cover how OpenAddresses started, how it can be used today, and how we expect it to grow into a definitive global resource.

Good morning, everyone, and thank you so much for coming to this session. As was mentioned, I'm at Mapbox, working primarily on the geocoder, both on engineering and on acquiring the data. One of the things of particular interest to me, and one of the reasons I came to Mapbox, is trying to make the mix of data that powers our geocoder more open. So I want to talk to you about the OpenAddresses project, an initiative that was started by Ian Dees about a year and a half ago and that we've been working pretty hard to improve. OpenAddresses is the largest open address point dataset in the world, and it's growing very rapidly, for reasons I'll run through here. I should note quickly that this is OpenAddresses at openaddresses.io. There is also an OpenAddresses UK, which recently suspended its operations, and I think there was a project before that as well, but openaddresses.io is the one that achieved the largest set. So, a quick review of why you want this data. I've already mentioned that geocoding is the use case: translating coordinates into human-understandable addresses and vice versa. It's a function that's absolutely essential to so many services, and it's only getting more important as we rely on mobile and automated technology more and more. To build a geocoder you need a few different layers. Obviously you need a sense of where countries are, and there are good open data sources for that kind of information. Below that, we also need intra-country regions, and there are pretty good data sources for this as well, open and freely available to everybody. Below that, you need a
place layer: some notion of the inhabited towns, villages, and cities. This is where things start to get dicey. You can often get this from open sources like a national census department, a postal system, or an open data portal. Quality can vary, but generally speaking this layer can be cobbled together from open sources.
It's below that that things really fall apart, when you need the specific position of an address. All too often you're left to fall back on data like this. It may not look like it, but this is actually a downtown area, taken from a very expensive geospatial dataset as of about a year ago. It's missing most of the residential streets, and it's an incredibly coarse approximation of the road network, really not adequate for finding anything. Even if you did have the residential streets, this would just be guessing at the location of a particular address by saying, "This line segment runs from 100 to 200, so I'm going to linearly interpolate along it to figure out where I should drop my pin."
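The interpolation fallback described here is simple enough to sketch. The Python below is a minimal illustration, not how any particular geocoder implements it: it assumes house numbers run evenly from one end of a straight segment to the other, ignores odd/even street sides, and interpolates directly in longitude/latitude, which is only reasonable over short distances. The function name and parameters are hypothetical.

```python
from typing import Tuple

Coord = Tuple[float, float]  # (longitude, latitude)


def interpolate_address(
    house_number: int,
    range_start: int,
    range_end: int,
    seg_start: Coord,
    seg_end: Coord,
) -> Coord:
    """Guess an address position by linear interpolation along one street segment.

    Assumes numbers run evenly from range_start at seg_start to range_end at
    seg_end. Real streets rarely behave this way, which is the weakness of
    interpolation-only geocoding.
    """
    if range_end == range_start:
        # Degenerate range: nothing to interpolate, use the segment start.
        return seg_start
    # Fraction of the way along the segment, clamped to [0, 1].
    t = (house_number - range_start) / (range_end - range_start)
    t = max(0.0, min(1.0, t))
    lon = seg_start[0] + t * (seg_end[0] - seg_start[0])
    lat = seg_start[1] + t * (seg_end[1] - seg_start[1])
    return (lon, lat)
```

For example, `interpolate_address(150, 100, 200, (0.0, 0.0), (1.0, 2.0))` lands at the midpoint `(0.5, 1.0)` whether or not a building actually stands there; address point data removes that guess.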
What you actually want is this. This is a
visualization of OpenAddresses coverage as of last week: all 188 million points, chopped up into vector tiles. As we zoom in, you can see that this is maximally granular data; it gets down to the individual buildings you might want. This is the kind of data you're going to need in the future if you want Amazon drones to deliver accurately, or if you want to deliver high-quality emergency services to people in a city. And it is increasingly being collected, and increasingly open. The mission of OpenAddresses is to collect this data from around the world and normalize it into a common format that can be used without dealing with every individual municipality's quirks, license considerations, and other assorted weirdness. We've made a lot of
progress. I've mentioned a couple of times now that we're closing in on the 200 million address point mark. This is the recent data report from our status page, which you can find at data.openaddresses.io.
You can see that the project started in the US, where the founder, Ian Dees, is based, and the US has been a geographic focus for the core contributors. US government is arranged in a weird way, such that responsibility for address points often falls to the county level, which means there is a gigantic number of governments we need to interface with in order to collect a comprehensive dataset for the country. It's nice that there is a pretty good interpolation dataset as a fallback, but we want points for everything. You can see here that some state-level datasets have been collected by GIS departments, but we're often going to individual counties. In this image you can safely ignore the colors, except for gray: gray is uncovered. Red versus green just means the status of the data source on the last run of OpenAddresses. We refresh all of the points on a nightly basis, and although data sources are sometimes down, we cache the most recently successfully retrieved source.
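The nightly refresh with a fallback to the last good download can be illustrated with a short sketch. This is an assumption about the general pattern, not OpenAddresses' actual code: `fetch` stands in for whatever downloads a source, and the cache file layout and status strings are hypothetical. The returned status plays roughly the role of the red/green indicator on the coverage map.

```python
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple


def refresh_source(
    source_id: str,
    fetch: Callable[[], bytes],
    cache_dir: Path,
) -> Tuple[Optional[bytes], str]:
    """Fetch one source, falling back to the last good copy if the fetch fails.

    `fetch` is a hypothetical downloader that returns raw bytes and raises
    OSError (for example ConnectionError) when the source is down.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / f"{source_id}.cache"
    try:
        data = fetch()
    except OSError:
        # Source is down: reuse the most recent successful download, if any.
        if cached.exists():
            return cached.read_bytes(), "failed, using cache"
        return None, "failed, no cache"
    # Success: remember this copy so a future failure can fall back to it.
    cached.write_bytes(data)
    return data, "ok"


def nightly_run(fetchers: Dict[str, Callable[[], bytes]], cache_dir: Path) -> Dict[str, str]:
    """Refresh every source once and report each source's status."""
    return {sid: refresh_source(sid, fetch, cache_dir)[1] for sid, fetch in fetchers.items()}
```

Separating "this run failed" from "we have no data at all" is what lets a map show a source as red while its points remain available.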
The situation is a little bit better in Europe, where efforts like the INSPIRE directive and more centralized governments help get datasets together in a unified place. You can still see that there are some holes, and within countries, of course, we will still want better coverage. I should also mention that the exact nature of this data and its quality varies a little bit from country to country. Some of these are very high-quality emergency service datasets, designed for uses where lives are on the line. Others are sets of points derived from cadastral data that's been collected for tax or other purposes. That's still extremely useful, but it may not have the pin dropped right on top of the rooftop. So it does vary somewhat by country. Even when data coverage is incomplete, it can still substantially improve your geocoder if you already have one. This is an example of how we improved our results using recently opened data from Austria. On the left is the before, on the right is the after, and the blue pin represents our test reference point, which we use to evaluate how good a job we're doing. You can see we were already passing on this result, but with the Austrian data we're able to drop the pin exactly onto the rooftop rather than just interpolating close to it. In some cases the OpenAddresses data is actually more accurate than the test data that we collect from third parties, and it has allowed us to have really excellent accuracy even while not using OpenAddresses data exclusively.
This is the point where I talk about licensing. I often hear OpenAddresses referred to as CC BY, but OpenAddresses is a project that describes where address data exists; it is not an organization that holds the rights to that data. We keep track of what those licenses are, we work with lawyers in-country when things are unclear, and we do our absolute best to figure them out, for instance with Polish law on geographic data. We try to normalize these licenses to the greatest extent we can, but we can't make guarantees; it's up to the end user to figure out what they're going to do with the data and what their liabilities might be. With that said, the vast majority of data in OpenAddresses is available under an open attribution license, something like CC BY or a national equivalent that may have been adopted by a particular authority, or it's simply public domain. We do index some sources under more restrictive licenses, but they are a very small fraction, and very soon you'll be able to download per-license extracts of the dataset.
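Per-license extracts imply sorting every source into license buckets. The sketch below assumes each source record carries a free-text `license` field and uses a tiny, hypothetical lookup table; real license strings are far messier, which is why anything unrecognized falls into the most restrictive bucket by default.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical normalization tables; real license strings vary far more widely.
PUBLIC_DOMAIN = {"public domain", "cc0", "pddl"}
ATTRIBUTION = {"cc-by", "cc-by-4.0", "odc-by", "ogl"}


def license_bucket(license_text: str) -> str:
    """Map a free-text license string onto a coarse bucket."""
    key = license_text.strip().lower()
    if key in PUBLIC_DOMAIN:
        return "public-domain"
    if key in ATTRIBUTION:
        return "attribution"
    # Unknown or unusual terms are treated conservatively.
    return "restricted-or-unknown"


def split_by_license(sources: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group source ids by license bucket, one group per downloadable extract."""
    extracts: Dict[str, List[str]] = defaultdict(list)
    for src in sources:
        extracts[license_bucket(src.get("license", ""))].append(src["id"])
    return dict(extracts)
```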
So let's dive into an example of exactly what the guts of an address source look like. This is a little bit more technical, but it's worth talking about what OpenAddresses is actually doing. Most people who visit the website encounter the data as a monolithic CSV file, but this is the actual beating heart of the project, what we often call a conform. As you can see, it's just a JSON document. This is how we define where data comes from and how it's going to be normalized into a common format that can be used across sources. Let me highlight a couple of things here. Up at the top is the conform object itself, a sub-object that defines a few things. In this case we're looking at a Danish address data source, and it's a CSV. You can see the four fields that are absolutely required for every source. We do index some additional fields when available, but the ones we insist on are longitude, latitude, street name, and house number, and here we're mapping them to the input column names. Outside that object we have an attribution string. Our goal for the sources that use attribution licenses is to allow users to simply concatenate the strings together, so it's very easy to comply with the requirements. We also provide some metadata to help people understand the sources; I don't necessarily recommend building a geocoder without understanding the quirks of each national address data source, and the website and note fields help you with that. Finally, we've got a link to a specific license, and a coverage object, which helps us generate maps showing our coverage. The coverage information is optional, as is the license, as is most of it; it's really the conform object that's essential. But you can do more than just that if you've got a more complex data source. For one thing, we quite often reproject the data so that it all plays nicely together. We also support regular expression transformations.
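To make the conform idea concrete, here is a simplified Python sketch of applying a source definition to CSV input. The key names (`lon`, `lat`, `number`, `street`, `split`), the column names, and the attribution strings are hypothetical stand-ins rather than the exact OpenAddresses schema, and reprojection is left out. It maps input columns onto the four required output fields, optionally splits a single address column with a regular expression, and joins attribution strings so they can be credited together.

```python
import csv
import io
import re
from typing import Dict, List, Optional

# Simplified, hypothetical source definitions; key and column names are illustrative.
DANISH_LIKE = {
    "attribution": "Example Danish agency",
    "conform": {"type": "csv", "lon": "x", "lat": "y",
                "number": "husnr", "street": "vejnavn"},
}
SINGLE_FIELD = {
    "attribution": "Example county",
    "conform": {"type": "csv", "lon": "lon", "lat": "lat",
                # One column holds "123 Main St"; a regex splits it in two.
                "split": ["address", r"^(?P<number>\d+\w*)\s+(?P<street>.+)$"]},
}


def conform_row(row: Dict[str, str], conform: dict) -> Optional[Dict[str, str]]:
    """Map one input row onto the four required output fields."""
    if "split" in conform:
        field, pattern = conform["split"]
        match = re.match(pattern, row[field].strip())
        if match is None:
            return None  # skip rows the pattern cannot parse
        number, street = match.group("number"), match.group("street")
    else:
        number, street = row[conform["number"]], row[conform["street"]]
    return {"LON": row[conform["lon"]].strip(), "LAT": row[conform["lat"]].strip(),
            "NUMBER": number.strip(), "STREET": street.strip()}


def conform_csv(text: str, source: dict) -> List[Dict[str, str]]:
    """Normalize a whole CSV document using the source's conform object."""
    rows = (conform_row(r, source["conform"]) for r in csv.DictReader(io.StringIO(text)))
    return [r for r in rows if r is not None]


def combined_attribution(sources: List[dict]) -> str:
    """Concatenate attribution strings so every source can be credited at once."""
    return "; ".join(s["attribution"] for s in sources if s.get("attribution"))
```

Keeping per-source logic in data rather than in per-source code is what lets one processor handle most sources; only genuinely odd inputs need custom scripts.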
The one shown here splits a single field into the house number and street name fields. This used to be handled by a collection of different, purpose-built functions. Regular expressions are maybe a little bit more intimidating, but most programmers are comfortable with them, and they offer a lot more power and a unified way to perform all these functions in one place. You'll also notice that this data source points to an Esri web server. We're able to process CSV, JSON, Esri web services, shapefiles, pretty much any source that we have encountered; for very large or custom datasets, like malformed GML, which seems to be a popular standard in Europe at the moment, we end up writing custom scripts. And this is what contributing to OpenAddresses actually looks like. The project is hosted on GitHub, and it's available for anybody to add to. If you know of a data source, we would love for you to create a pull request, which volunteers will review and evaluate for inclusion. We're pretty generous about that: if it works at all, we're glad to add it to the collection, as long as the license looks OK. This is an example of a pull request for a specific county in the US, and there are just two things I want to highlight in those little blocks of words at the bottom: two checks that passed. Those of you who develop on GitHub are probably used to seeing this if you use a continuous integration tool like Travis. We do have Travis run checks on the source to make sure that the JSON is well-formed, that it has the required fields, and that things seem to make sense. The really exciting thing below that is the OpenAddresses webhook service, developed by Mike Migurski, formerly of Code for America, who is one of the core OpenAddresses contributors. It's an automated web service that, while perhaps not the loveliest thing, makes contributing to OpenAddresses much easier than it was before. You can create a pull request, perhaps just using a copy of an existing JSON source file.
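The continuous integration checks mentioned here (well-formed JSON, required fields present) could look roughly like this. It is a hedged sketch, not the project's actual test suite: the required keys, the alternative `split` rule, and the `data` field for the download URL are assumptions made for illustration.

```python
import json
from typing import List


def check_source(text: str) -> List[str]:
    """Return problems found in a source definition; an empty list means it passes.

    Hypothetical rules: the file must parse as a JSON object with a `data`
    download URL and a `conform` object naming `lon` and `lat` columns, plus
    either `number` and `street` columns or a `split` rule.
    """
    try:
        source = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not well-formed JSON: {err.msg}"]
    if not isinstance(source, dict):
        return ["top level must be a JSON object"]
    conform = source.get("conform")
    if not isinstance(conform, dict):
        return ["missing 'conform' object"]
    problems = []
    if "data" not in source:
        problems.append("missing 'data' URL")
    # A regex split can supply number and street from a single column.
    required = ["lon", "lat"] if "split" in conform else ["lon", "lat", "number", "street"]
    for field in required:
        if field not in conform:
            problems.append(f"conform is missing required field '{field}'")
    return problems
```

Returning a list of problems rather than stopping at the first one gives contributors everything to fix in a single CI run.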
A pull request like that will be processed by our servers, which first pull down the data while showing exactly what's going on, so you can figure out whether your implementation is working without setting up the entire toolchain; second, extract some sample data, which can help you debug problems, identify column names, and do all that kind of stuff; and finally, produce the actual processed data, which is of course extremely useful, though I can't show you a gigantic multi-thousand-row CSV on the screen. I mentioned that the project is growing fast. It's only about 18 months old, and we've reached 188 million points; that's about 300 thousand per day, which is pretty good. The spikiness here comes from some server changes, not from data actually appearing and disappearing. We still have a ways to go. This is an analysis I did based on census figures for the different geographies we have coverage for, compared against the actual number of points we're reporting. It's a useful metric for figuring out when a dataset is unusually sparse, and why. It works out to about two people per address, going by census population, but with some outliers. We have data for Seoul, and you can see there's a red dot; I wanted to highlight it for this talk. It's a bit more dense than the rest of the world, and that's a pattern that continues as we extend coverage.
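The people-per-address check is simple arithmetic: census population divided by address point count for each covered area, with areas far from the typical ratio flagged for a closer look. The sketch below assumes per-area dictionaries of population and point counts; comparing against the median and the factor-of-two threshold are arbitrary illustrative choices, not the analysis shown in the talk.

```python
from statistics import median
from typing import Dict, List, Tuple


def people_per_address(population: Dict[str, int], points: Dict[str, int]) -> Dict[str, float]:
    """Census population divided by address point count, for each covered area."""
    return {
        area: population[area] / count
        for area, count in points.items()
        if count > 0 and area in population
    }


def flag_outliers(ratios: Dict[str, float], factor: float = 2.0) -> List[Tuple[str, float]]:
    """Return areas whose ratio is more than `factor` times above or below the median.

    A high ratio may mean a dataset is unusually sparse, or simply that many
    people share each address; the threshold is illustrative only.
    """
    typical = median(ratios.values())
    return sorted(
        (area, ratio)
        for area, ratio in ratios.items()
        if ratio > typical * factor or ratio < typical / factor
    )
```

With made-up inputs where most areas come out near two people per address and one area at ten, only that area is flagged; the number tells you where to look, not what is wrong.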
There are a bunch of ways that you can contribute to OpenAddresses. As I showed, if you can write JSON, you can take a source and submit a pull request, and if you are technically inclined I would absolutely love to see more sources contributed. But the most important thing you can do is contribute research and outreach to your local or national authorities about the value of opening address data. We do a ton of that, and it's probably the most important work that we do. In some instances it's just sending an email saying, "Hey, can you point us toward a dataset that might satisfy this need and that's available under an open license?", or researching the depths of some agency's FTP site to figure out which one of zillions of shapefiles contains the points that we need. In other cases we need people to advocate within their geospatial communities for better licensing and laws. Some of this data is still being sold on a fairly hopeless cost-recovery basis, and we don't think that's the future, and some of it is just completely unfairly licensed. I'm particularly glad to be giving this talk here in Korea because, as I said, we collected data for the Seoul metropolitan area that's available through an open data portal under a CC BY license, according to the text and the icons. And yet when we had someone in the office spend about four hours making calls to different parts of the Korean state bureaucracy, we were told that there are laws prohibiting the export of geospatial information like this. For that reason we have not collected information from the national address website, which as far as I can tell contains nationwide address data, so we really need clarity on that. And as you can see from the coverage map, we're a project that's primarily made up of American contributors, and we're lacking the kind of language skills and local experience that will help us attain global coverage. So please, if you're interested in opening address data and you're from a part of
the world that we're not already covering, we really want to talk to you.
Finally, I know I've just mentioned it, but this is the data that will power an unbelievable range of new services. I already mentioned some of the more far-reaching examples, like drone delivery. It's incredibly valuable. To date, Denmark is not only a country that's done an excellent job of opening its address data from a technical standpoint; it has also analyzed the impact and produced a report with the World Bank on its value. And this is the punch line: in the first four years of the project, from 2005 to 2009, when they were opening the data, they spent about 2 million euros, which is a substantial investment, but the calculated return on that was about 60 million euros. The estimates for 2010, which is when the study was conducted, were similarly staggering: an ongoing investment of about 200 thousand euros versus an annual return of 30 million. That's an incredible ROI, with about 30 percent of it accruing to government savings and the remaining 70 percent going to citizens and private industry. Opening address data is a tremendous way to generate wealth and to make geography more accessible to everybody. That's the mission of our project, and we welcome your help in making it happen.
Thank you so much. Any questions or comments?
Audience member: So, for one recalcitrant government: please convince me to open up my data. Speaker: It's interesting. In the United States we recently had a National Address Database summit convened by the Department of Transportation, and for two days a facility that's mostly used for Navy training was full of geospatial officers, and they all want to open this data. They're really sick of burning DVDs for money that doesn't cover the time cost; what they need is more resources to do it. So I think the wedge that's going to make this possible is mandates for
intergovernmental sharing. I mentioned INSPIRE; that's a good one. And 9-1-1 in the United
States has proven to be very important for sharing this data across agencies. The tide is turning: we're seeing open data policies in places like Western Australia. But you're right, it's tough if you go to an agency that's not ready to do this. For instance, places like the Northern Territory in Australia are very sparsely populated and are essentially only selling this data to oil and gas extractors. They don't have a strong case for opening it yet, and they don't want to give away the income they have coming in. I think it's a matter of time, and I'm optimistic about it, but it can take a while. Audience member: Do you have, say, a list of benefits for the
variety of data, the benefits of opening it? Speaker: Yeah, so we often make the case using the
Danish example, but beyond that, intergovernmental sharing is a substantial one, if the agencies don't have the data ready to go. It can be hard to make the case. There are also examples like Tasmania, where we didn't have coverage: they told us the data is already CC BY, but you can't get it unless you fax them a check for 200 dollars. So we did that, and it should be in OpenAddresses now. A lot of this is just shaking trees and getting people to open up stuff that they've already got handy and that they're not keeping that tightly. One other thing I want to mention really briefly, which I skipped over on the license slide: some of you might ask why we aren't contributing to OpenStreetMap. After all, it would be really great, when working with something like the Spanish data, where points are not on rooftops, to be able to just go in and drag those markers to the right place when you encounter an error. The reason is the lack of geocoding guidance for the ODbL right now. Geocoding commercially against that database license is, in our estimation, a minefield: you're potentially risking the need to expose your customers' data if you geocode against ODbL sources, and that's one of the main motivations for keeping this project outside of OSM. Ultimately, I'm very optimistic about the clarification of that license guidance, and then we can start looking at imports to make this data available. OSM only has about 55 million addresses right now, so OpenAddresses already has several times that. Any
other questions or comments? OK, I think that's it.