Addresses 101
Formal Metadata
Number of Parts | 70
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/58535 (DOI)
State of the Map US 2019 (5 / 70)
Transcript: English (auto-generated)
00:00
Good afternoon. I hope everyone can hear me. I'm Wade Crump. I'm an OpenStreetMap volunteer. And my presentation today is Addresses 101. So about this presentation, there's way too much information for 20 minutes. The entire presentation was going to be on GitHub.
00:20
I forgot everything I ever knew about GitHub in the past few weeks. Addresses are rather complicated. There are a lot of special cases that I won't get into. But I'll try to throw some out there. And the usual disclaimer, the opinions are mine, not my employer's. Okay, some basics. You know, what is an address? It's attributes used to designate a location.
00:43
We're pretty much in 200 Oak Street Southeast. So we went on just over that way. Minneapolis, Minnesota, 55455. OSM uses the de facto Karlsruhe schema, which is the key addr and various subkeys.
01:00
And the wiki is really the place to go to find out more information on how they want things done. And of course, this presentation is entirely US focused. So Costa Rica, sorry, guys. What are the parts of an address? You get the house number, the street name, the secondary address unit designator (thank you, Postal Service), the city, the state and the ZIP.
01:23
And how do we do that in OpenStreetMap? Well, here's the keys we use. We use addr:housenumber, addr:street, addr:unit, addr:city, addr:state and addr:postcode. And we're a little schizophrenic. Street is not abbreviated and it's mixed case.
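As a rough illustration of the keys just listed, the venue address from the talk can be written as a set of OSM tags. This is only a sketch: the dictionary layout and the `format_address` helper are assumptions for illustration, not anything shown in the presentation.

```python
# Hypothetical example: the talk's sample address expressed as OSM addr:* tags.
address_tags = {
    "addr:housenumber": "200",
    "addr:street": "Oak Street Southeast",  # spelled out, mixed case
    "addr:city": "Minneapolis",             # mixed case
    "addr:state": "MN",                     # two-letter uppercase, USPS style
    "addr:postcode": "55455",               # five-digit ZIP
}


def format_address(tags):
    """Join the addr:* tags into a single mailing-style line."""
    street_line = f'{tags["addr:housenumber"]} {tags["addr:street"]}'
    return (
        f'{street_line}, {tags["addr:city"]}, '
        f'{tags["addr:state"]} {tags["addr:postcode"]}'
    )


print(format_address(address_tags))
```

A suite would go in an additional `addr:unit` tag, which this helper deliberately leaves out to stay short.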
01:42
Minneapolis, the city is mixed case. But the state is all uppercase because we're following the USPS convention. More on address components, the house number. For example, if you look at the little picture in the bottom right here, 6455-45.
02:02
Well, what does the dash mean? Anything you want, apparently. In this case, it's a suite number in Columbia, Maryland, which is near where I live. In New York City, it's a single house number. And it is no longer designated by the Postal Service, but a lot of people still use them.
02:23
In Lutherville, not far from my office, there's a 2-4. And that's actually two shops adjacent to each other. And the guy said, well, it's both of them, so I put them together. And then in Maryland, in OpenStreetMap, one mapper has been using it as a shortcut. So in this case, 28 houses are indicated by the dash.
02:44
And that is clearly, in my opinion, wrong. Some other weird house numbers. Wisconsin has a grid-based system, which is north and west of City Center, which makes really horrible addresses. I hope they clean them up someday back to normal things.
03:03
And then one of the stranger things I found is leading zeros are generally a computer formatting error. But in Portland, Oregon, the leading zero is significant. And thank God this is going away in 2020, because that is insane.
03:22
The street name, again, mixed case, spell out the abbreviation. So street, southeast are both spelled out. And we're supposed to be using the posted name on the street. But, you know, the signs don't always match, and the local government and the Postal Service don't always agree. They should.
03:41
The city is the postal city, not the legal city. And the USPS has, you know, one preferred name per zip code. They also have alternative names, abbreviations, and even discouraged names. But we only use the preferred name.
04:02
Again, addr:state is abbreviated to two letters, and Mexico and Canada also have two-letter state and province abbreviations. Zip code, or postcode, as we say: it's not keyed as addr:zip or addr:zipcode. It is addr:postcode.
04:22
And a zip code is always five digits. So if it's needed, it's zero padded. So 00907 is correct. Now, some people have been capturing zip plus fours. I don't see the need for this because those are used in the mailing industry. And in order for you to get a discount on your bulk mail,
04:44
which is basically who uses zip plus four, you have to get them rechecked every six months, which is not something that I think anyone in this room is going to do. So, notes on addresses. Addresses are given out before the building exists.
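The five-digit, zero-padded rule for ZIP codes described above can be sketched as a small normalizer. The function name and the choice to drop a ZIP+4 suffix are assumptions made for this example, following the speaker's stated preference rather than any OSM requirement.

```python
import re


def normalize_zip(value):
    """Return a five-digit ZIP string, or None if the value cannot be one."""
    text = str(value).strip()
    # Accept 1-5 digits, optionally followed by a ZIP+4 suffix such as "-1234".
    match = re.fullmatch(r"(\d{1,5})(?:-\d{4})?", text)
    if match is None:
        return None
    # Restore leading zeros that spreadsheets and databases tend to strip.
    return match.group(1).zfill(5)


print(normalize_zip(907))
print(normalize_zip("55455-0001"))
```

Values that are not ZIP-shaped at all return `None` so they can be reviewed by hand instead of being silently rewritten.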
05:01
Check out my lovely address here. This is one that I found in the field. And no, that is not the correct spelling of Sanford. So, you know, over time, a zip code, a street name, or even a house number can change, and a building can be demolished or a firm can close.
05:21
But I tend to leave the address in place unless the building is never going to be replaced. Some more notes on addresses. A little graphic there. That is a utility box for the phone company. So utility boxes, substations, and other uninhabited locations
05:41
can have an address, and an address can cover multiple buildings, which I really hate. And people have been putting addresses on golf courses and parks. And while that might technically be correct, I prefer the address to be on a main building so that it works properly for routing.
06:00
If you go to the center of a park, that may or may not be anywhere near the road that you want to come in on. And addresses should be unique. Of course, life is messy, so duplicates happen. And if suite numbers exist, especially for shops and offices, you definitely want to put the suite in addr:unit.
06:22
And I don't model apartments. It just seems like too much work. Puerto Rico is very similar to the U.S., except Puerto Rico may use urbanization. Their city doesn't actually capture the Spanish address scheme quite right,
06:44
so they have to have a thing between the city and the smaller-than-city called an urbanization, like Urbanization Terranova. And that is placed in addr:place, which is, interestingly enough, not in the documentation of the wiki. And then Puerto Rico has addresses
07:02
not too dissimilar to the Costa Rica we heard about earlier today, where it might be a kilometer post distance from town. They also have condos without street names or street numbers. And you can see the USPS webpage for way too many details.
07:22
I'll briefly trip into Canada. And by the way, that's Puerto Rico, and that's Canada. I'm only going to talk about places I've actually been. In Canada, addr:state is turned into addr:province, and they use a two-letter province abbreviation.
07:44
Their postal code is technically the forward sortation area and local delivery unit, sing along with me, FSA LDU. And it is six letters and numbers with a space in the middle, which is a great system. It basically packs a zip plus four in a very easy-to-use space.
08:03
They don't use O and I. And Santa has its own postal code, which is H0H 0H0, or ho, ho, ho. Okay, optional parts of the address. Interestingly enough, addr:state, or addr:province in Canada, and addr:country are considered optional.
08:23
And the reason for this is that people think that you can take a boundary relation, aka a polygon, and do a spatial overlay. Not a big fan of this, but whatever. So what gets tagged as an address?
08:41
Pretty much everything. You can have nodes, nodes within buildings, or multiple nodes within buildings. Or you can have a closed way or other addressed thing. So, you know, do you put a node in a building, or is it better to have a building with an address? Both are very acceptable.
09:01
I prefer a building with an address, so putting it on the way. But that is a preference thing. And then if you have a building with holes, you can have building, sorry, you can have address information on the relation. And why are addresses important? Well, three things.
09:20
Geocoding, so that's, you know, assigning a location to, I'm sorry, assigning a coordinate to an address. Routing, so two addresses, how do you get from A to B? And my personal favorite, quality assurance. There's a lot of things you can do when you start comparing two sets of data.
09:41
Basically the streets and the addresses, you look for mismatches. So I'll talk briefly about editors. Mobile editing, I've heard OsmAnd is very good. I only use Go Map!!. And here's a screen from Go Map!!. And I've customized unit into my Go Map!! so that I get to add suites as I go along.
10:06
The iD editor has a wonderful, clean address entry box. And it takes suggestions off of nearby address parts. JOSM has a wonderful address tool, shortcut K. And it allows you to pick something, drop an address, pick the next thing,
10:26
and it auto-increments or auto-decrements by two, which makes life very easy. So you can just go down the street and pick things and keep addressing them. Another tool that they have, this is actually a plug-in, is the Terracer,
10:41
which is, for those of you who don't speak British, a row house or a town home. So I start with a rectangle. It is a perfect rectangle. Anything else will make really bad geography. You enter the lowest number and the highest number or the number of houses in that block.
11:04
And it splits it into five pieces. And it's wonderful. Interpolation lines are an advanced topic. These are a way to put in things that really haven't been, aren't showing up on the imagery.
11:22
So this is what I put in for a set of row homes in Ellicott City, which is where I live. And basically you're connecting a low and a high address. And then you're saying that between the low and the high, these are all evens or all odds. And when I finally get some imagery, I will replace this with an actual set of town homes.
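To make the interpolation idea concrete, here is a minimal sketch of how a data consumer might expand a low/high pair into individual addresses. It assumes a straight line between the two end nodes and evenly spaced houses; the function name and the plain linear interpolation on latitude and longitude are illustrative simplifications, not how any particular geocoder is known to do it.

```python
def interpolate_addresses(start, end, low, high):
    """Spread house numbers low..high (step 2) evenly along a straight line.

    start and end are (lat, lon) tuples for the low and high address nodes.
    Stepping by 2 keeps the run all even or all odd, matching the parity of low.
    """
    if high <= low or (high - low) % 2 != 0:
        raise ValueError("low and high must share parity, with low < high")
    steps = (high - low) // 2
    points = []
    for i in range(steps + 1):
        t = i / steps  # 0.0 at the low end, 1.0 at the high end
        lat = start[0] + t * (end[0] - start[0])
        lon = start[1] + t * (end[1] - start[1])
        points.append((low + 2 * i, (lat, lon)))
    return points


# Hypothetical coordinates, chosen only to make the arithmetic easy to follow.
for number, position in interpolate_addresses((0.0, 0.0), (0.0, 4.0), 100, 108):
    print(number, position)
```

The positions are estimates only, which is why the speaker plans to replace the line with real building outlines once imagery is available.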
11:47
And these are very popular in Canada, probably because they pulled them from their census data. So how do I capture addresses? I use a GoPro, which has a GPS.
12:01
This is a GoPro image. As you zoom in, you can see that the image is very strong. And usually I start with the mailbox. In this case, I'm out of luck. Luckily, they have a security firm that has put all of these nice little numbers on them. And if that doesn't work, you can't see it.
12:21
But behind that purple circle is the house number on the house. Now, the biggest problem with field capture is that you can't always be guaranteed that you're going to, you know, find a number. And I'll get to that in a minute. Then when I get back home, I load these photos into JOSM.
12:44
And then you can see I've got a picture here of 6058. The little red camera there is the location that that photo was taken. And then you can see I've now used 6058. OK, so problems with field work.
13:02
It's costly. It takes time. It takes money. You get blurry images, overexposed images, obstructed views. You get people parking their cars in front of the mailbox because the post office guy has already been there for the day. Bad weather, gear failures, you run out of battery, gated neighborhoods.
13:21
And of course, the missing, damaged, faded numbers or the numbers that have peeled off. In this case, we have a stealth mailbox that is entirely black. I have photos of all three sides. All three sides are black. So what do you have to do at that point? Well, imports. Imports are ugly. Imports are difficult, et cetera.
13:43
USA is way too large. There's not enough mappers out there to do it all by pedaling up and down all the streets. So imports from local government data, local county data is really the way to go or state data. They need a lot of preparation. You have to get basically the soft part is license compatibility.
14:03
You have to have a wiki page. You have to have discussion on mailing list, et cetera. And then the hard stuff is you have to reformat all of that data from their GIS format into OpenStreetMap format. And what goes wrong?
14:22
Well, some typical errors. They don't expand the street name. RD should be Road. They don't mix case the street name. So Bull Street, all in uppercase. Using an improper key. Well, zip code is nice, but the proper key is addr:postcode.
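The first two import errors mentioned here, unexpanded abbreviations and all-uppercase names, can be sketched as a small cleanup step. The lookup tables below are deliberately tiny and hypothetical; a real import would need a much fuller suffix list and manual review, since a letter such as "E" can be a street's actual name rather than a direction.

```python
# Small illustrative lookups only; not a complete list of suffixes.
SUFFIXES = {
    "RD": "Road", "ST": "Street", "AVE": "Avenue", "CT": "Court",
    "PKWY": "Parkway", "DR": "Drive", "LN": "Lane",
}
DIRECTIONS = {
    "N": "North", "S": "South", "E": "East", "W": "West",
    "NE": "Northeast", "NW": "Northwest", "SE": "Southeast", "SW": "Southwest",
}


def normalize_street(raw):
    """Expand common abbreviations and convert the name to mixed case."""
    words = raw.replace(".", "").split()
    out = []
    for i, word in enumerate(words):
        key = word.upper()
        is_edge = i == 0 or i == len(words) - 1
        if key in DIRECTIONS and is_edge:
            # Directions are only expanded as the first or last word.
            out.append(DIRECTIONS[key])
        elif key in SUFFIXES and i > 0:
            # Skip the first word so a leading "ST" (Saint) is left alone.
            out.append(SUFFIXES[key])
        else:
            out.append(word.capitalize())
    return " ".join(out)


print(normalize_street("BULL ST"))
print(normalize_street("OAK ST SE"))
```

Because every word is checked, a name with two suffix-like words in a row is expanded in both places rather than only at the end.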
14:44
Weird formatting things like Spring Parkway Court. Well, clearly they got the court right, but they didn't go back and think of Parkway because their parser was written not to do that. And then in the QA section address points that don't have a matching street.
15:02
So if you put down addresses, you can link them up to their streets and then you can see what falls out. And the question quickly becomes, you know, is the street missing a name or is the name spelled wrong? If they don't match, who's wrong? Is it the street? Is it the imported data?
15:23
Is it both or is it a false positive where you could have both of them match and both be wrong? So once you have done all this, wherever you get your addresses from, there's a bunch of wonderful quality assurance tools. And I'm going to talk about first my favorite, which is QA by Simon Poole.
15:44
It's very simple. It updates every five minutes. And he puts a red line around a residential area that has no addresses. There's red buildings for houses with no addresses and green for houses with addresses.
16:02
Lovely clean interface. And the no name there is on highways. OSM Inspector, I very much like this tool. This is a very good tool. It does daily updates and it has 17 different settings that you can use to show errors and correct data.
16:24
The first thing I'm going to show off is the red dots, which in this case are street not found. And we have two problems here. One of them is that the streets, the center lines, the ways don't have any street names. If they had street names, the second problem would be that half of these things are uppercase and we're matching on mixed case.
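The two problems described here, streets with no name and names that differ only by case, can be told apart with a simple comparison. The following is a minimal sketch with made-up data; it is not how OSM Inspector itself is implemented, and the function name and status strings are assumptions for illustration.

```python
def check_street_matches(addresses, street_names):
    """Return one status string per address, comparing addr:street to streets."""
    exact = set(street_names)
    folded = {name.casefold() for name in street_names}
    statuses = []
    for tags in addresses:
        street = tags.get("addr:street")
        if street is None:
            statuses.append("no addr:street")
        elif street in exact:
            statuses.append("ok")
        elif street.casefold() in folded:
            # Same letters, different case: probably an import formatting error.
            statuses.append("case mismatch")
        else:
            statuses.append("street not found")
    return statuses


# Hypothetical sample: one named street and four address points nearby.
streets = ["Quayside Court"]
points = [
    {"addr:housenumber": "1", "addr:street": "Quayside Court"},
    {"addr:housenumber": "2", "addr:street": "QUAYSIDE COURT"},
    {"addr:housenumber": "3"},
    {"addr:housenumber": "4", "addr:street": "Elm Street"},
]
print(check_street_matches(points, streets))
```

A "street not found" result still needs a human decision, since either the address or the unnamed way may be the one that is wrong.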
16:49
So Quayside Court should be mixed case. OSM Inspector, another picture. This is showing purple dots.
17:01
The purple dots are addresses with just a bare house number. No street name. Not very useful. The red lines are unclosed ways. In this case, they are address tags on highways. We don't address highways. We address buildings or points or relations.
17:21
Another OSM Inspector view. This time you'll notice all of the houses have an address. The little black lines are pointing to their closest street name. They're all coming to a point. That's wrong. What's happening there is the street names have been flipped in TIGER.
17:43
Once I looked at TIGER and got some Mapillary imagery, I was able to fix this up. Brief segue into TIGER. TIGER was a wonderful thing. People hate it. Both are true. There's lots of errors in it. There's, you know, street names where they were off by a street.
18:05
So the block, you know, ABC, they got it. Nothing. AB. They missed C. The TIGER import was considered a very bad idea. I think it was the right thing to do. And the reason I say this is because all of the major map companies did exactly this when they started.
18:26
And, you know, we still have many years of cleanup to go, but use enough QA tools and you'll find stuff. JOSM has a validator, so when you try to check stuff in, it will tell you that you have duplicate addresses.
18:42
And this occasionally tells me that I've messed up. It also has a way to select incomplete addresses where there's no street match. Okay. For some really deep stuff, regular expressions for validation. This could be a whole topic in itself and everybody would fall asleep.
19:00
Basically, these are really wild ways to check that something is matching a pattern. So if you're familiar with, in an operating system, you want all these files and then you put a star at the end of it. Very similar to that only on steroids. So this allows you to basically match things and see if they are or aren't correct.
19:26
So, for example, that's an FSA LDU Canada postcode. If it doesn't get through that, it's wrong. Those are just the rules. And it can be used in JOSM, QGIS, and of course, Perl, Python, et cetera.
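The slide's own expression is not in the transcript, so the following is only a sketch of what a Canadian FSA LDU check can look like in Python. It assumes the commonly documented letter rules, which are stricter than the "no O and I" mentioned in the talk: D, F, I, O, Q and U are not used anywhere, and W and Z are not used as the first letter.

```python
import re

# Assumed letter rules (not taken from the talk's slide):
#   first letter: any except D, F, I, O, Q, U, W, Z
#   other letters: any except D, F, I, O, Q, U
CA_POSTCODE = re.compile(
    r"[ABCEGHJ-NPRSTVXY][0-9][ABCEGHJ-NPRSTV-Z] [0-9][ABCEGHJ-NPRSTV-Z][0-9]"
)


def is_valid_ca_postcode(code):
    """True if code has the 'A1A 1A1' shape with an allowed letter in each slot."""
    return CA_POSTCODE.fullmatch(code) is not None


print(is_valid_ca_postcode("H0H 0H0"))   # Santa's postal code from the talk
print(is_valid_ca_postcode("h0h0h0"))    # wrong case and missing space
```

A pattern like this only checks the format; it cannot tell whether a well-formed code is actually in use.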
19:44
In JOSM, with the expert mode on, regular expressions checked, ignore case unchecked, you can use regular expressions. This is my regular expression for finding street names that are all caps. And I also noticed that I was getting a lot of eighth street where the T was capped and that should be lowercased.
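The speaker's actual search expressions are not captured in the transcript. As a stand-in, here is a sketch in Python of the two checks described: street names with no lowercase letters, and ordinals such as "8Th" where the suffix was wrongly capitalized. The pattern and function names are hypothetical, and JOSM's search syntax wraps expressions differently than plain Python does.

```python
import re

# At least one uppercase letter and no lowercase letters anywhere.
ALL_CAPS = re.compile(r"(?=.*[A-Z])[^a-z]+")
# Digits followed by a title-cased ordinal suffix, e.g. "8Th" or "2Nd".
CAPPED_ORDINAL = re.compile(r"\b\d+(?:St|Nd|Rd|Th)\b")


def find_problems(street_name):
    """Return a list of formatting problems found in one addr:street value."""
    problems = []
    if ALL_CAPS.fullmatch(street_name):
        problems.append("all caps")
    if CAPPED_ORDINAL.search(street_name):
        problems.append("capitalized ordinal suffix")
    return problems


for name in ["BULL STREET", "East 8Th Street", "Oak Street Southeast"]:
    print(name, find_problems(name))
```

Both patterns are case-sensitive on purpose, which mirrors leaving "ignore case" unchecked in the editor.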
20:09
So, you know, this is a good way to find a lot of stuff and fix a lot of stuff. JOSM is a very powerful editor, so it allows me to quickly get a whole bunch of these and fix them. A little bit about extraction and analysis. Five minutes.
20:24
Getting data out. I tend to get data out via the Geofabrik website. They have state-based extractions. extract.bbbike.org also allows you to define your own area,
20:40
which is nice if Geofabrik doesn't have the area you're particularly interested in. And then I have an OGR script written in Perl that pulls these all out and someday I'll put it on GitHub as well. So what I do when I extract stuff: in my day job, I actually have been trying to use some of this data.
21:05
So if the addr:state is null, then I overlay polygons. And of course, if you're extracting stuff out, expect odd data. States in the city field, suites in the house number or the street name. People don't always know where to put things. So if it's not clear, sometimes they will just drop it anywhere.
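The polygon overlay mentioned at the start of this passage, filling in a missing state from a boundary, can be sketched with a plain ray-casting test. This is an assumption-laden illustration: the speaker's own tooling is not shown, the function names are invented, real boundaries are multipolygons with holes, and the square used below is a made-up shape with a placeholder code.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test; ring is a list of (lon, lat) vertices, not closed."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge from vertex j to vertex i straddle the point's latitude?
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def fill_state(tags, lon, lat, state_rings):
    """Return tags with addr:state added from the first containing polygon."""
    if tags.get("addr:state"):
        return tags  # an explicit tag always wins over the overlay
    for code, ring in state_rings.items():
        if point_in_polygon(lon, lat, ring):
            return {**tags, "addr:state": code}
    return tags


# Hypothetical boundary: a 10 x 10 square standing in for a state "XX".
rings = {"XX": [(0, 0), (10, 0), (10, 10), (0, 10)]}
print(fill_state({"addr:housenumber": "1"}, 5, 5, rings))
```

In practice a GIS library or a QGIS spatial join would do this step; the sketch only shows what the overlay computes.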
21:24
And of course, if you're extracting stuff out, an address can become a GIS point or a line or a polygon. So you may have to do post-processing. In QGIS, you can also do regular expression matches.
21:43
And I've been writing some Python scripts to do OSM to county data comparisons to find missing addresses as well as duplicated addresses. And I'm doing other stuff as well. And now we have very little time left. So I looked at all of the U.S. states, D.C. and Puerto Rico.
22:04
And just raw address counts. They followed pretty closely to the population. So New York and Massachusetts have the most addresses. Down at the bottom, Hawaii and Montana. That's pretty much expected. I then looked at address completeness. So does the address have a street, a city and a postal?
22:27
And Arkansas was the absolute best. The worst, interestingly enough, was South Dakota. And most of that would be because there was just a bare house number, no street name.
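The completeness measure described above (does an address have a street, a city and a postcode) reduces to a simple ratio. The sketch below uses invented sample data and a hypothetical function name; it is not the speaker's script and the numbers are not results from the talk.

```python
REQUIRED = ("addr:street", "addr:city", "addr:postcode")


def completeness(addresses):
    """Fraction of addresses that carry a non-empty value for every REQUIRED key."""
    if not addresses:
        return 0.0
    complete = sum(1 for tags in addresses if all(tags.get(k) for k in REQUIRED))
    return complete / len(addresses)


# Hypothetical sample: only the first address has street, city and postcode.
sample = [
    {"addr:housenumber": "200", "addr:street": "Oak Street Southeast",
     "addr:city": "Minneapolis", "addr:postcode": "55455"},
    {"addr:housenumber": "12"},  # bare house number, the case called out above
    {"addr:housenumber": "14", "addr:street": "Elm Street"},
    {"addr:housenumber": "16", "addr:street": "Elm Street", "addr:city": ""},
]
print(f"{completeness(sample):.0%}")
```

Running the same function per state would give a comparable ranking, provided every extract is filtered to address-bearing objects in the same way.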
22:42
I also looked at formatting errors. So, you know, are the street, the city, the state and the postal formatted correctly? If they're not null. And again, the best was Arkansas. And the worst was Utah. But I have not really researched why that is. Turning to Maryland, which is where I'm from, we've done some imports on counties.
23:06
So as you can see, I compared OpenStreetMap to state government data. And the imports are both in the 90s. And then way down at the bottom here is Howard County, where I live. And we're at a lowly 14 percent overall.
23:24
The state of Maryland is just under 34 percent addresses covered. And a random warning, because I'm almost out of time. Many presentation packages will fix your quotes for you, will make them pretty and will destroy your code.
23:41
So don't do that. The end. Questions? No questions. If your street sign says S-E-C-O-N-D-3. Yes.
24:01
You've put two. I've been putting two. The Postal Service seems to like the number two versus the spelled out version. Yes. Yes.
24:28
I don't know, and I'm not a big fan of the relation because that is not how most people put in addresses. In this case, what she's talking about is you instead of putting all of the address information,
24:43
the street name on the address, you just provide a link to the street. But that is not widely adopted. And I don't know if many tools use that. And I certainly have not looked at those. Any more questions? That's a good question.
25:37
Yes, I've been attempting to use this for my day job.
25:41
But, you know, if Maryland is at 34 percent and if we're anywhere typical, then there are so few addresses that at this point I don't think a lot of people are using them. But they are used in the geocoders, and the geocoders will take this before they will take a TIGER address.
26:03
So, yes, these are being used. So the postal city. Yes. The city instead of the legal entity. Yes. We had a big discussion in Massachusetts about that and nobody could agree, especially since we have some villages.
26:21
Right. That are not real villages, but everybody knows they live in those villages. But what city does the mail come to?
26:48
If you addressed it, if you addressed it to Barnstable, would the Postal Service still deliver? But it's the residents who say, I want to go visit my friend in Hyannis. Right.
27:02
They don't even know it's Barnstable. Understandable. Is there anything like alternative names? I have not seen anyone using alternative cities. I've seen alternative cities on the city itself.
27:22
So you could build a concordance between the city points and the addresses so that you could dereference that. But zip codes tend to do that for you.
27:41
Apartments are usually flats. And then you have a whole string of numbers. You've used semicolon delimiters. Oh, I use commas. It works. Use semicolons, which they say you should use.
28:02
Interesting. Yeah, I don't map that level of detail. I think I'm out of time. Well, thank you.