The Daylight Map Distribution

Formal Metadata

Title
The Daylight Map Distribution
Series title
Number of parts
26
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and make publicly available the work or content, in unchanged or adapted form, for any legal purpose, as long as you credit the author/rights holder in the manner they specify.
Identifiers
Publisher
Release year
Language

Content Metadata

Subject area
Genre
Abstract
A Mapping USA (Spring 2021) presentation by Jacob Wasserman. More information about Mapping USA: https://wiki.openstreetmap.org/wiki/United_States/Events/Mapping_USA Learn more and support OpenStreetMap US at https://www.openstreetmap.us/.
Transcript: English (automatically generated)
All right. Thanks, everybody. I'm really excited to be talking here today. My name is Jake Wasserman. I'm a software engineer at Facebook, and I'll be talking about a project we call the Daylight Map Distribution. So our goal at Facebook, we have chosen OpenStreetMap
to serve to the maps product surfaces and essentially all Facebook products. And so our goal is to serve OSM safely to those users. And what essentially I mean by that is that we need an OSM planet dataset with no harmful edits. And I intentionally put this word "issues" here in scare quotes. We need to minimize the number of these issues, things like vandalism and graffiti,
maybe some not quite intentional things like multipolygon relation geometry errors. We need to make sure the road network is consistent, things like coastlines. There's tons of other checks that we are putting into the system. I don't have a lot of time to talk about all of them. But the main thing, another constraint that we had, is that we want to serve 100% OpenStreetMap data.
And what I mean by that is that there are no forks. So if we find any issues or anything on the map, we make sure that all fixes are submitted directly to OSM. I just want to give kind of a very quick overview. I get asked a lot, like, how does Daylight differ from OpenStreetMap.org or just taking a PBF off of the website?
So this is just kind of a notional representation of the OSM history. We've got nodes here, ways, and over time, people are making these changesets and you have these different versions at kind of any given time. So there are, like, the minutely diffs coming in, or you just update every day. You can imagine at time T0, you just take a data cut,
you get the latest version of every single feature that's on the map. So version three for the first node, version one for the second. You can imagine if you take that time cut, you start looking at the planet, you say, what's wrong here? Let's just say on the second node here on version one, maybe there's a problem, the name is wrong, it's a dragged node, there's some weird mistake. So you submit a fix and you have now version two
and you take a new time cut at time T1. And what's great is you now have your fix in there. The real problem and things that can happen is that there's a whole host of other changes that have now made it in since then. You know, all these ways and relations down below have changed too. And so what happens is if there are any new issues on those elements,
now that you've taken them, you have a whole new set of issues, you have to fix those and you kind of end up in this kind of loop where you're fixing things, you submit a fix to OSM, you find it, you take a new data cut and now you have a whole host of other issues. Daylight is essentially our solution to that problem. Daylight essentially gets rid of the notion of taking a single time cut.
And what we can do is sort of cherry pick the best version of all the nodes, ways and relations and minimize the total number of issues that you might care about over the entire planet. And when I use the word safely here, what I mean is things like referential integrity, you're not going to have a way that is referencing nodes that no longer exist.
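The difference between a single time cut and per-element version selection can be illustrated with a minimal sketch. This is not Daylight's actual implementation: the `Version` record, the `flagged` field (standing in for whatever issue checks are run), and the function names are all hypothetical, and the integrity check only covers ways referencing missing nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    number: int                      # OSM version number of the element
    flagged: bool = False            # hypothetical result of an issue check
    node_refs: tuple[str, ...] = ()  # node ids referenced (ways only)


# Element id -> all versions of that element, oldest first (assumed layout).
History = dict[str, list[Version]]


def time_cut(history: History) -> dict[str, Version]:
    """Single time cut: the latest version of every element."""
    return {eid: versions[-1] for eid, versions in history.items()}


def pick_clean(history: History) -> dict[str, Version]:
    """Per element, the newest version with no flagged issue, if one exists."""
    picked = {}
    for eid, versions in history.items():
        clean = [v for v in versions if not v.flagged]
        if clean:
            picked[eid] = clean[-1]
    return picked


def dangling_refs(selection: dict[str, Version]) -> set[str]:
    """Node ids referenced by a selected way but absent from the selection."""
    return {
        ref
        for version in selection.values()
        for ref in version.node_refs
        if ref not in selection
    }


history: History = {
    "n1": [Version(1), Version(2), Version(3)],
    "n2": [Version(1), Version(2, flagged=True)],  # latest edit is bad
    "w1": [Version(1, node_refs=("n1", "n2"))],
}

cut = time_cut(history)          # includes the flagged version 2 of n2
selection = pick_clean(history)  # falls back to version 1 of n2
```

In this toy history the time cut carries one flagged element while the per-element selection carries none, and `dangling_refs(selection)` is empty because every node the way references is still present. A real pipeline would also have to keep the chosen versions mutually consistent, which this sketch does not attempt.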
And we're not going to end up with things in weird states where, you know, the road network is broken because we weren't really careful about which versions of things we're essentially plucking or cherry picking. I just want to talk about a few of the important things that we look at and make sure that at the end of the day, the Daylight Map Distribution does not contain. So things like name vandalism, we've actually spent a lot of time
leveraging a custom vandalism and profanity model that detects bad names. Probably a lot of people are familiar with New York City being renamed to Jewtropolis. Here's just a couple of other random examples here of, like, vandalism on an alley name, people changing the name of the lake. You know, as we talk about cherry picking or plucking out the right versions, we make sure none of this is true on essentially every name tag across the entire planet.
Another one, this is my favorite thing to talk about because I think it's one of the most underrated but very common issues in OpenStreetMap data, is things like multipolygon relations that are broken. So if you have something like an unclosed polygon ring or self-intersection, a lot of things cannot build a valid polygon in your data set.
And you might build a planet and find there's large lakes, large rivers that are essentially just not able to build and they're just land on your map. And, you know, for us as Facebook serving that to a global audience, you know, if people see that, they're just going to really lose a lot of trust in the type of data that we're, you know, serving to everybody.
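One of the failure modes mentioned here, an unclosed polygon ring, can be detected with a simple endpoint count. The sketch below is only an illustration under stated assumptions, not the check Daylight runs: the function name is hypothetical, each member way is assumed to be a plain list of node ids, and self-intersections are not covered.

```python
from collections import Counter


def open_ring_endpoints(member_ways: list[list[int]]) -> set[int]:
    """Return node ids where the member ways leave a ring open.

    When ways join end to end into closed rings, every endpoint node is
    used an even number of times; an odd count marks a gap.
    """
    counts = Counter()
    for way in member_ways:
        if len(way) < 2:
            continue  # a way with fewer than two nodes has no endpoints to join
        counts[way[0]] += 1
        counts[way[-1]] += 1
    return {node for node, n in counts.items() if n % 2 == 1}


closed_lake = [[1, 2, 3], [3, 4, 1]]  # two ways that together form one ring
broken_lake = [[1, 2, 3], [3, 4, 5]]  # gap between node 5 and node 1

print(open_ring_endpoints(closed_lake))  # empty set: the ring closes
print(open_ring_endpoints(broken_lake))  # nodes 1 and 5 are left dangling
```

An even endpoint count is a necessary condition for assembling rings, not a full validity test, so a relation that passes this check can still fail to build for other reasons.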
So we make sure essentially all of these, and a variety of other things (I also listed out coastlines here), are clean and valid across the entire planet. Essentially, that's all I have here in the five minutes. I really wish I could talk a lot more about this. But, you know, just to summarize, Daylight Map essentially is cherry picking these different versions of nodes, ways and relations to minimize the number of issues, so that we consider it safe to serve to a large number of users.
And the key thing here is it's totally public. We have a website, daylightmap.org. You can just go and download this planet PBF. We release it to S3 essentially every four weeks. There's a variety of other data products on there. There's some really interesting things with like conflated buildings
derived from satellite imagery. So we find buildings in the satellite imagery that are not in OpenStreetMap and offer that as an OSC file that you can essentially just apply over that exact planet. We also supply things like coastline land and water polygons built for each Daylight map. And, you know, I really just want to reiterate it is 100 percent OpenStreetMap data.
Every single thing in the PBF has an entry in the OpenStreetMap history. You know, it's not 100 percent of OSM because we only consider a subset of the things generally that we found are important to a lot of users. But it is 100 percent OSM. And, you know, again, the data is not from a single time cut. One last thing, you know, I want to say the main takeaway is that the number of flagged issues, the number of things that you might detect,
issues, errors, whatever, linters, whatever you want to call them, the number of those in a Daylight release is typically far lower than any single time cut you can take from OSM.org. And that's really what I think the main value of that is. And, you know, we offer that for people to serve to their users. If you're looking for a safe, clean, validated dataset, this is literally
the same thing that we push out to Facebook production map services. So with that, if you have any feedback or if you want to participate, if there's something here that, you know, some checks that you think are really important that maybe we're not considering, we would love to work with you and love to hear any feedback you have. We have a Slack channel here for the OSM US Slack group.
You can email OSM at FB dot com. And of course, I have to mention, we are hiring. So if you're a software engineer and you love working with OpenStreetMap or maps data or geospatial databases, anything like that, definitely reach out. Here's my email. It's jwasserman at FB dot com. And with that, I will yield the floor.
Thank you very much.