New ML Datasets in Daylight Release
Formal Metadata
Number of Parts | 41
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/58215 (DOI)
State of the Map US 2022, 23 / 41
Transcript: English (auto-generated)
00:04
Hey everyone, good morning. I'm Vinay from the Grab product team. I'm here to talk about a collaboration between Grab and Meta, where our teams jointly worked on the problem space of identifying missing roads in Southeast Asia. So our teams used a combination of data sources,
00:22
AI, and conflation technology, and I do want to thank Esra, Mark, and the teams from both our organizations for the opportunity to collaborate and for the excellent work to improve the map. Okay, show of hands. How many of you have been to Southeast Asia?
00:41
In most people's minds, this is how it looks: sandy beaches, nice sunny skies, endless paddy fields, et cetera, right? But it is a complex and constantly evolving landscape. It also looks like this: a sea of motorists, unique transportation modes not seen anywhere else in the world,
01:00
impenetrable road conditions, traffic, et cetera. So before we go any further, I just want to quickly talk about Grab and why maps are important to us. Grab is basically Southeast Asia's leading super app. We provide ride-hailing, food delivery, and financial services in all key markets across the region.
01:21
And Grab is helping make a positive impact on the everyday lives of around 650 million people in that part of the world. We are essentially moving people or things from point A to point B, and Grab Maps is at the heart of all of this action, right? So as one of the largest geo-service users
01:40
in the region, highly accurate map data is extremely important to our strategy. Better maps equal amazing experiences, right? From the places that you choose for your rides or your food deliveries, from a distance or time calculation
02:00
that powers allocation and pricing, from routes that get passengers and drivers to their endpoints faster, or even a search for nearby merchants and essentials: all of this happens on Grab Maps. So at this stage, I'm going to ask you to watch a video
02:22
that gives you a sense of the problems that Grab Maps tackles. Sorry, I'm just going to pause this for a second.
02:45
Is that sound? It's on, it's on, it's on. It worked in the other room.
03:01
If that's any consolation. It's not, I don't know if it's plugged in. Should be done through HDMI, or if it's got a VGA, sorry. No worries, no worries, right? But I think there are a few things that might have been quite obvious.
03:22
The kind of roads that we have to navigate through, the kind of experiences that we need to help our, right? All of this essentially happens only when you have a complete road network in place. So road geometry is the bedrock of all of these experiences.
03:40
And this is why missing roads and geometry changes are extremely important to us. Quite simply put, if we can't reach you, then we can't give you any of this. So to address the missing-road problem, Grab has been a major contributor to OpenStreetMap over the years. Since 2018, we have made a substantial number of edits
04:01
in the region, and we continue to collaborate with local chapters and key community members to constantly enrich the map. Also, every Grab ride automatically makes OSM better. We have multiple pipelines of data flowing in that feed us information on the changing ground reality. And we also keep a near route for user feedback
04:21
and that constantly helps us keep improving the map. But mapping missing roads in Southeast Asia is not so straightforward, right? There are massive issues that we have encountered. First, roads change. This is a rapidly evolving region.
04:41
And satellite imagery refreshes do not always keep pace with what happens on the ground. Secondly, we wanted to improve coverage where the land is covered by trees or dense buildings, because you can't really make out roads there when you look at a satellite image. Lastly, as you can see on the right,
05:03
we find roads in unexpected areas that are actually used for transportation. This is a classic and common occurrence, and we hence need to improve connectivity by identifying these narrow or small roads that are currently missing from OpenStreetMap.
05:21
So we needed a different approach to missing roads, one that leveraged one of the core strengths of Grab: our driver network and their probe data. So why is Grab's probe data a great data source? For starters, it gives us a continuous, low-cost stream of data across a wide range of geographies and vehicle types.
05:42
We've essentially leveraged our drivers to tell us where the map is incomplete. And as you might have seen in the video earlier, we operate a wide variety of vehicles, tuk-tuks and motorbikes, in pretty much every nook and cranny of the entire region. So how does this really work?
06:00
Now, we essentially process around 60 TB of data a year. All of the data coming into the pipeline, I must mention, is completely anonymized right up front. We further clean up this data set to improve our signal-to-noise ratio, thereby improving the quality of what we identify on the ground.
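The talk does not describe how the probe data is cleaned. As one illustration only, a common heuristic is to drop GPS fixes that imply an implausible speed from the previous kept fix. The sketch below assumes each track is a list of `(timestamp_s, lat, lon)` tuples; the `MAX_SPEED_MPS` cutoff and the function names are hypothetical and are not Grab's pipeline.

```python
import math
from typing import List, Tuple

Fix = Tuple[float, float, float]  # (timestamp_s, lat, lon)

# Hypothetical cutoff: fixes implying more than 50 m/s (180 km/h) are treated as noise.
MAX_SPEED_MPS = 50.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def drop_speed_outliers(track: List[Fix], max_speed: float = MAX_SPEED_MPS) -> List[Fix]:
    """Keep a fix only if it is reachable from the last kept fix at <= max_speed."""
    if not track:
        return []
    kept = [track[0]]
    for t, lat, lon in track[1:]:
        t0, lat0, lon0 = kept[-1]
        dt = t - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        if haversine_m(lat0, lon0, lat, lon) / dt <= max_speed:
            kept.append((t, lat, lon))
    return kept


# A jump of about 11 km in 10 s is discarded; the neighbouring fixes survive.
track = [(0, 1.3000, 103.8000), (10, 1.3005, 103.8000), (20, 1.4000, 103.8000), (30, 1.3010, 103.8000)]
print([t for t, _, _ in drop_speed_outliers(track)])  # [0, 10, 30]
```

Because each fix is compared with the last *kept* fix rather than the raw previous one, a single spike does not also cause the valid fix after it to be rejected.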
06:22
So this is the key slide: how does all of this happen? We have a three-step process. It starts off with aggregation. We aggregate all of the GPS data at the geohash-9 level and transform it, using a special kind of process, into a heat map, like in the top-left picture. The challenge here was mainly to smooth
06:42
the interpretation of major highways, which are the high-density points, and less-used roads, which are the low-density points, and to show both of them on the same heat map. The next step was to run a segmentation model that predicts the confidence level of each pixel being a point on a road.
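A minimal sketch of the aggregation step, assuming raw points arrive as `(lat, lon)` pairs. It encodes each point with the standard geohash algorithm at precision 9 (cells roughly 5 m on a side near the equator) and counts points per cell. The "special kind of process" behind the heat map is not described in the talk; here a log transform, `log1p(count)`, stands in as one plausible way to keep busy highways and rarely used roads visible on the same scale. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Standard geohash: interleave lon/lat bisection bits, 5 bits per base-32 char."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, n_bits, use_lon = [], 0, 0, True  # the first bit refines longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = int(lon >= mid)
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = int(lat >= mid)
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | bit
        n_bits += 1
        use_lon = not use_lon
        if n_bits == 5:
            chars.append(_BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)


def heat_map(points: Iterable[Tuple[float, float]], precision: int = 9) -> Dict[str, float]:
    """Count points per geohash cell, then log-scale the counts into [0, 1].

    The log transform is an assumed stand-in for the unspecified smoothing:
    it compresses the gap between dense highways and sparse side roads.
    """
    counts = Counter(geohash_encode(lat, lon, precision) for lat, lon in points)
    if not counts:
        return {}
    peak = math.log1p(max(counts.values()))
    return {cell: math.log1p(n) / peak for cell, n in counts.items()}
```

With 100 points in one cell and a single point in another, linear scaling would give the quiet cell 0.01 of the peak, while the log scale gives it about 0.15, so the faint road stays visible.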
07:02
Everything above a certain threshold is considered a road, and everything below it isn't, basically, right? That gives us a binary mask. The third step was to skeletonize this, transforming the binary mask into a graph. We compare the inferred map segments with what is already part of OpenStreetMap,
07:22
and then we can understand which segments are missing. So now we have covered how missing roads are identified from GPS data, but the other mainstream source continues to be satellite imagery. The two sources have their pros and cons, but together they complement each other really well.
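The thresholding and skeletonization steps can be sketched in plain Python. This assumes the segmentation model outputs a 2-D grid of per-pixel road confidences; the 0.5 threshold is a hypothetical value, and Zhang-Suen thinning is used here as one well-known skeletonization algorithm, not necessarily the one Grab uses. Each remaining skeleton pixel becomes a graph node linked to its 8-connected neighbours.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]
Grid = List[List[int]]


def threshold(conf: List[List[float]], t: float = 0.5) -> Grid:
    """Binary road mask: 1 where predicted road confidence exceeds t (t is hypothetical)."""
    return [[1 if v > t else 0 for v in row] for row in conf]


def zhang_suen(mask: Grid) -> Grid:
    """Thin a binary mask to 1-pixel-wide centre lines (Zhang-Suen thinning)."""
    h, w = len(mask), len(mask[0])
    # Pad with a zero border so every original pixel has 8 neighbours.
    img = [[0] * (w + 2)] + [[0] + row[:] + [0] for row in mask] + [[0] * (w + 2)]
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for y in range(1, h + 1):
                for x in range(1, w + 1):
                    if not img[y][x]:
                        continue
                    # Neighbours P2..P9, clockwise starting from north.
                    p = [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1], img[y + 1][x + 1],
                         img[y + 1][x], img[y + 1][x - 1], img[y][x - 1], img[y - 1][x - 1]]
                    b = sum(p)  # number of foreground neighbours
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))  # 0->1 transitions
                    n, e, s, wst = p[0], p[2], p[4], p[6]
                    if step == 0:
                        ok = n * e * s == 0 and e * s * wst == 0
                    else:
                        ok = n * e * wst == 0 and n * s * wst == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_clear.append((y, x))
            for y, x in to_clear:  # delete simultaneously after each sub-iteration
                img[y][x] = 0
            changed = changed or bool(to_clear)
    return [row[1:-1] for row in img[1:-1]]


def skeleton_graph(skel: Grid) -> Dict[Pixel, Set[Pixel]]:
    """Graph whose nodes are skeleton pixels, linked to their 8-connected neighbours."""
    nodes = {(y, x) for y, row in enumerate(skel) for x, v in enumerate(row) if v}
    return {
        (y, x): {(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if (dy or dx) and (y + dy, x + dx) in nodes}
        for (y, x) in nodes
    }
```

For a 3-pixel-thick horizontal band of high confidence, for example, the thinning leaves a 1-pixel-wide line along the middle row, so the skeleton pixels form a simple path graph that can then be compared with OSM ways.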
07:42
So at this point, I'm going to ask Esra from Meta to explain how GPS-based detections can be combined with road inferences from satellite imagery. Over to you, Esra. Hello, everyone. I'm Esra from Meta.
08:01
I'll continue the presentation by talking about how we merged these two different data sources, Grab data and OSM data, and created the final road network. The processed data that Grab gives us comes in line string format.
08:22
What we do is use a line-matching-based local conflation method to decide how to merge these two data sets. Basically, we look at each curvilinear line structure and decide whether to conflate the full line wholly
08:42
or partially, based on its local neighborhood. This data is then converted into a format that is consistent with OSM's ways-and-nodes model, which finally creates the OSM changesets.
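The line-matching conflation is not specified in detail. The sketch below is a deliberately simplified stand-in, not Meta's method: it measures each vertex of a Grab line against the nearest OSM segment, treats vertices within a hypothetical `tol_m` of OSM as already mapped, and returns the whole line (nothing matched), nothing (everything matched), or only the unmatched stretches (partial conflation). Coordinates are assumed to be in a projected, metre-based system, and all names are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # projected x, y in metres (assumption)
Line = List[Point]


def point_segment_dist(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    t = 0.0 if seg2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def dist_to_network(p: Point, osm_lines: List[Line]) -> float:
    """Distance from p to the nearest segment of any OSM line (inf if there is none)."""
    return min(
        (point_segment_dist(p, a, b) for line in osm_lines for a, b in zip(line, line[1:])),
        default=math.inf,
    )


def missing_parts(candidate: Line, osm_lines: List[Line], tol_m: float = 10.0) -> List[Line]:
    """Return the whole candidate, only its unmatched stretches, or nothing."""
    matched = [dist_to_network(p, osm_lines) <= tol_m for p in candidate]
    if not any(matched):
        return [candidate]  # entirely new road: conflate the full line
    pieces, n, i = [], len(candidate), 0
    while i < n:
        if matched[i]:
            i += 1
            continue
        j = i
        while j + 1 < n and not matched[j + 1]:
            j += 1
        # Keep one matched vertex on each side so the piece attaches to OSM.
        piece = candidate[max(i - 1, 0): min(j + 1, n - 1) + 1]
        if len(piece) >= 2:
            pieces.append(piece)
        i = j + 1
    return pieces
```

Keeping one matched vertex at each end of a partial piece means the new segment touches the existing network instead of floating beside it, which matters for the connectivity goal described earlier in the talk.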
09:03
One important step here is that we also render these results so that our QA team can take a look and point out any major problems with the conflation process.
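For the conversion into OSM's ways-and-nodes model mentioned above, a minimal sketch can emit an osmChange (API 0.6) document in which new objects carry negative placeholder IDs, the usual OSM convention for objects that do not exist yet. Identical coordinates share one node, so lines meeting at a vertex become connected ways. The `highway=road` tag, the generator string, and the function name are assumptions for illustration; the tags Daylight actually applies are not given in the talk.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

LonLat = Tuple[float, float]


def to_osmchange(lines: List[List[LonLat]], tags: Dict[str, str], changeset_id: int) -> str:
    """Build an osmChange document that creates one way per line.

    New objects get negative placeholder IDs; identical coordinates reuse one
    node so that lines meeting at a vertex become connected ways.
    """
    root = ET.Element("osmChange", version="0.6", generator="daylight-sketch")  # hypothetical generator
    create = ET.SubElement(root, "create")
    next_id = -1
    node_ids: Dict[LonLat, int] = {}
    way_refs: List[List[int]] = []
    for line in lines:  # nodes first: they must exist before ways reference them
        refs = []
        for lon, lat in line:
            if (lon, lat) not in node_ids:
                node_ids[(lon, lat)] = next_id
                ET.SubElement(create, "node", id=str(next_id), changeset=str(changeset_id),
                              lat=f"{lat:.7f}", lon=f"{lon:.7f}")
                next_id -= 1
            refs.append(node_ids[(lon, lat)])
        way_refs.append(refs)
    for refs in way_refs:
        way = ET.SubElement(create, "way", id=str(next_id), changeset=str(changeset_id))
        next_id -= 1
        for ref in refs:
            ET.SubElement(way, "nd", ref=str(ref))
        for k, v in tags.items():
            ET.SubElement(way, "tag", k=k, v=v)
    return ET.tostring(root, encoding="unicode")


print(to_osmchange([[(103.80, 1.30), (103.81, 1.30)], [(103.81, 1.30), (103.81, 1.31)]],
                   {"highway": "road"}, changeset_id=1))
```

Coordinates are written with seven decimal places because OSM stores node positions at 1e-7 degree precision; the changeset ID would be the one opened for the upload.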
09:22
As a post-processing step, we have one more step here, which is dropping the ferry routes (the roads that overlap with water bodies) and the roads that overlap with existing buildings. So I think, as we were discussing with Vinay,
09:41
actually, these routes you are seeing on the water bodies are intentional, because Grab basically plans the ride by minimizing the route through that water. But for the purpose of giving this back to OSM, of course, we needed to clean that up.
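One way to sketch this post-processing filter: sample each road's vertices and segment midpoints, test them against water and building polygons with even-odd ray casting, and drop the road when a hypothetical share (`max_inside = 0.5`) of the samples falls inside an obstacle. Polygons are simplified to a single exterior ring without holes, and all names are illustrative rather than the pipeline's actual code.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Line = List[Point]
Polygon = List[Point]  # single exterior ring, no holes (simplifying assumption)


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Even-odd ray casting: count crossings of a ray cast towards +x."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y, so y1 != y2
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_points(line: Line) -> List[Point]:
    """Vertices plus segment midpoints: a cheap way to sample along each segment."""
    pts = [line[0]]
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        pts.append(((x1 + x2) / 2, (y1 + y2) / 2))
        pts.append((x2, y2))
    return pts


def drop_blocked(lines: List[Line], obstacles: List[Polygon], max_inside: float = 0.5) -> List[Line]:
    """Keep roads whose share of samples inside water/building polygons is below max_inside."""
    kept = []
    for line in lines:
        pts = sample_points(line)
        inside = sum(any(point_in_polygon(p, poly) for poly in obstacles) for p in pts)
        if inside / len(pts) < max_inside:
            kept.append(line)
    return kept
```

With this rule a line that only briefly crosses a lake is kept (it could be a bridge), while one that runs mostly over water, like a ferry track, is dropped.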
10:04
To give some visual results of how the final output looked: the major strength of the method was that it complemented satellite imagery pretty well. It created new roads that we wouldn't be able to see
10:23
from satellite imagery, especially in areas occluded by trees or dense buildings. The second advantage we observed during experimentation was that the GPS data was quick to pick up changes
10:41
that are going on, even before you refresh your satellite imagery. The other thing: when you look at these results in different tiles, you can see the amount of misalignment between the Grab data and the OSM data, because with OSM,
11:02
everything is delineated over imagery, so you could easily understand what's going on with respect to that misalignment as well. Although we didn't do any alignment in the final results we pushed into Daylight, this was another direction
11:20
where we saw potential use of this data. We had a few challenges with the incoming data as well. One of them was noise. Sometimes the drivers took shortcuts, which created noisy roads in some areas,
11:40
or the GPS signals might be noisy, although the ML algorithms do their best to clean up most of that noise. The method might also fail to detect some new roads because of low traffic. This also goes back to our discussion
12:01
around anonymization. We only took tracks that were above a certain cutoff, which leads us to miss some of the roads in the end. To give you the big picture: as a result, we conflated Grab's GPS-derived road data
12:24
from 62 Southeast Asian cities, and from these cities we were able to add 24,000 kilometers of new roads to OSM. And when we look at per-city improvements,
12:41
in some cities it took us to almost 20%. On average, we saw a 14% improvement from the newly detected roads. We are also providing these results back to the community in our Daylight release, at a regular cadence.
13:03
So feel free to check it out on our Daylight website and see how this can be used by the community. And lastly, we are really happy to see that this is one of the first data sets in Daylight,
13:20
and one of the first road data sets coming from ML, from a non-image-based ML model. Thank you.