"Sliding" datasets together for more automated map tracing

Formal Metadata

Title "Sliding" datasets together for more automated map tracing
Title of Series FOSS4G 2014 Portland
Author Mach, Paul
License CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI 10.5446/31694
Publisher FOSS4G, Open Source Geospatial Foundation (OSGeo)
Release Date 2014
Language English
Producer Foss4G
Open Source Geospatial Foundation (OSGeo)
Production Year 2014
Production Place Portland, Oregon, United States of America

Content Metadata

Subject Area Computer Science
Abstract Importing new or updated geometry into a large dataset like OpenStreetMap is tricky business. Features represented in both need to be detected and merged. Oftentimes editors are asked to completely "retrace" over updated maps, as automated methods are unreliable. While a 100% accurate merge is impossible, it is possible to automatically create a best guess and let the user refine from there, eliminating as many manual, tedious steps as possible. Slide is a tool designed to solve this problem. It works by iteratively refining roads, trails and other complex geometries to match another dataset in which the features are correctly mapped. In a single click one geometry is "slided" to the other, eliminating hundreds of tedious clicks. The form of the new dataset is flexible. It could be an updated representation of roads such as the new TIGER database, a scanned historical paper map, or a large collection of GPS data points like the 250+ billion made available by Strava, a fitness tracking website. Overall, Slide is designed to leverage what we already know, collected in various datasets, to speed up map tracing. Map editors should be focusing on higher-level challenges and not just retracing over another dataset.
Keywords Open Street Map
geometry
mathematical optimization
Transcript
Welcome to my talk. My name is Paul Mach, and I'm going to talk about merging datasets: taking large sensor datasets and merging them with vector datasets. It's kind of a dry topic, you know, it's not vector tiles or anything like that, but I have some demos and pictures towards the end to hopefully make it interesting.
I'll start off with a big picture of the problem I'm trying to solve with the tools I'm building. The goal is basically to merge data into OpenStreetMap and make it easier. That's kind of a vague goal, so let me define some terms to really specify what I'm talking about. When I talk about data, I'm talking about sensor data. So it's not like "I have this shapefile, this geometry, and I import it into OpenStreetMap." It's more like: I have these billions of data points, there's information there, and I want to transfer that into another dataset, specifically OpenStreetMap in this case. And the data coming in fixes the geometry. This isn't about importing addresses or buildings or land use or something like that; it's about fixing the roads and trails, primarily. So that's the kind of data I'm talking about merging.
What I mean by "easier" is that I don't want it to be a fully automated command-line tool, and I don't want it to be tons of pointing and clicking, like tracing on top of a map. So it's some semi-automated way of helping to go from the information in one dataset to merging it in with the OpenStreetMap data. And why OpenStreetMap? Well, why not. You can talk about the philosophical side of it, but it's also used at my work, Strava, for routing, so the cross-benefit of improving the dataset helps everybody.
So what dataset am I talking about? There are many different examples, but specifically, since I work at Strava, we have this large global GPS dataset with hundreds of billions of GPS points from millions of runs. For those of you not familiar with Strava, it's a fitness tracking website, an online network for athletes. Basically the flow is: you turn on the app, put it in your pocket, go for a bike ride, and when you're done you upload it and we shower you with beautiful experiences. That's it. But what's relevant here is that we end up with these billions and billions of GPS data points, and you start to wonder: what can we do with this? What kind of information is hidden in these numbers, these basically lat/long points?
So the first thing I did, about six months ago, was just take all these billions of data points and put them on a map. This is the basic heatmap. Here's one example, and here's another example, of Europe. And yes, it's not just a population density map: it goes down to zoom level 15, and you do see that there is some information here about where people go and where they don't go.
Just some quick facts on the heatmap. It has 22 billion points, as of March, so I'll have to update it in the next few months with more data. I'm showing screenshots here, but there's a slippy-map version online where you can zoom, pan, all that good stuff. It's technically not an open dataset, but it is available for browsing and for tracing. So we've tried to balance the needs of what the business people at Strava want with what we can do to open up this data for mutual benefit on both sides.
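A minimal sketch of how such a heatmap could be accumulated from raw GPS points, for readers who want something concrete. The talk does not say how the Strava heatmap is actually built; binning points into Web Mercator pixels at zoom level 15 is an assumption made here for illustration, and the function names `latlon_to_pixel` and `build_heatmap` are hypothetical.

```python
import math
from collections import Counter


def latlon_to_pixel(lat, lon, zoom, tile_size=256):
    """Project a WGS84 point to global Web Mercator pixel coordinates."""
    n = tile_size * (2 ** zoom)  # width of the whole world in pixels
    x = (lon + 180.0) / 360.0 * n
    lat_rad = math.radians(lat)
    y = (1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n
    return int(x), int(y)


def build_heatmap(points, zoom=15):
    """Count how many GPS points fall into each pixel (assumed binning)."""
    counts = Counter()
    for lat, lon in points:
        counts[latlon_to_pixel(lat, lon, zoom)] += 1
    return counts


if __name__ == "__main__":
    # Three points on the same spot and one about 11 km further north.
    sample_points = [(45.0, -122.0)] * 3 + [(45.1, -122.0)]
    heat = build_heatmap(sample_points)
    print(heat.most_common(1))
```

A sparse counter like this only stores pixels that were actually visited, which suits data where most of the map is empty.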
So how can we use this data to improve the map? Well, as you can see here, there is definitely information here. There are trails, there are roads; some are more popular, some are less popular, and there are parts of the map where people just don't go, or where cyclists aren't going.
One thing that we're doing, kind of on the side, is mapping all this data to road networks for cities. Cities have come to us and said, "Look, you have a lot of cycling data, and we want to improve our infrastructure based on data from commuters; help us out." So there are a couple of guys at work who are working with cities. Every city has different requirements, but they provide their own road network from their GIS system, we map all the cycling data to that and tell them time of day, number of users, number of rides, all that great stuff, and then a GIS product comes out. It's called Strava Metro. That's a side advertisement that may be relevant to some of the audience, but that's not what I'm here to talk about.
What I am here to talk about is this tool that I built called Slide, and you'll understand in a few moments why it's called Slide. The idea is to take the information that's in the heatmap and bring it into OpenStreetMap specifically, and to have a meaningful, fast way to do more automated map tracing, which is going on a lot in OpenStreetMap. So let me show an example here.
This is a page that I built to show off Slide. What we're looking at is the standard OpenStreetMap base layer in all its glory, overlaid with the blue, purple and red heatmap data for that area. In this one area, if you look closely, you'll see that there's no trail that corresponds to the heatmap. The way Slide works is that you draw a coarse outline of this trail and then click the Slide button, and it'll match it to the heat data. So the idea is, you know, five clicks versus a hundred clicks to get a line that matches up with the trail, because if you've ever been mountain biking or running on trails, they're nice and windy, and that takes a long time to sample properly.
So as the input you draw this coarse black line, and then it iteratively improves that black line and slides it into place along the density; that's where the name comes from. It's a server-side tool, it's not running in JavaScript, so it does do a round trip, but it is pretty fast. It takes about a quarter of a second to run this; the animation takes a little bit longer. It's quasi real time. It does sometimes depend on the input line being close, that kind of stuff. So here's another animation, just playing back the animation.
So how does it work? At a high level, basically, you can think of all the GPS data as a density distribution of where people are. There are places people go all the time, like on the trail, and there are places people never go. With that data you can build a density distribution surface like this, where the high-density corridors will be lower and the other places will be higher. Then you can take your input polyline, the black line from the previous example, consider it like a string of beads, lay it on the surface, and just let gravity do its thing and slide down into the valleys. That's the model I had in my head when I developed the tool. It's not exactly the physics, but that's the concept I wanted to model there.
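The "valleys where people go" idea can be sketched as follows. This is an illustration under stated assumptions, not Slide's actual code: the density is assumed to be a small grid of counts, heights are simply the inverted, normalized counts, and `density_to_surface` and `sample` are hypothetical names. Bilinear interpolation is used so that a point can sit between grid cells.

```python
def density_to_surface(density):
    """Invert a grid of point counts so that busy cells become low ground."""
    peak = max(max(row) for row in density)
    if peak == 0:
        # No data at all: a flat plateau with nothing to slide into.
        return [[1.0] * len(row) for row in density]
    return [[1.0 - value / peak for value in row] for row in density]


def sample(surface, x, y):
    """Bilinearly interpolate the surface height at column x, row y."""
    rows, cols = len(surface), len(surface[0])
    x = min(max(x, 0.0), cols - 1.0)  # clamp to the grid
    y = min(max(y, 0.0), rows - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    top = surface[y0][x0] * (1 - fx) + surface[y0][x1] * fx
    bottom = surface[y1][x0] * (1 - fx) + surface[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


if __name__ == "__main__":
    counts = [[0, 5], [10, 0]]
    surface = density_to_surface(counts)
    print(surface)                    # busiest cell has height 0.0
    print(sample(surface, 0.5, 0.0))  # halfway between two cells
```

In practice the counts would probably be smoothed or log-scaled first so that a few very busy roads do not flatten everything else; that step is left out here.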
There's a lot of overlap between all sorts of fields in science, but this is based on mathematical optimization, where you have a cost function and you iterate over your function and improve the cost, which means lowering it in most cases. In Slide there are three components to the cost function right now. One is obviously the depth on the surface: you want to go lower there. Then I make sure that the points are equidistant and that the angle doesn't get super sharp in the line. That's just to maintain the rigidity of the line and to keep it from collapsing on itself. Those three costs are computed every time.
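A rough sketch of a three-part cost of the kind described: surface depth, even point spacing, and a penalty on sharp angles. The exact formulas and weights used in Slide are not given in the talk, so the squared-deviation and squared-turn-angle terms, the weights `w_depth`, `w_spacing`, `w_angle`, and the `height_at` callback are all assumptions for illustration.

```python
import math


def slide_cost(points, height_at, w_depth=1.0, w_spacing=1.0, w_angle=1.0):
    """Combine depth, spacing and angle penalties for a polyline.

    points    -- list of (x, y) tuples
    height_at -- callable (x, y) -> surface height (low where data is dense)
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")

    # 1. Depth: average surface height under the points (lower is better).
    depth = sum(height_at(x, y) for x, y in points) / n

    # 2. Spacing: variance of segment lengths (zero when equidistant).
    lengths = [math.dist(points[i], points[i + 1]) for i in range(n - 1)]
    mean = sum(lengths) / len(lengths)
    spacing = sum((length - mean) ** 2 for length in lengths) / len(lengths)

    # 3. Angle: squared turning angle at each interior point.
    angle = 0.0
    for i in range(1, n - 1):
        ax, ay = points[i][0] - points[i - 1][0], points[i][1] - points[i - 1][1]
        bx, by = points[i + 1][0] - points[i][0], points[i + 1][1] - points[i][1]
        turn = math.atan2(ax * by - ay * bx, ax * bx + ay * by)
        angle += turn ** 2

    return w_depth * depth + w_spacing * spacing + w_angle * angle


if __name__ == "__main__":
    flat = lambda x, y: 0.0
    print(slide_cost([(0, 0), (1, 0), (2, 0)], flat))  # straight, even line
    print(slide_cost([(0, 0), (1, 0), (1, 1)], flat))  # right-angle bend
```

The spacing and angle terms play the role of the "rigidity" mentioned above: without them, every point would simply collapse into the deepest nearby cell.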
This is kind of a complicated, too-detailed slide that I put in there so it's in the online version. But the basic concept is: you input the line, you input the heatmap data, and then you go through this loop where you iteratively try to improve the cost. Once you can't improve it anymore, you simplify it down and output the result. So it's this iterative refinement process of taking what you put in, the coarse sample, and making it better, or in some sense transferring the information that's in the Strava data into your polyline.
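The loop described above, "try to improve the cost until you can't anymore", can be sketched with a simple greedy search. The talk does not say which optimizer Slide uses; the coordinate search below (nudge one point at a time, keep the move if the cost drops, halve the step when nothing helps) is a stand-in chosen for brevity, and `refine` and its parameters are hypothetical.

```python
def refine(points, cost, step=0.5, min_step=0.01, max_rounds=1000):
    """Greedily move points while the cost keeps decreasing.

    points -- list of (x, y) tuples (the coarse input line)
    cost   -- callable taking a list of points and returning a number
    Returns the refined points and their final cost.
    """
    pts = [tuple(p) for p in points]
    best = cost(pts)
    rounds = 0
    while step >= min_step and rounds < max_rounds:
        rounds += 1
        improved = False
        for i in range(len(pts)):
            for dx, dy in ((step, 0), (-step, 0), (0, step), (0, -step)):
                trial = pts[:i] + [(pts[i][0] + dx, pts[i][1] + dy)] + pts[i + 1:]
                trial_cost = cost(trial)
                if trial_cost < best:  # keep only strict improvements
                    pts, best, improved = trial, trial_cost, True
        if not improved:
            step /= 2  # nothing helped at this scale, look closer
    return pts, best


if __name__ == "__main__":
    # Toy valley along y = 0: the cost is the summed squared distance to it.
    valley = lambda pts: sum(y * y for _, y in pts)
    start = [(0.0, 1.0), (1.0, 1.5), (2.0, 0.75)]
    result, final_cost = refine(start, valley)
    print(result, final_cost)
```

In a fuller version the `cost` argument would be the combined depth, spacing and angle cost, and the result would be simplified before being returned to the editor.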
Here's kind of a summary. It's server-side, written in Go. It can leverage any dataset; currently the one I'm using is the Strava one, which is the most interesting, but I have some other examples that you can use. It's an iterative refinement process, which I think is kind of cool, and it's reasonably fast, so it can work as a web service.
I first presented this at a conference a few months ago, and I've incorporated this code into the iD editor, which is the default OSM editor, so instead of just that demo that I showed you, you can actually add data to OpenStreetMap using it. Since then, I haven't done the best marketing on it, but a few people have used it: there have been 6 thousand changesets using this editor, which I think is pretty significant. In iD the flow is the same. For those of you that have used it before, you draw your coarse overview line, tag it as a general path or whatever you like, and then you click the little Slide button and it'll do the same thing.
There are two ways to interact with it in the iD editor. One is to select a subset of points. Here I have three nodes selected on that way, and it'll slide the portion in between those nodes. In practice you can have a really, really long bike path, a super-long way, and it's best to just slide portions of it and work your way along. So that's one way to do it. Or, the way I showed you, you slide the whole thing, which works in this case since this is relatively short.
So that's sliding to the Strava data, which I've found very useful. From a company standpoint, we're trying to do routing based off of OpenStreetMap. Every mountain biker wants to route on the trails, and if those trails aren't in OpenStreetMap, we can't really provide a solution for them. So we want to improve OpenStreetMap to improve our routing and have a win-win, and as a bonus we want to leverage our data to make that easier. That's kind of the birth of what Slide is.
But I try to think of this concept at a higher level than just how I get Strava data into OpenStreetMap: how do we get other data in? So I'll show you some of the datasets I've been playing with, to merge this data in in some semi-automated way. That's part of what I want to try to show: it's not just the algorithm, it's the incorporation with the editor that makes it so that you're still doing the same thing, but just faster. There's still a person looking at it and verifying that something dumb didn't happen, but it's way faster than before. What I don't want Slide to be is this command-line prompt where you press go, it commits a thousand things, and you don't know whether it was right or wrong. So this approach tries to take that middle ground of semi-automated.
The other place I've incorporated this algorithm is with TIGER. TIGER is the US Census data that they put out, and what was merged into OpenStreetMap a few years ago, in 2007, 2008 or whatever, is the old stuff. Since then, counties have improved the TIGER data, but it's unclear how to merge it in with what's already in OpenStreetMap. Here's one example, a screenshot from the iD editor, where the white and the green are the OSM ways and the yellow underneath is the new TIGER data. You can zoom in and look at it; nothing's perfect, but it's a hell of a lot better than what's there. So can we just have OpenStreetMap snap to that? That's basically the concept. The ultimate thing would just be: yes, this looks right, do your thing and fix it. Right now this is kind of the first version of that.
Here are some other examples where the new TIGER data is totally great but what's in OpenStreetMap isn't. Can we merge that in in a similar automated way? Doing a countrywide automated thing is a bad idea, but doing it in some semi-automated way, where at least someone is looking at every change, but quickly, is the approach to use. This one is kind of my favorite: the topology is there, but what's there is totally off, and the TIGER data is just perfect. So why can't we just have that matching? In other places you have smaller stuff, small changes like this. So you can basically apply the same Slide algorithm, but instead of sliding to the Strava density you can just slide to the yellow line.
So I'll just do a quick demo and fix this area. As I was playing with this I thought, well, as a first step, can't I just snap all the nodes to the nearest node? But the idea is the same: you have the background, and the coarse lines that match it, but not completely. Then you can select these ways and slide them. The idea is that the TIGER data is smooth, properly sampled, and looks pretty great, but I want it to be in OSM, and I don't want to point and click a thousand times to get it in there and validate it too. Here, in this subset, I'm panning around.
You can see that at that intersection up there it's not totally perfect, and that's part of the process: it doesn't change the endpoints, and the endpoint isn't right. But that's part of the semi-automated approach: I go in and make that small edit, and this area is fixed. Where it works well is for windy roads, those rural roads and neighborhoods where there are a lot of not often used roads that would take forever to sample, and no one has taken the time to correct them in OpenStreetMap yet. So here's another example, just merging it in like that. But again, it's semi-automatic. The idea is that someone expert at editing OpenStreetMap with iD could clean up this area super fast. That's how it works at a low level, but the higher-level concept is transferring the data from TIGER, the information there, and bringing it into OpenStreetMap in an easy way.
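Sliding to a reference line instead of a point density needs a different "surface". One plausible way to get it, sketched here as an assumption rather than as what the TIGER fork actually does, is to use the distance to the reference polyline as the height: zero on the yellow line and growing as you move away from it. The names `point_segment_distance`, `distance_to_polyline` and `make_height_function` are hypothetical, and coordinates are assumed to be planar.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polyline(p, line):
    """Shortest distance from point p to any segment of the polyline."""
    return min(
        point_segment_distance(p, line[i], line[i + 1])
        for i in range(len(line) - 1)
    )


def make_height_function(reference_line):
    """Build a height callback whose valley floor is the reference line."""
    def height_at(x, y):
        return distance_to_polyline((x, y), reference_line)
    return height_at


if __name__ == "__main__":
    tiger_way = [(0.0, 0.0), (10.0, 0.0)]  # stand-in for an updated road
    height_at = make_height_function(tiger_way)
    print(height_at(5.0, 3.0))   # 3 units off the line
    print(height_at(-3.0, 4.0))  # beyond the end of the line
```

A callback like `height_at` can then take the place of the density-based surface in the same cost and refinement steps, which is what makes the approach dataset-agnostic.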
That's what I have working so far, and I have links on the next slide. I just wanted to say something about ways of improving this: how to get more input from the dataset. The astute conference listener would realize that, hey, this Strava data is polylines, 1D polylines, and the heatmap data is just a density function, so you're losing the direction and the order of your data; you're losing that information. So can we pull more in? What if we knew the direction at every point on the map, had a direction distribution or something, to incorporate direction into it? Because right now it's just density. The motivation for that is that sharp turns and switchbacks and such don't do so well with the Slide algorithm, because it tries to minimize sharp turns. So my theory, and what I want to work on next, is incorporating the direction information to help with that. Then better smoothing: sometimes, the way the optimization happens, you might get a little jaggedy minimum, so smoothing would help with that. And then more complex geometries.
If you've edited OpenStreetMap and you see something like this, where the TIGER data is perfect but the OpenStreetMap data isn't, to you it's completely obvious what should happen to OpenStreetMap: it should just shift a little bit and clean itself up. So that's kind of the goal and the vision of this thing: to semi-automate that. The info's there; someone spent a lot of time cleaning up the TIGER data, it's there for us, so can we just merge it right in there?
That's my presentation. On this slide are all the links; I'll post the presentation on Twitter. So that's Slide, and how it applies to the Strava data and the TIGER data. At a high level, I'm exploring techniques for merging these non-vector datasets into your vector dataset. As big data gets bigger and there's access to it in some open ways, how can we merge it into something like OpenStreetMap, or even a city network? How can you, say, contract with a cell phone provider that gives you jillions of GPS points and make that useful? Yes, you can trace over it as an underlay, but that gets really boring, fast. So we need tools to make that information transfer, that merging of data, semi-automated. Thank you. Any questions?
Audience question: In the optimization, how are you deciding how many points the polyline has? Do you resample at, like, every 5 meters, something fixed?
Yes, I resample. So the question is how many points to add to my polyline in the optimization. The first step is that I resample at 5 meter intervals so I can mimic the flexibility of a string. I take those points and minimize, and then I simplify again after the fact. We can talk about simplification algorithms; I have this hacky way of doing it that works pretty well, which keeps more points in the curvy parts and fewer in the flat parts. So that's the final pass.
Audience question: Could you see something like this being used with small user groups who may not use Strava for their activities, to map out areas where they go? I'm thinking of people like rock climbers or hunters, who wouldn't be tracking their activity while they're doing it but would benefit from something like this to actually map the trails or areas where they go.
Yeah, I think the way to do that would be to use the OSM GPS traces. If they had that layer underneath, you could slide to that. It's a little bit questionable if you have multiple traces right next to each other: which one are you sliding to? But the more we can make of this thing the better. Actually having people go walk around with their devices, use Strava while just walking around in the woods, and using the input from five or ten people to slide the trail, like a Strava mapping party or something: I haven't done that, really. Honestly, I presented this a few months ago, and this is the next version of that. I've been working on it on my own, and part of why I present is to get out there and get people's ideas and feedback on what could happen. Part of the next step is to market it in some sense, you know, test it out with the OSM community and find people who would be willing to use it and who know how to use it right.
Audience question: What about using an aerial photo instead? It seems awfully tempting to try to snap to high-contrast edges in a photo. Have you tried that?
I have not tried that, but conceptually, anything where you can build this surface concept, this concept of things that have a higher value and a lower value, you can apply it to. I tried something along those lines and the data was a little noisy and it didn't work so well. I haven't quite given up on it, but it was a little bit harder to do.
Audience question: The other thing I wanted to ask: is this going to be in iD, or is there a way it could be merged back into the OSM iD editor?
The beauty of iD is that you can fork it and do whatever you want. So I forked it twice: one is the version where you slide to Strava data, and one is for the TIGER data. Maybe it's best to combine those two. It's a fork of iD, and I keep it up to date with the development that's going on there. But no, it's not officially on the website. Thank you.
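The resample-then-simplify steps from the first answer can be sketched like this. Resampling at a fixed 5 meter interval follows the talk; the simplification step does not, since the speaker only calls his method "hacky" without describing it. Douglas-Peucker is swapped in here as a well-known stand-in that also keeps more points on curves than on straight stretches. Coordinates are assumed to be planar and in meters, and the function names are hypothetical.

```python
import math


def resample(line, interval=5.0):
    """Place points every `interval` units along the polyline."""
    out = [line[0]]
    carried = 0.0  # distance walked since the last emitted point
    for (ax, ay), (bx, by) in zip(line, line[1:]):
        seg = math.hypot(bx - ax, by - ay)
        if seg == 0:
            continue
        d = interval - carried
        while d <= seg:
            t = d / seg
            out.append((ax + t * (bx - ax), ay + t * (by - ay)))
            d += interval
        carried = seg - (d - interval)
    if out[-1] != line[-1]:
        out.append(line[-1])  # always keep the true endpoint
    return out


def _segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def simplify(line, tolerance):
    """Douglas-Peucker: drop points closer than `tolerance` to the chord."""
    if len(line) < 3:
        return list(line)
    a, b = line[0], line[-1]
    worst_i, worst_d = 0, 0.0
    for i in range(1, len(line) - 1):
        d = _segment_distance(line[i], a, b)
        if d > worst_d:
            worst_i, worst_d = i, d
    if worst_d <= tolerance:
        return [a, b]
    left = simplify(line[:worst_i + 1], tolerance)
    right = simplify(line[worst_i:], tolerance)
    return left[:-1] + right  # the split point appears once


if __name__ == "__main__":
    dense = resample([(0.0, 0.0), (20.0, 0.0)], interval=5.0)
    print(dense)                 # beads every 5 m along a straight road
    print(simplify(dense, 0.1))  # collapses back to the two endpoints
```

Dense resampling gives the optimizer enough "beads" to follow a winding trail, while the final simplification keeps the node count reasonable before the way is written back to the editor.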