
Exploring jittering and routing options for converting origin-destination data into route networks: towards accurate estimates of movement at the street level


Formal Metadata

Title
Exploring jittering and routing options for converting origin-destination data into route networks: towards accurate estimates of movement at the street level
Title of Series
Number of Parts
351
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year
2022

Content Metadata

Subject Area
Genre
Abstract
Introduction

Origin-destination (OD) datasets provide information on aggregate travel patterns between zones and geographic entities. OD datasets are ‘implicitly geographic’, containing identification codes of the geographic objects in which trips start and end. A common approach to converting OD datasets to geographic entities, for example represented using the simple features standard (Open Geospatial Consortium Inc 2011) and saved in file formats such as GeoPackage and GeoJSON, is to represent each OD record as a straight line between zone centroids. This approach to representing OD datasets on the map has been used since at least the 1950s (Boyce and Williams 2015) and is still in use today (e.g. Rae 2009). Beyond simply visualising aggregate travel patterns, centroid-based geographic desire lines are also used as the basis of many transport modelling processes.

The following steps can be used to convert OD datasets into route networks, in a process that can generate nationally scalable results (Morgan and Lovelace 2020):

1. OD data are converted into centroid-based geographic desire lines
2. Routes are calculated for each desire line, with start and end points at zone centroids
3. Routes are aggregated into route networks, with values on each segment representing the total amount of travel (‘flow’) on that part of the network, using functions such as overline() in the open source R package stplanr (Lovelace and Ellison 2018)

This approach is tried and tested. The OD -> desire line -> route -> route network processing pipeline forms the basis of the route network results in the Propensity to Cycle Tool, an open source and publicly available map-based web application for informing strategic cycle network investment, ‘visioning’ and prioritisation (Lovelace et al. 2017; Goodman et al. 2019).
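As an illustration of the first step of this pipeline, the following is a minimal Python sketch of turning OD records into centroid-based desire lines. It is not the stplanr implementation: the names `ODRecord` and `desire_lines` are hypothetical, and the zone IDs, centroid coordinates and trip counts are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ODRecord:
    origin: str       # ID code of the zone where trips start
    destination: str  # ID code of the zone where trips end
    trips: int        # number of trips between the two zones


def desire_lines(od, centroids):
    """Return one straight line (start, end, trips) per OD record.

    centroids maps each zone ID to an (x, y) coordinate pair, which is what
    makes the 'implicitly geographic' OD table explicitly geographic.
    """
    lines = []
    for record in od:
        start = centroids[record.origin]
        end = centroids[record.destination]
        lines.append((start, end, record.trips))
    return lines


# Hypothetical zones and flows, invented for illustration only.
centroids = {"A": (0.0, 0.0), "B": (2.0, 1.0), "C": (1.0, 3.0)}
od = [ODRecord("A", "B", 120), ODRecord("A", "C", 30), ODRecord("B", "C", 45)]
lines = desire_lines(od, centroids)
```

Because every record for a given zone starts or ends at the same centroid, all lines for that zone converge on a single point.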
However, the approach has some key limitations:

- Flows are concentrated on transport network segments leading to zone centroids, creating distortions in the results and preventing the simulation of the diffuse networks that are particularly important for walking and cycling
- The results are highly dependent on the size and shape of geographic zones used to define OD data
- The approach is inflexible, providing few options to people who want to use valuable OD datasets in different ways

To overcome these limitations we developed a ‘jittering’ approach to the conversion of OD datasets to desire lines that randomly samples points within each zone (Lovelace, Félix, and Carlino Under Review). While that paper discussed the conceptual development of the approach, it omitted key details on its implementation in open source software. In this paper we outline the implementation of jittering and demonstrate how a single Rust crate can provide the basis of implementations in other languages. Furthermore, we demonstrate how jittering can be used to create more diffuse and accurate estimates of movement at the level of segments (‘flows’) on transport networks, in reproducible code-driven workflows and with minimal computational overheads compared with the computationally intensive process of route calculation (‘routing’) or processing large GPS datasets. The overall aim is to describe the jittering approach in technical terms and its implementation in open source software.
Before describing the approach, some definitions are in order:

- Origins: locations of trip departure, typically stored as ID codes linking to zones
- Destinations: trip destinations, also stored as ID codes linking to zones
- Attributes: the number of trips made between each ‘OD pair’ and additional attributes such as route distance between each OD pair
- Jittering: the combined process of ‘splitting’ OD pairs representing many trips into multiple ‘sub OD’ pairs (disaggregation) and assigning origins and destinations to multiple unique points within each zone

Approach

Jittering is a comparatively simple approach to OD data preprocessing, compared with ‘connector’ based methods (Jafari et al. 2015). For each OD pair, the jittering approach consists of the following steps (provided it has the required inputs of a disaggregation threshold, a single number greater than one, and sub-points from which origin and destination points are located):

1. Checks if the number of trips (for a given ‘disaggregation key’, e.g. ‘walking’) is greater than the disaggregation threshold.
2. If so, the OD pair is disaggregated. This means being divided into as many pieces (‘sub-OD pairs’) as is needed, with trip counts divided by the number of sub-OD pairs, for the total to be below the disaggregation threshold.
3. For each sub-OD pair (or each original OD pair if no disaggregation took place) origin and destination locations are randomly sampled from sub-points which optionally have weights representing relative probability of trips starting and ending there.
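The steps above can be sketched in Python as follows. This is a simplified illustration rather than the code of the Rust crate: the function name `jitter_od_pair`, the data layout and the rounding rule (splitting into ceil(trips / threshold) pieces so that no piece exceeds the threshold) are assumptions, and sub-points are sampled with replacement for brevity.

```python
import math
import random


def jitter_od_pair(origin, destination, trips, subpoints, threshold, rng, weights=None):
    """Split one OD pair into sub-OD pairs and sample their start and end points.

    subpoints: dict mapping zone ID -> list of (x, y) candidate points
    weights:   optional dict mapping zone ID -> list of sampling weights
    Returns a list of (start, end, trips) tuples, one per sub-OD pair.
    """
    # Steps 1 and 2: disaggregate only if trips exceed the threshold
    # (assumed rule: as few equal pieces as keep each piece at or below it).
    n_pieces = math.ceil(trips / threshold) if trips > threshold else 1
    trips_per_piece = trips / n_pieces

    # Step 3: sample origin and destination points, weighted if weights are given.
    weights = weights or {}
    pieces = []
    for _ in range(n_pieces):
        start = rng.choices(subpoints[origin], weights=weights.get(origin), k=1)[0]
        end = rng.choices(subpoints[destination], weights=weights.get(destination), k=1)[0]
        pieces.append((start, end, trips_per_piece))
    return pieces


# Hypothetical sub-points, invented for illustration only.
subpoints = {
    "A": [(0.1, 0.2), (0.4, 0.1), (0.3, 0.5)],
    "B": [(2.1, 1.2), (1.8, 0.9), (2.3, 1.1)],
}
rng = random.Random(42)  # seeded so that the sampling is reproducible
pieces = jitter_od_pair("A", "B", 250, subpoints, threshold=100, rng=rng)
small = jitter_od_pair("A", "B", 50, subpoints, threshold=100, rng=rng)
```

With 250 trips and a threshold of 100 the pair is split into three sub-OD pairs, while the 50-trip pair is left whole and only has its end points moved.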
Transcript: English (auto-generated)
Hello everyone at FOSS4G, I'm Robin and I'm going to be talking about jittering. So the first thing to say is sorry for not being there. I was in Florence and I even have the t-shirt, but I can't be there today for reasons that I'll come on to in a little bit.
Just by way of background I'm using this OBS tool to record the video which is free open source software for making videos and I'm super happy with it so far and I'm just going to try and shrink myself on this system if it will allow me.
There we go, okay so now I'm little and you can see my screen and I'm going to go onto a much more interesting set of screens. So first up let's just put this in context.
I am presenting in this session and I will check the videos after, when I get the opportunity, and yeah, really excited to be part of the FOSS4G event. I was at the OSM State of the Map conference before this one and yeah, unfortunately I can't
be there which is the topic of my next slide. So yeah I've combined this with a bit of a family holiday so you can see this photo is actually taken today just outside Lake Como and that's my partner Katie and our
little baby Kit. So apologies for not being there in person but when you see that photo hopefully that helps clarify the situation and thanks to Marco and the organisers for allowing me to present via video I really hope to answer any questions although I don't think it's
going to be possible to have an audio link I will try and if not please get in touch via GitHub or on Twitter or any other way that you like. So talking of GitHub and getting in touch the work that I'm going to present is all
the slides are available from github.com/robinlovelace/foss4g22, so that's FOSS4G with just 22 rather than 2022, so it's a bit more concise. Okay, so on to the actual content of what I'm going to be talking about.
It's jittering and routing options for converting origin-destination data into route networks. This is a bit of a mouthful, but essentially this is about using free and open source software for transport planning applications, and we have developed what I think is quite
an exciting new method that can add a huge amount of value to origin destination data which is probably the number one publicly available file format for representing movement data that's in the public domain. So just for people who don't know origin destination data represents how many people
go from zone A to zone B and therefore it's good because it's a fairly compact file format or file structure. It has been used since at least the 1950s so it's very mature and most transport authorities
collect data, so you can convert your household travel survey into OD data. Also a huge shout out to the collaborators on this project, Rosa Félix and Dustin Carlino,
and this is also accompanied by a paper that has been peer reviewed in the FOSS4G affiliated journal. So moving on, why do we need to do jittering? I think an image can tell a thousand
words so more or less it's represented by the image that you can see here which represents a major transport network and this is for a project for the Republic of Ireland to generate a strategic cycle network planning tool and basically jittering is needed for
us to get the result that you can see there and we need that result to solve very clear and well known problems which is the fact that cities are congested and if you get people to switch out of cars to walking and cycling you can help solve the climate
crisis, you can certainly help solve the health and obesity crisis and you can tackle many other problems at the same time, but you need good evidence on where the cycling and walking potential is in order to prioritise your investment.
So that's the policy angle, I'm not going to go into that in detail, obviously this is a technical GIS conference so I'm going to focus on the application and the tech that you can use to do this. So myself and colleagues at the University of Leeds and collaborators in many other
places, including the University of Lisbon where Rosa is based and the Alan Turing Institute in London which is where Dustin is based, we've been developing open source software for reproducible transport planning.
A lot of this stuff is actually powered by OSGeo tools, so GDAL and GEOS drive a lot of this stuff. I'm not going to talk about those tools because they're already out there, you can find out about them and they're fairly well used. Just a little bit about the kind of starting point and the motivation for developing this
tool, in a way it's based on a previous project which we've already deployed so that's represented in the maps where you can see this process of converting origin destination data into route networks.
So this is an established procedure, we've written about it in a paper back in 2020 but essentially the modelling framework we believe is modular, future proof, scalable and it can also output results in terms of vector data and raster data.
So just to explain briefly this modelling framework which is the starting point, you have origin destination data, you represent it as straight lines between zone centroids, so that's where you've got multiple lines going to the same places, then you convert them into routes and then you do this process called I guess route network aggregation to
generate a vector route network and then optionally you can convert it into a tile pyramid or vector tiles and to serve those results which is actually what we did in the result I presented earlier.
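The route network aggregation step described here can be sketched in Python as follows. This is a deliberately simplified illustration, not how overline() in stplanr works internally: it assumes routes are already split at shared nodes, and the function name `route_network` and the coordinates and flows are invented for the example.

```python
from collections import defaultdict


def route_network(routes):
    """Sum flows on every segment used by one or more routes.

    routes: iterable of (path, flow), where path is a list of (x, y) nodes.
    Returns a dict mapping an undirected segment (node_a, node_b) to total flow.
    """
    flows = defaultdict(float)
    for path, flow in routes:
        # Walk consecutive node pairs along the route.
        for a, b in zip(path, path[1:]):
            segment = tuple(sorted((a, b)))  # A->B and B->A count as one segment
            flows[segment] += flow
    return dict(flows)


# Two hypothetical routes that share the segment between (1, 0) and (2, 0).
routes = [
    ([(0, 0), (1, 0), (2, 0)], 100),
    ([(2, 0), (1, 0), (1, 1)], 40),
]
network = route_network(routes)
```

Each segment of the resulting network carries a single flow value, which is the total across all routes that use it.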
So that's the modelling framework and what is jittering? Well essentially, let's just focus in on this, that jittering is aiming to overcome the limitations of this implementation of the framework which is simply that it results in quite sparse networks. So you can see that there are quite big gaps in the network and maybe for a driving
network this doesn't matter so much but for walking and cycling all of the evidence suggests that you need a really dense network. And we've picked up on the fact over various years that it's not actually an efficient
use of computational resources to only assume that trips start at the centroid, that's just simply unrealistic. So jittering is a way to overcome, to get around this and the word jittering already exists in data visualisation, it means adding random noise to a data set for visualisation
purposes. So to avoid, if you've got a scatter plot, all of the points going on top of each other meaning you cannot see what's going on, you add a slight random movement to it so you can actually see the full extent of the data and it's exactly the same process
applied to origin destination data. We've written a paper, it's published in Transport Findings and you can check it out. So this presentation is at that starting point, we've got it, but it's trying to
work out how you can use this to get the best possible route network for your transport planning applications. And I've just got a very brief visual demo of how this works, this is a reproducible example that you can find if you look up the original jittering paper, but essentially
this is a minimal example with three OD pairs, this is actually a good example of how origin destination data looks, it's really simple, it's just a square table, but it only becomes geographically meaningful when you link it with the zones and you can see these zone
codes link to the zones here and the zones are typically the same for the origins and the destinations but they don't have to be and what jittering does is it simply moves the start and end point so B here represents just taking a completely random
point in space and fixing that as the start and the end point, so you've got the same movement but the lengths of the lines and their locations have changed. One adaptation, which we always recommend now, is to assign sub points, and those
sub points can be on the transport network, or you can have different types of sub points. For example, if you're routing to a particular type of destination like shops, the sub points could be shops. So that's the next thing that we did in jittering, and then the final thing is
actually disaggregation, so we're splitting a single OD pair into multiple OD pairs. So for example this blue line, it contains 100 to 200 trips, and we've just split it into four pieces, with a disaggregation threshold preventing any one of these desire lines having
more trips than that particular threshold. So it's fairly technical, but in a way it's a very flexible and I think powerful approach, and you can see the jittering technique here in operation.
This is data from Edinburgh, but it would work in any city. So it gives you a much more diverse end product, and what you can see here is that yes, as expected, it generates more
realistic networks, or they seem to be more realistic, they're certainly more diverse. But what we didn't do in that original paper was to validate how good the network is, hence we tried some validation. So at the GISRUK 2022 conference we presented on jittering and how it compared with
real world data for a data set in Edinburgh. We used a slightly larger data set than what we presented in the original paper, where we just presented the methods, and we changed the jittering parameters, and lo and behold we found that there was an
improvement in the model fit with the observed level of movement as you did jittering and as you set this disaggregation threshold. So this was
only a very small test of it, but it kind of was a bit of a proof of concept. If you actually look at the R-squared values, they're extremely low, so what we've got is an improvement in R-squared, but basically from extremely low to very low. So it works, but what we need to remember is
this example only used a tiny fraction of the available origin-destination data. It's not a very big training data set, we only had a handful of points, and so we couldn't really
draw strong conclusions from that work. Another major limitation of what we presented at the GISRUK conference is that we only considered one type of routing. That is a big limitation, because if you imagine using a slightly different
routing algorithm you might get an entirely different result. So what we realized is that maybe we should try changing both the routing options and the jittering in the same process to see if we can improve the fit.
And enter Lisbon. So Rosa, who's based at the University of Lisbon, has fantastic data, represented in the map below, and there's also been some new cycle infrastructure in Lisbon, so it's a very interesting case study from that perspective. So let's move forward quickly to see
the results. Essentially, as you can see, yes, you get very different results both by changing the jittering parameters and by changing the route network parameters. The image on the right hand side is obviously a much better fit to the count points, it's picking up these quite important
routes here, than this one on the left. So we've got different jittering parameters and different routing ones. I'm not going to go through all of the results, you can read them in the paper, but the headline is that by doing both jittering and disaggregation, and by carefully
testing a range of routing options, you will get the best fit. And what's interesting is these R-squared values are much, much better than what we got from the data set in Edinburgh.
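For readers who want to run this kind of comparison themselves, a minimal Python sketch of an R-squared calculation between observed counts and modelled flows is shown below. It assumes that R-squared means the squared Pearson correlation, which may differ from the exact measure used in the paper, and the counts and flows are invented for the example.

```python
def r_squared(observed, modelled):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_m = sum(modelled) / n
    # Covariance and variance sums (the 1/n factors cancel in the ratio).
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(observed, modelled))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_m = sum((m - mean_m) ** 2 for m in modelled)
    return cov ** 2 / (var_o * var_m)


# Hypothetical counts at five count points and the flows a model assigns there.
observed = [120, 80, 300, 45, 210]
modelled = [100, 95, 260, 60, 180]
fit = r_squared(observed, modelled)
```

Running the same function on the outputs of several jittering and routing settings gives one number per setting, which is how different parameter combinations can be ranked against the same count data.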
So you can see here, this is the winner. It's got a disaggregation threshold of 500, so that's actually pretty high, and that plus jittering gets you the best result. You can do more routing, so when you reduce the disaggregation threshold you dramatically increase the number of
lines, the number of routes that you're going to calculate, to almost 2000, but this shows that there's a trade-off there. It does start to level off: beyond a certain point additional disaggregation doesn't get you better results, and in fact it's more important to try
a range of different routing services. So you can see we tried Google, we tried this level of traffic stress algorithm from the R5 routing engine, we also tried CycleStreets routing
options, and yeah, it's just really interesting to see that this choice of routing parameters is in fact more important, beyond a certain threshold, than extra disaggregation. So that's it guys, I mean there's not really much else to say. It was extremely fun working
on this project, especially taking data from another international city, the capital of Portugal, and this is going to feed into evidence-based decision making both in Portugal and in other places. So the map that I presented right at the beginning, we're going to put that into
production and that will help planners in Ireland prioritize their investment where it will be most effective. There's a lot of next steps here, I mean there's a lot going on. There's actually quite a big parameter space and in this paper we've only
kind of tweaked a few of the available parameters, so I think the next stage is to scale this up, find more and bigger input data sets. There's a range of possible data sets that we could use here and I'm not going to read each of them out because
time is of the essence. You can certainly find the slides online and I will link to those from the GitHub repository and also put them out on Twitter. So yeah, I really hope that's been of interest. If anyone is interested in running some code, you can run this:
the Rust implementation can be called from the command line, and you can also run it as an R implementation. There's a rather extended abstract which I'm not going to go into, but yeah, thanks a lot for listening. As I say, apologies for not being there in person,
and I really hope that you enjoyed my talk, and yeah, look forward to any questions online. So thanks a lot, have a great FOSS4G guys, and yeah, see you at the next one. Bye for now. Okay.