Exploring jittering and routing options for converting origin-destination data into route networks: towards accurate estimates of movement at the street level
Formal Metadata
Number of Parts | 351
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/68894 (DOI)
Production Year | 2022
FOSS4G Firenze 2022, 348 / 351
Transcript: English (auto-generated)
00:02
Hello everyone at FOSS4G, I'm Robin and I'm going to be talking about jittering. So the first thing to say is sorry for not being there. I was in Florence and I even have the t-shirt, but I can't be there today for reasons that I'll come on to in a little bit.
00:20
Just by way of background, I'm using this OBS tool to record the video, which is free and open source software for making videos, and I'm super happy with it so far. I'm just going to try and shrink myself on this system if it will allow me.
00:50
There we go. Okay, so now I'm little and you can see my screen, and I'm going to go onto a much more interesting set of screens. So first up, let's just put this in context.
01:02
I am presenting in this session and I will check the videos afterwards when I get the opportunity, and yeah, really excited to be part of the FOSS4G event. I was at the OSM State of the Map conference before this one and yeah, unfortunately I can't
01:25
be there, which is the topic of my next slide. So yeah, I've combined this with a bit of a family holiday, so you can see this photo was actually taken today just outside Lake Como, and that's my partner Katie and our
01:42
little baby Kit. So apologies for not being there in person, but when you see that photo hopefully that helps clarify the situation, and thanks to Marco and the organisers for allowing me to present via video. I really hope to answer any questions, although I don't think it's
02:03
going to be possible to have an audio link. I will try, and if not, please get in touch via GitHub or on Twitter or any other way that you like. So, talking of GitHub and getting in touch, the work that I'm going to present is all
02:26
the slides are available from github.com/robinlovelace/foss4g2022, or actually it's just foss4g22, so it's a bit more concise. Okay, so on to the actual content of what I'm going to be talking about.
02:43
It's jittering and routing options for converting origin destination data into route networks. This is a bit of a mouthful, but essentially this is about using free and open source software for transport planning applications, and we have developed what I think is quite
03:03
an exciting new method that can add a huge amount of value to origin destination data, which is probably the number one publicly available file format for representing movement data that's in the public domain. So just for people who don't know, origin destination data represents how many people
03:25
go from zone A to zone B, and therefore it's good because it's a fairly compact file format or file structure. It has been used since at least the 1950s, so it's very mature, and most transport authorities
03:46
collect data, so you can convert your household travel survey into OD data. Also a huge shout out to the collaborators on this project, Rosa Felix and Dustin Carlino,
04:03
and this is also accompanied by a paper that has been peer reviewed in the FOSS4G affiliated journal. So moving on, why do we need to do jittering? I think an image can tell a thousand
04:20
words, so more or less it's represented by the image that you can see here, which represents a major transport network, and this is for a project for the Republic of Ireland to generate a strategic cycle network planning tool. Basically, jittering is needed for
04:45
us to get the result that you can see there, and we need that result to solve very clear and well known problems, which is the fact that cities are congested, and if you get people to switch out of cars to walking and cycling you can help solve the climate
05:04
crisis, you can certainly help solve the health and obesity crisis, and you can tackle many other problems at the same time, but you need good evidence on where the cycling and walking potential is in order to prioritise your investment.
05:21
So that's the policy angle. I'm not going to go into that in detail; obviously this is a technical GIS conference, so I'm going to focus on the application and the tech that you can use to do this. So myself and colleagues at the University of Leeds and collaborators in many other
05:44
places, including the University of Lisbon, where Rosa is based, and the Alan Turing Institute in London, which is where Dustin is based, we've been developing open source software for reproducible transport planning.
06:01
A lot of this stuff is actually powered by OSGeo tools, so GDAL and GEOS drive a lot of this stuff. I'm not going to talk about those tools because they're already out there, you can find out about them, and they're fairly well used. Just a little bit about the kind of starting point and the motivation for developing this
06:23
tool: in a way it's based on a previous project which we've already deployed, so that's represented in the maps where you can see this process of converting origin destination data into route networks.
06:42
So this is an established procedure, we've written about it in a paper back in 2020, but essentially the modelling framework we believe is modular, future proof and scalable, and it can also output results in terms of vector data and raster data.
07:01
So just to explain briefly this modelling framework, which is the starting point: you have origin destination data, you represent it as straight lines between zone centroids, so that's where you've got multiple lines going to the same places, then you convert them into routes, and then you do this process called, I guess, route network aggregation to
07:25
generate a vector route network, and then optionally you can convert it into a tile pyramid or vector tiles to serve those results, which is actually what we did in the result I presented earlier.
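
The framework just described (OD table, straight lines between centroids, routes, route network aggregation) can be sketched in a few lines of Python. This is a minimal illustration and not the speaker's implementation: the zone centroids, trip counts and the `toy_route` function are all hypothetical stand-ins, and a real workflow would call a routing engine instead of the axis-aligned toy router used here.

```python
from collections import defaultdict

# Hypothetical zone centroids (x, y); real data would come from zone geometries.
centroids = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (2.0, 2.0)}

# Hypothetical origin-destination table: one row per zone pair with a trip count.
od = [
    {"origin": "A", "destination": "B", "trips": 100},
    {"origin": "A", "destination": "C", "trips": 50},
    {"origin": "B", "destination": "C", "trips": 30},
]


def desire_lines(od_rows, zone_centroids):
    """Represent each OD pair as a straight line between zone centroids."""
    return [
        (zone_centroids[r["origin"]], zone_centroids[r["destination"]], r["trips"])
        for r in od_rows
    ]


def toy_route(start, end):
    """Stand-in for a routing engine: travel along x first, then along y.

    Returns a list of segments, each a sorted pair of points, so the same
    segment gets the same key whichever direction it is travelled in.
    """
    corner = (end[0], start[1])
    points = [start, corner, end]
    return [tuple(sorted((p, q))) for p, q in zip(points, points[1:]) if p != q]


def route_network(lines):
    """Route network aggregation: sum the trips of all routes using each segment."""
    flows = defaultdict(int)
    for start, end, trips in lines:
        for segment in toy_route(start, end):
            flows[segment] += trips
    return dict(flows)


network = route_network(desire_lines(od, centroids))
for segment, trips in network.items():
    print(segment, trips)
```

Because every line starts and ends at a centroid, all flows pile onto the few segments that touch those centroids, which is the sparseness the talk goes on to address.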
07:41
So that's the modelling framework, and what is jittering? Well essentially, let's just focus in on this, jittering is aiming to overcome the limitations of this implementation of the framework, which is simply that it results in quite sparse networks. So you can see that there are quite big gaps in the network, and maybe for a driving
08:05
network this doesn't matter so much, but for walking and cycling all of the evidence suggests that you need a really dense network. And we've picked up on the fact over various years that it's not actually an efficient
08:21
use of computational resources to assume that trips only start at the centroid; that's just simply unrealistic. So jittering is a way to get around this, and the word jittering already exists in data visualisation: it means adding random noise to a data set for visualisation
08:46
purposes. So if you've got a scatter plot, to avoid all of the points going on top of each other, meaning you cannot see what's going on, you add a slight random movement to them so you can actually see the full extent of the data, and it's exactly the same process
09:04
applied to origin destination data. We've written a paper, it's published in Transport Findings, and you can check it out. So this presentation is at that starting point, we've got it, but it's trying to
09:22
work out how you can use this to get the best possible route network for your transport planning applications. And I've just got a very brief visual demo of how this works. This is a reproducible example that you can find if you look up the original jittering paper, but essentially
09:43
this is a minimal example with three OD pairs. This is actually a good example of how origin destination data looks: it's really simple, it's just a square table, but it only becomes geographically meaningful when you link it with the zones, and you can see these zone
10:01
codes link to the zones here. The zones are typically the same for the origins and the destinations, but they don't have to be, and what jittering does is it simply moves the start and end points. So B here represents just taking a completely random
10:22
point in space and fixing that as the start and the end point, so you've got the same movement but the lengths of the lines and their locations have changed. One adaptation, which we always recommend now, is to assign sub points, and those
10:42
sub points can be on the transport network, or you can have different types of sub points; for example, if you're routing to a particular type of destination like shops, the sub points could be shops. So that's the next thing that we did in jittering, and then the final thing is
11:00
actually disaggregation, so we're splitting a single OD pair into multiple OD pairs. So for example this blue line, it contains 100 to 200 trips; we've just split it into four pieces, with a disaggregation threshold preventing any one of these desire lines having
11:24
more trips than that particular threshold. So it's fairly technical, but in a way it's a very flexible and I think powerful approach, and you can see the jittering technique here in operation
11:41
on, this is data from Edinburgh, but it would work in any city. So it gives you a much more diverse end product, and what you can see here is that yes, as expected, it generates more
12:01
realistic networks, or they seem to be more realistic, they're certainly more diverse, but what we didn't do in that original paper was to validate how good the network is. Hence we tried some validation, so at the GISRUK 2022 conference we presented on jittering and how it compared with
12:28
real world data for a data set in Edinburgh. We used a slightly larger data set than what we presented in the original paper, where we just presented the methods, and we changed the jittering parameters, and lo and behold, we found that there was an
12:47
increase, there was an improvement in the model fit with the observed level of movement as you did jittering and as you set this disaggregation threshold. So this was
13:06
only a very small test of it, but it kind of was a bit of a proof of concept. If you actually look at the R squared values they're extremely low, so what we've got is an improvement in R squared, but basically from extremely low to very low. So it works, but what we need to remember is
13:25
this example only used a tiny fraction of the available origin destination data. It's not a very big training data set, we only had a handful of points, and so we couldn't really
13:42
draw very strong conclusions from that work. Another major limitation of what we presented at the GISRUK conference is that we only considered one type of routing, so that is a big limitation, because if you imagine using a slightly different
14:03
routing algorithm you might get an entirely different result. So what we realized is that maybe we should try changing both the routing options and the jittering in the same process to see if we can improve the fit.
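
The jittering and disaggregation steps whose parameters are being varied here can be sketched as follows. This is a hedged illustration, not the Rust or R implementation mentioned in the talk: zones are simplified to hypothetical bounding boxes, start and end points are sampled uniformly inside them rather than from sub points on the network, and the splitting rule (`ceil(trips / threshold)` equal parts) is an assumption that is merely consistent with the talk's example of a line being split into four pieces.

```python
import math
import random

# Hypothetical zones as bounding boxes (xmin, ymin, xmax, ymax).
zones = {"A": (0.0, 0.0, 1.0, 1.0), "B": (2.0, 0.0, 3.0, 1.0)}


def random_point(bbox, rng):
    """Sample a uniformly random point inside a rectangular zone."""
    xmin, ymin, xmax, ymax = bbox
    return (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))


def disaggregate(trips, threshold):
    """Split a trip count into equal parts, none larger than `threshold`."""
    parts = max(1, math.ceil(trips / threshold))
    return [trips / parts] * parts


def jitter(od_rows, zone_boxes, threshold, seed=0):
    """Return jittered desire lines as (start, end, trips) tuples.

    Each OD pair is first disaggregated, then every resulting line gets its
    own random start point in the origin zone and end point in the
    destination zone, instead of the zone centroids.
    """
    rng = random.Random(seed)
    lines = []
    for origin, destination, trips in od_rows:
        for part in disaggregate(trips, threshold):
            start = random_point(zone_boxes[origin], rng)
            end = random_point(zone_boxes[destination], rng)
            lines.append((start, end, part))
    return lines


# One OD pair with 180 trips and a threshold of 50 becomes four lines of 45 trips.
jittered = jitter([("A", "B", 180)], zones, threshold=50)
for start, end, trips in jittered:
    print(start, end, trips)
```

Lowering `threshold` increases the number of lines that must be routed, which is the computational trade-off discussed later in the talk. To use sub points, `random_point` could be swapped for a function that picks from a list of candidate locations per zone.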
14:23
And enter Lisbon. So Rosa, who's based at the University of Lisbon, has fantastic data, represented in the map below, and there's also been some new cycle infrastructure in Lisbon, so it's a very interesting case study from that perspective. So let's move forward quickly to see
14:44
the results. So essentially, as you can see, yes, you get very different results both by changing the jittering parameters and by changing the route network parameters. The image on the right hand side is obviously a much better fit to the count points; it's picking up these quite important
15:04
routes here, more than this one on the left, so we've got different jittering parameters and different routing ones. I'm not going to go through all of the results, you can read them in the paper, but the headline is that by doing both jittering and disaggregation and by carefully selecting, by
15:29
testing a range of routing options, you will get the best fit, and what's interesting is these R squared values are much, much better than what we got from the data set in Edinburgh.
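
The model fit discussed above can be illustrated with a small R squared calculation. The sketch below assumes R squared means the squared Pearson correlation between observed counts and modelled flows at the count points, which may differ from the exact definition used in the paper. All numbers and scenario names are made up for illustration and are not results from the talk.

```python
def r_squared(observed, modelled):
    """Squared Pearson correlation between observed and modelled values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_m = sum(modelled) / n
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(observed, modelled))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_m = sum((m - mean_m) ** 2 for m in modelled)
    return cov ** 2 / (var_o * var_m)


# Hypothetical counter observations at four count points.
observed = [10, 20, 30, 40]

# Hypothetical modelled flows at the same points, one list per combination
# of jittering parameters and routing option.
scenarios = {
    "scenario_1": [4, 1, 3, 2],
    "scenario_2": [1, 2, 3, 4],
}

# Score every scenario and keep the one with the best fit.
fits = {name: r_squared(observed, flows) for name, flows in scenarios.items()}
best = max(fits, key=fits.get)
print(fits, best)
```

The same loop extends to a grid over disaggregation thresholds and routing services: each combination produces one list of modelled flows, and the combination with the highest fit is kept.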
15:42
So you can see here, this is the winner: it's got a disaggregation threshold of 500, so that's actually pretty high, and that plus jittering gets you the best result. You can do more routing, so when you reduce the disaggregation threshold you dramatically increase the number of
16:05
lines, the number of routes that you're going to calculate, to almost 2000, but this shows that there's a trade-off there. It does start to level off: beyond a certain point, additional disaggregation doesn't get you better results, and in fact it's more important to try
16:23
a range of different routing services. So you can see we tried Google, we tried this level of traffic stress algorithm from the R5 routing engine, we also tried CycleStreets routing
16:40
options, and yeah, it's just really interesting to see that this choice of routing parameters is in fact more important, beyond a certain threshold, than extra disaggregation. So that's it guys, I mean there's not really much else to say. It was extremely fun working
17:04
on this project, especially taking data from another international city, the capital of Portugal, and this is going to feed into evidence-based decision making both in Portugal and in other places. So the map that I presented right at the beginning, we're going to put that into
17:26
production, and that will help planners in Ireland prioritize their investment where it will be most effective. There are a lot of next steps here, I mean there's a lot going on, so there's actually quite a big parameter space, and in this paper we've only
17:42
kind of tweaked a few of the available parameters, so I think the next stage is to kind of scale this up and find more and bigger input data sets. So there's a range of possible data sets that we could use here, and I'm not going to read each of them out because
18:03
time is of the essence. You can certainly find the slides online, and I will link to those from the GitHub repository and also put them out on Twitter. So yeah, I really hope that's been of interest. If anyone is interested in running some code, you can run this, and
18:26
the Rust implementation can be called from the command line, and you can also run it as an R implementation. So there's a rather extended abstract, which I'm not going to go into, but yeah, thanks a lot for listening. As I say, apologies for not being there in person,
18:45
and I really hope that you enjoyed my talk, and yeah, look forward to any questions online. So thanks a lot, have a great FOSS4G guys, and yeah, see you at the next one. Bye for now.