Exploring jittering and routing options for converting origin-destination data into route networks: towards accurate estimates of movement at the street level

Introduction

Origin-destination (OD) datasets provide information on aggregate travel patterns between zones and geographic entities. OD datasets are ‘implicitly geographic’, containing identification codes of the geographic objects in which trips start and end. A common approach to converting OD datasets to geographic entities, for example represented using the simple features standard (Open Geospatial Consortium Inc 2011) and saved in file formats such as GeoPackage and GeoJSON, is to represent each OD record as a straight line between zone centroids. This approach to representing OD datasets on the map has been used since at least the 1950s (Boyce and Williams 2015) and is still in use today (e.g. Rae 2009).

Beyond simply visualising aggregate travel patterns, centroid-based geographic desire lines are also used as the basis of many transport modelling processes. The following steps can be used to convert OD datasets into route networks, in a process that can generate nationally scalable results (Morgan and Lovelace 2020):

```
OD data converted into centroid-based geographic desire lines
Calculation of routes for each desire line, with start and end points at zone centroids
Aggregation of routes into route networks, with values on each segment representing the total amount of travel (‘flow’) on that part of the network, using functions such as overline() in the open source R package stplanr (Lovelace and Ellison 2018)
```

This approach is tried and tested. The OD -> desire line -> route -> route network processing pipeline forms the basis of the route network results in the Propensity to Cycle Tool, an open source and publicly available map-based web application for informing strategic cycle network investment, ‘visioning’ and prioritisation (Lovelace et al. 2017; Goodman et al. 2019).
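The first and third steps of this pipeline can be sketched in a few lines of code. The sketch below is illustrative only: it is written in Python rather than R, the zone codes, coordinates and trip counts are invented, and the routing step is replaced by hypothetical hand-written routes rather than a call to a routing engine. It does not attempt to reproduce how overline() in stplanr handles real geometries.

```python
from collections import defaultdict

# Hypothetical zone centroids (x, y) keyed by zone ID code
centroids = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (2.0, 2.0)}

# Hypothetical OD records: (origin ID, destination ID, number of trips)
od = [("A", "B", 100), ("A", "C", 50), ("B", "C", 30)]


def od_to_desire_lines(od, centroids):
    """Step 1: join zone IDs to centroids to get straight desire lines."""
    return [
        {"geometry": (centroids[o], centroids[d]), "trips": trips}
        for o, d, trips in od
    ]


def aggregate_routes(routes):
    """Step 3: sum trips over every segment shared between routes."""
    flows = defaultdict(int)
    for route in routes:
        path = route["path"]
        for start, end in zip(path, path[1:]):
            # Sort the end points so that direction of travel is ignored
            flows[tuple(sorted((start, end)))] += route["trips"]
    return dict(flows)


desire_lines = od_to_desire_lines(od, centroids)

# Step 2 stand-in: hand-written routes as sequences of network nodes
routes = [
    {"path": [(0.0, 0.0), (2.0, 0.0)], "trips": 100},             # A to B
    {"path": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)], "trips": 50},  # A to C via B
    {"path": [(2.0, 0.0), (2.0, 2.0)], "trips": 30},              # B to C
]
route_network = aggregate_routes(routes)
```

In this toy example the segment shared by the A to B and A to C routes carries 150 trips, because every route starts or ends exactly at a centroid.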
However, the approach has some key limitations:

```
Flows are concentrated on transport network segments leading to zone centroids, creating distortions in the results and preventing the simulation of the diffuse networks that are particularly important for walking and cycling
The results are highly dependent on the size and shape of geographic zones used to define OD data
The approach is inflexible, providing few options to people who want to use valuable OD datasets in different ways
```

To overcome these limitations we developed a ‘jittering’ approach to the conversion of OD datasets to desire lines that randomly samples points within each zone (Lovelace, Félix, and Carlino Under Review). While that paper discussed the conceptual development of the approach, it omitted key details on its implementation in open source software. In this paper we outline the implementation of jittering and demonstrate how a single Rust crate can provide the basis of implementations in other languages. Furthermore, we demonstrate how jittering can be used to create more diffuse and accurate estimates of movement at the level of segments (‘flows’) on transport networks. This can be done in reproducible code-driven workflows and with minimal computational overheads compared with the computationally intensive process of route calculation (‘routing’) or processing large GPS datasets. The overall aim is to describe the jittering approach in technical terms and its implementation in open source software.
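The core idea of replacing a single centroid with randomly sampled points inside each zone can be sketched as follows. This is a minimal illustration in Python, not the Rust implementation described in this paper: the function names, the toy zone polygons and the use of uniform rejection sampling are all assumptions made for the example.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon (a list of vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_point_in_zone(polygon, rng):
    """Rejection sampling: draw from the bounding box until a point is inside."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    while True:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            return (x, y)


def jittered_desire_line(origin_zone, destination_zone, rng):
    """Return a desire line whose end points are sampled within each zone."""
    return (
        sample_point_in_zone(origin_zone, rng),
        sample_point_in_zone(destination_zone, rng),
    )


# Hypothetical zones: a triangle and a square
zone_a = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
zone_b = [(10.0, 0.0), (14.0, 0.0), (14.0, 4.0), (10.0, 4.0)]

rng = random.Random(42)  # seeded so that results are reproducible
lines = [jittered_desire_line(zone_a, zone_b, rng) for _ in range(5)]
```

Each call yields a different pair of start and end points, so routes calculated from these lines no longer converge on the two centroids. Uniform sampling over the zone area is a simplification chosen to keep the sketch short.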
Before describing the approach, some definitions are in order:

```
Origins: locations of trip departure, typically stored as ID codes linking to zones
Destinations: trip destinations, also stored as ID codes linking to zones
Attributes: the number of trips made between each ‘OD pair’ and additional attributes such as route distance between each OD pair
Jittering: the combined process of ‘splitting’ OD pairs representing many trips into multiple ‘sub OD’ pairs (disaggregation) and assigning origins and destinations to multiple unique points within each zone
```

Approach

Jittering is a simple approach to OD data preprocessing compared with ‘connector’ based methods (Jafari et al. 2015). For each OD pair, the jittering approach consists of the following steps (provided it has the required inputs: a disaggregation threshold, which is a single number greater than one, and sub-points from which origin and destination points are selected):

```
Checks if the number of trips (for a given ‘disaggregation key’, e.g. ‘walking’) is greater than the disaggregation threshold.
If so, the OD pair is disaggregated. This means it is divided into as many pieces (‘sub-OD pairs’) as needed, with trip counts divided by the number of sub-OD pairs, for the number of trips in each sub-OD pair to be below the disaggregation threshold.
For each sub-OD pair (or each original OD pair if no disaggregation took place) origin and destination locations are randomly sampled from sub-points, which optionally have weights representing the relative probability of trips starting and ending there. |