Logo TIB AV-Portal Logo TIB AV-Portal

Fast Travel Sheds using GTFS Data in GeoTrellis Transit

Video in TIB AV-Portal: Fast Travel Sheds using GTFS Data in GeoTrellis Transit

Formal Metadata

Fast Travel Sheds using GTFS Data in GeoTrellis Transit
Title of Series
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
Open Source Geospatial Foundation (OSGeo)
Production Year
Production Place
Portland, Oregon, United States of America

Content Metadata

Subject Area
General Transit Feed Specification (GTFS) data is the open standard for representing transit systems in space and time. While developing an open source planning application for public transit agencies, it became clear that processing speed was the primary impediment to calculating transit coverage indicators within a reasonable time. At a glance, GTFS is just a set of simple CSV files organized relationally with key fields. But transit systems are far more complex than just spatial data for routes and stops. They need to be able to model spatial-temporal relationships embodied in transit schedules as well as semi-cyclical and shifting schedule patterns. Additionally, the specification is flexible enough to represent many different approaches to operating transit systems and the same system attributes can often be represented in multiple ways.While some transit system metrics are fairly straightforward to compute, certain public transit system metrics are best modeled as "travel shed" represented by raster coverages or isolines derived from them. The GeoTrellis Transit project is an extension of the open source GeoTrellis framework and was created to calculate travel shed rasters using GTFS and OpenStreetMap data. GeoTrellis Transit accomplishes this by creating a time-dependant graph structure that can rapidly perform shortest path queries at a given time of day, based on the public transit schedule.The challenge in developing GeoTrellis Transit involved designing a time-dependent graph structure that contains information about how the nodes connect at any particular moment in time during traversal. Shortest path algorithms on time-dependant graphs need to take into account arrival times at any given node, as well as wait times until an edge becomes available. This makes fast calculation of shortest path trees on time-dependant graphs difficult, which GeoTrellis Transit optimizes using a novel data structure to represent the graph.This presentation will introduce the GTFS standard and identify where difficulties may arise, especially for large systems. It will also describe how GTFS, OpenStreetMap and GeoTrellis Transit can be combined to build a fast time-dependant graph structure that can then be used to create time-based shortest path trees and travel shed rasters.
Keywords GeoTrellis Transit GTFS Transit Data Travel Sheds Raster Graph
scale Computer animation projects analysis bits man
Specifications Computer animation time
point labelled transition systems Specifications files mapping Super transportation time The list Databases lines Computer animation Software normal system structure extent variations
complex focus schedule time construction Part sequence Stopping Time Computer animation Software system Transduktor <Automatentheorie> table Routing spacetime
track Computer animation visualization directions time sort Routing training
schedule Graph Computer animation Lecture/Conference time open subsets figure
server building transportation time projects help distances Computer animation Software different buffer system model
transportation time outliers ones system help sort metrics distances Router number
Sequel Meeting/Interview Lecture/Conference different feedback metrics number
area statistics processes Computer animation Software time shell share system indicators
area Computer animation Lecture/Conference different cellular Ranges counting sets
area point time zone Graph mapping demo time polygons maximal applications saturation points Computer animation sort descriptions
multiple Cut Graph Computer animation physicists time NET graphs structure theoretical
Graph Computer animation graphs vertices sets source
point Graph Computer animation graphs
multiple Graph NET graphs
specific Graph Computer animation naturally NET workstation graphs vertices BUS Twitter computational
point Shortest Paths Types labelled transition systems shift Computer animation topology time BUS vertices structure
point web pages multiple algorithm labelled transition systems server Graph Open Source NET time Development projects graphs lines graphs Computer animation visualization topology vertices multimodal source modes
point Open Source states Ranges Contracts open subsets Modular specific memory configuration hierarchy conditions algorithm batch Graph NET analysis graphs Types Computer animation Software sort objects modes libraries
point statistics schedule states time Ranges time machine Coloured second computational Arrays rates Lecture/Conference BUS structure optimal area labelled transition systems Graph generate information mapping NET cellular graphs traction Indexable Computer animation vertices sort Lageparameter
statistics generate Graph mapping Development Clusters real time Part Benchmarks mathematics processes Lecture/Conference sort
the the everybody thanks for coming and thanks for the brief intruded you get us earlier than you MIT talk little bit more about that and co-presented with my colleague Robert the we both work at his via
a small suffer for company based in Philadelphia and that is a B allele of Advanced Spatial Analysis and the way a man I conclude the Chinese translation error because the project they were working on is an international scale and I just want to underscore the scope of what we're doing with open data and how important is even going to China which is a bit of a challenge so the general trends of
feed specification uh it's relatively young less than a decade old
and yet we pretty much use it every day and time we wanna find out how to get from 1 place to another using public transit and and from what I understand it's been its early years incubating here in Portland uh
try metal I work with it along with with open plants but not bad so the the specification is very flexible it allows agencies to add data to the basic requirements and uh the weighted structured accounts for a lot of variations in the Paolo transportation systems the public transit systems are put together they're all very different and so the specification needs to account for the flexibility and it's also super normalized we saw that list of text files earlier so that all into G DFS and it's uh just a super normalize database structure it all comes in a CS these uh with text file extensions and the the data some of those text files have spatial data some of them have temporal data and then it gets even more complicated than that I have spent a lot of time working with spatial data and so assertive approach problems that when I saw this body acts no problem you know got lines on the map do that all the time
this is Boston's transit network the all the stops but those on a map points we got that and then the the temporal aspect of it you're adding realize how
complex is going to be for every stock
every any time the vehicle stops at a stop there's a role in the stop times the uh table and then that
the sequence of stops are organized into a trip so a rout has multiple trips which has multiple stops the stops there's only 1 stop that you know have all these different times of a day it started to get pretty complicated just and talking about this to to underscore the complexity and you know why some of the software they were using is is really necessary to go beyond just the the temporal aspect of the transit that systems operate in cyclical fashion so you know Sundays our Sundays for the most part Monday is on Mondays but it's semi cyclical because manufacturing holidays and construction schedules and all of that is built into the the GTF FST hopefully at and you know what we end up with is an intensely granular focus on time and space and so it's not you know it's a particular date at a particular time and this is important when you're doing trip planning the and so this is the actual North Carolina both systems for 1 day sped up really fast it's kind phonetic that I can put this together to try to understand like where your health care what was being represented here and how we could work with it going toward the end of the day that the system have down minutes my time no more service and
the other folks have done much nicer visualizations much more complicated and I have uh this is a New York City and and I think it's sped up about 10 times recorded this couple nights ago and you know so we've got the the different routs are color coded in a mold bold uh trains on the same track going in different directions and you know it's a really big that sort of new data that concept to work with the so what are we using it for
now pretty much what it was designed
for district and so on uh you google Trip Planner open trip planner and you they they take this
schedule and uh spatial data and figure out where you're starting when you're starting on a particular day and time and then by using the the graph to figure out how the to connect that are out to get you where you wanna go but what else can
we do it that there is you know this is uh this spatio-temporal model of transportation system so we can ask additional questions as you know we we are
working on this this international project where the we're building software to help transportation agencies plan and optimize the uh the system so from accessibility and performance perspectives these are the questions we want ask a lot of them can be answered with traditional GIS methods in so we we have buffers of stops for the TE verses the rail we have a you know encounter the the population that lives within a certain distance of the stops consider them serve look at subpopulations like lower-income folks and then we can also sort of look at the the frequency and weight that by the population at different times of day in different times of the week and then you ask some questions about how all the systems actually performing and then compare that to some scenarios of you know how to your ideas on on how to improve it we also look at
performance metrics like the the speed where the number of vehicles per Rao at the distance between stops the
enough time it takes to between stops and then look at which router doing well which ones are a sort of outliers or slow and uh you know that these are also the answers hopefully that could help transportation planners you make a better system the
and we have so the goes and calculates a number of different metrics and provides a new feedback to the people using it the there's 1 of these metrics is a lot more complicated and we couldn't use traditional as a GIS approaches you all of this is being
done in it it could even be done and sequel like directly that's uh some of these and I roughed up that way and to talk about that
more complicated indicator on passive over to up the thank you and yes I will talk about how were calculating travel sheds and statistics face-off of travel sheds by using some software that we critical due shells transit so once we have G that's data where we combine it with other ideas such as OpenStreetMap data and say population data to answer questions such as how many people can arrive at an area by
90 and traveling for an hour less by public transit and that might be a good indication of how well the public share trans system serving on people being able the get to a job I on time at that area
and then once I can answer that question for a range of areas and how I compare those different values with respect to
that that transit access so to begin
answering a question we need some data so if we had a population data set that perhaps roster forward there's a population count per hour at rest cell
will we give actually do is a little summary of that of that rastering summary is just I'm taking the values under his own and aggregating in them in some way by the picture you see here is a zonal summary created using Geotrellis on for another application and money into which you trusses right now but if you're interested in doing a talk on it but to modern accession to during the invited talk but in this sort summary we actually have a polygon that is drawn by the user on the map and then there's a summary of all of the zone but we don't the user draw
a map we actually want freer travel shed in inducing summary on that so a travel said I is a description of the areas that can be traveled from to buy up or will travel to or from a point in a given amount of time so what we see here is a travel shared in Philadelphia are specifically it's from leaving from the point of the marker traveling at the maximum of 1 hour at 6 30 PM on a weekday eyes so the colored area is the travel shed and there's different colored areas that represent the air travel sheds further 60 minutes 50 minutes 40 minutes and analysts and this is such a generated by a demo application of the dew strands capability at at transit that you trust I come you can go around it's interactive you can play with that come from as had we generate the Strauss sheds well we will get into our sort of the graph
theory behind actually generating the fellowship on so 1st off we need to look at the OpenStreetMap data and the duty of us data in a graph structure and is actually a time dependent weighted multi digraph which is a lot of prefixes prefixes on graphs so many cut that up and so sort explain what that means piecemeal so 1st off of swing autograph physicist graph 3 101
adequate version of of such a thing a graph
is a composition of a set of vertices or nodes and edges between those nodes so the graph we see here the 6 nodes are labeled 1 through 6 these edges between them this edge contains connecting 6 and 4 4 and 3 the a
digraph is or a directed graph is a
graph in which the edges are directional so there is an edge connecting 3 in 10 but there's no edge going from 10 to 3 there are points 3 to 10 so you can think of it sort of as a one-way street a weighted graph is a
graph in which there's a weights assigned to each of the edges of the graph so if we look at this graph and think of a as representing an
intersection and c is representing an intersection there's an edge connecting them that has the value 5 perhaps that's 5 minutes so that if you walk from a to see it would take you 5 about 5 minutes and a multi graph
is a graph which allows more than 1 edge to connect nodes so in this example we look at a vertex 5 you can see it and I say that's an intersection vertex 1 and there's 2 edges that connect these 1 with weight 6 which could perhaps mean if I walked it would take me 6 minutes and then an edge connecting i that has 3 so maybe that's like if I take the bus and then it would only take me 3 minutes but that's really depending on when the bus actually leaves and when I'm at the nodes so that's where the the time-dependent nature of a trend to graph
comes in and that's really an interesting specific specific aspect of trans a graphs that makes the countries of computations on them a little more difficult and so if I had a Main Street Station Broadway station and I had IGT 1st specify that there was a bus I had that ran from 9 AM titanium every 20 minutes and took 10 minutes
I you might ask me how long's it it's it does it take to get from main street to Broadway and I would not be able to answer your question because I don't have enough information but if you say starting from my 15 how long does it take I can actually give you an answer lies imitate the 10 minutes of the bus ride plus a 5 minute shift the wait for the 920 bus so now we have this this structure in which to view are transit system I will want create off of
it is a shortest path tree and a shortest path tree ice has a specific point a starting node on ending node and answers questions like What is the quickest way to get from a to B so there's a few types of shortest path trees there's a departure time shortest path tree is saying starting at time t work what and how long does it take to get to the other nodes in the graph and start yes starting at time t starting at this node and arrival time shortest-path tree would say I wanna get to this node
at this time so what at what points when I leave the other nodes to arrive at this this single node there's a lot of our algorithms in the graph theory that generates shortest path trees but this animation taken from wikipedia about the they just algorithm and it that this is an algorithm that can tell you how was the shortest path from a to b other inside that algorithm it actually generates a shortest path tree that tells you what's the quickest way to get from the start node to the other nodes and we can modify Dykstra's algorithm to generate shortest path trees on these time-dependent multi weighted multi digraphs I and create uh transit system shortest path
trees and this this graph her this rubber this visualization of a shortest path tree is of a transit system and the thicker lines mean the shortness of arriving at a mother a point on the transit system and so this is a visualization was actually created by graph server inside a fusion the graphs page and graphs there is an open-source mode Multimodal trip-planning engine and the last Committee on this project on that gives masters early 2011 and the reason for that is that all development so a shifted to another project
called open trip which we heard about that which is a really great but largely actively developed by transit of routing engine software I that uses openstreetmap data user gt vesper journey planning i has a batch analysis mode for doing transit network analysis like we're talking about and i has since the sophisticated graph theory and search path algorithms including a star and an opposition called contraction hierarchies so this sort of begs the question why create you Charles transit when there's already this amazing trip planning stuff open-source software package and the reason is performance but we when we were tasked with creating travel shed so we had a couple spikes that use open trip planner with that as through its rest on points and then also just as an imported library and I it just wasn't getting the performance so we needed so we created you trust transit and it gave us the performance we needed so what makes you chose transit fast
from so it's the it's the way represents the graph in memory I open planner represents its edge weights dynamically which means as it traverses the graph it calculates the edge weight based off of a state object it looks up at specific user options and has a lot conditionals based off the edge types that compute the weight but which makes it really really flexible and able to answer a wide range of by transit questions such as like
what bus 1 should I take to get from point a to point B I do show stress on the other hand it loses a lot of information and as preprocessing the pact the the weights of the edges into an index data structure that has a really really fast edge weight a lookup for incoming edges and outgoing edges so we actually have a package are graph into this state data structure that has the vertices are indexed by the base 0 to n minus 1 side that indexes into N 1 array that indexes to into another rate that really just gives information about incoming or outgoing I edge and the start time as 2nd from midnight and that the edge becomes valid the weight which is the duration in seconds of that of traversing that an edge and then the destination node so we lose we lose some flexibility in in the hat at what we can ask the trans a graph but we gain a lot of performance I'm and we can still answer a wide range of questions including generating travel sheds but so to get back to the original question you how many people can arrive at
an area by 9 AM the solution to that is to generate the arrival time travel shed for that area and then do is oral summary of that travel shed on top of that population a layer and said to answer how we how we would compare those relative areas we would generate that travels at the zonal summary statistic for each cell of a roster and then color the thereafter based on the values of this on some color ramp and pain that on a map and now we have sort of a heat map of what areas are better served by this statistic than than others and I because wrasses have a lot of cells of a lot of computation you can kind of see why we need the travels a generation to be as fast as possible but that have so In losing some information about the transgraph being able to generate the statistics like these travel shed statistics very quickly were hoping to give some people doing transit network planning or schedule optimization I'm a wide range of Bucher trouble said statistics that they could quickly iterate over for modifying the Jedi s trying out different schedules on so that they can you know make are transit systems are better serve the public that thanks the end
a the the question is the of the what so the it's in development this but if you have a data that's like part dating a continuously indicate that in and like streaming gt of us updates yeah I well so the Prony benchmarks of kind of place the the generation of the a statistic like that with a population of statistics at around of 5 minutes were optimizing correlated to get that lower blood so the streaming would probably have to be like every every 5 minutes and that's after the jet graph generation so we have to process streaming in that the graph the transgraph changes and then generating the new guys statistics from it and so it's possible that I don't think that there's been a were not a path to generate that sort of stuff in real time to actually to see a change on a map which would be awesome that but it on a big enough enough cluster of again of clusters and thanks to Hessian