Fast Travel Sheds using GTFS Data in GeoTrellis Transit
This is a modal window.
Das Video konnte nicht geladen werden, da entweder ein Server- oder Netzwerkfehler auftrat oder das Format nicht unterstützt wird.
Formale Metadaten
Titel |
| |
Serientitel | ||
Anzahl der Teile | 188 | |
Autor | ||
Lizenz | CC-Namensnennung 3.0 Deutschland: Sie dürfen das Werk bzw. den Inhalt zu jedem legalen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen. | |
Identifikatoren | 10.5446/31630 (DOI) | |
Herausgeber | ||
Erscheinungsjahr | ||
Sprache | ||
Produzent | ||
Produktionsjahr | 2014 | |
Produktionsort | Portland, Oregon, United States of America |
Inhaltliche Metadaten
Fachgebiet | ||
Genre | ||
Abstract |
| |
Schlagwörter |
FOSS4G 2014 Portland139 / 188
7
10
14
15
16
23
24
25
28
29
33
37
39
40
43
45
46
48
50
56
64
65
69
72
74
82
89
91
98
102
107
111
114
118
128
131
132
135
138
141
143
147
149
150
157
158
161
164
165
166
173
174
175
179
185
00:00
AnalysisBitGruppenoperationProjektive EbeneZentrische StreckungSoftwareindustrieTranslation <Mathematik>BenutzerbeteiligungComputeranimation
00:50
GruppenoperationComputeranimation
01:05
DatenbankDatenstrukturSoftwareTransportproblemMAPTemporale LogikGeradeGruppenoperationMaßerweiterungPhysikalisches SystemAutomatische HandlungsplanungNormalvektorPunktVererbungshierarchieTransitionssystemUmwandlungsenthalpieMailing-ListeElektronische PublikationMultiplikationsoperatorTVD-Verfahren
02:32
DatensatzFolge <Mathematik>SoftwareBus <Informatik>Temporale LogikKomplex <Algebra>Physikalisches SystemSchedulingTabelleTeilbarkeitAutomatische HandlungsplanungRoutingTransitionssystemStoppzeitFokalpunktKonstruktor <Informatik>MultiplikationsoperatorMinkowski-MetrikDienst <Informatik>Programm/Quellcode
04:22
WellenpaketQuick-SortRoutingRichtungWeg <Topologie>Computeranimation
04:57
SchedulingAutomatische HandlungsplanungRoutingOffene MengeFigurierte ZahlVorlesung/Konferenz
05:26
SoftwareTransportproblemFrequenzGebäude <Mathematik>Temporale LogikPhysikalisches SystemProjektive EbeneAbstandGewicht <Ausgleichsrechnung>HilfesystemPuffer <Netzplantechnik>Endliche ModelltheorieDifferenteMultiplikationsoperatorComputeranimation
06:42
TransportproblemMetrisches SystemZahlenbereichQuick-SortAbstandRoutingHilfesystemAusreißer <Statistik>MultiplikationsoperatorEinsComputeranimation
07:07
RückkopplungSoftwareMetrisches SystemPhysikalisches SystemZahlenbereichDifferenteVorlesung/KonferenzBesprechung/Interview
07:29
SoftwareStatistikFlächeninhalt
08:10
IndexberechnungSpannweite <Stochastik>Prozess <Informatik>TransitionssystemMultiplikationsoperatorComputeranimationVorlesung/Konferenz
08:30
BildschirmmaskeZählenBitmap-GraphikComputeranimation
08:47
DatenstrukturGraphDeskriptive StatistikGraphentheorieGruppenoperationLokales MinimumPolygonZeitzoneQuick-SortFlächeninhaltPunktKartesische KoordinatenBitmap-GraphikSchnitt <Mathematik>MultiplikationsoperatorDemo <Programm>Computeranimation
10:46
GraphGraphentheorieVersionsverwaltungSchnittmengeKnotenmengeComputeranimation
11:07
GraphGerichteter GraphZeitrichtungPunktRichtungComputeranimation
11:23
GraphNatürliche ZahlBus <Informatik>GruppenoperationGewicht <Ausgleichsrechnung>MultigraphWorkstation <Musikinstrument>KnotenmengeNeuroinformatikComputeranimation
12:41
DatenstrukturInformationTopologieTypentheorieBus <Informatik>MultiplikationsoperatorComputeranimation
13:44
AlgorithmusAnalysisGraphSoftwareTopologieGraphentheorieProgrammbibliothekGlobale OptimierungGeradeGruppenoperationProjektive EbeneVisualisierungQuick-SortStapeldateiJackson-MethodeServerAutomatische HandlungsplanungATMPunktOffene MengeTransitionssystemHierarchische StrukturOpen SourceDijkstra-AlgorithmusDesign by ContractSoftwareentwickler
16:09
GraphHalbleiterspeicherTypentheorieKonfiguration <Informatik>AggregatzustandGruppenoperationSpannweite <Stochastik>Gewicht <Ausgleichsrechnung>Offene MengeUmwandlungsenthalpieKonditionszahlObjekt <Kategorie>Computeranimation
16:36
DatenstrukturGraphInformationAutomatische IndexierungGruppenoperationSpannweite <Stochastik>Gewicht <Ausgleichsrechnung>PunktKnotenmengeMultiplikationsoperatorZweiVorlesung/Konferenz
17:47
InformationStatistikMAPGenerator <Informatik>Globale OptimierungGruppenoperationSchedulingZeitreiseZellularer AutomatQuick-SortFlächeninhaltSpannweite <Stochastik>TransitionssystemBitmap-GraphikGraphfärbungLageparameter
19:15
GraphMathematikStatistikGenerator <Informatik>EchtzeitsystemBenchmarkGruppenoperationQuick-SortProzess <Informatik>SoftwareentwicklerVorlesung/Konferenz
Transkript: Englisch(automatisch erzeugt)
00:01
Hello everybody. Thanks for coming and thanks for the brief intro to GTFS earlier, Daniel. I'm gonna talk a little bit more about that and co-present with my colleague, Rob. We both work at Azavia, a small software company based in Philadelphia.
00:20
And at Azavia, we love advanced spatial analysis and the web. And I included the Chinese translation there because the project that we're working on is an international scale. And I just wanted to underscore the scope of what we're doing with open data and how important it is, even bringing it to China,
00:41
which is a bit of a challenge. So the general transit feed specification, it's relatively young, less than a decade old. And yet we pretty much use it every day. Anytime we wanna find how to get from one place to another using public transit. And from what I understand,
01:01
it spent its early years incubating here in Portland. TriMet did a lot of work with it, along with open plans that now convey. So the specification is very flexible. It allows agencies to add data to the basic requirements.
01:23
And the way that it's structured accounts for a lot of variations in how transportation systems public transit systems are put together. They're all very different. And so the specification needs to account for that flexibility. And it's also super normalized.
01:42
We saw that list of text files earlier that all go into GTFS. And it's just a super normalized database structure. It all comes in as CSVs with text file extensions. And the data, some of those text files have spatial data.
02:01
Some of them have temporal data. And then it gets even more complicated than that. I've spent a lot of time working with spatial data. And so I sort of approach problems that way. And when I saw this, I thought, yeah, I got this. No problem. We've got lines on the map, deal with that all the time. This is Boston's transit network.
02:22
All of the stops, put those on the map points. We got that. And then the temporal aspect of it, I didn't realize how complex it was gonna be. For every stop, any time a vehicle stops at a stop, there's a row in the stop times table.
02:41
And then the sequence of stops are organized into a trip. So a route has multiple trips, which has multiple stops. But the stops, there's only one stop. They have all these different times. It started to get pretty complicated. And I just, I'm talking about this to underscore
03:00
the complexity and why some of the software that we're using is really necessary. To go beyond just the temporal aspect of it, transit systems operate in a cyclical fashion. So Sundays are Sundays for the most part.
03:21
Mondays are Mondays. But it's semi-cyclical because then you factor in holidays and construction schedules. And all of that is built into the GTFS data, hopefully. And what we end up with is an intensely granular focus on time and space.
03:42
And so it's not, it's a particular date at a particular time. And this is important when you're doing trip planning. And so this is a Asheville, North Carolina bus systems for one day sped up really fast. It's kind of frenetic.
04:00
And I kind of, I put this together to try to understand like where, you know, how, what was being represented here and how we could work with it. And it's getting towards the end of the day. The, you see the system slows down and then it's nighttime and no more service.
04:24
And other folks have done much nicer visualizations, much more complicated than I have. This is New York city. And I think it's sped up about 10 times. I recorded this a couple of nights ago. And, you know, so we've got the different routes that are color coded, you know, multiple trains
04:43
on the same track going in different directions. And, you know, it's a really big sort of, you know, data concept to work with. So what are we using it for now? Pretty much what it was designed for,
05:02
which is trip planning. And so, you know, Google trip planner, open trip planner, you know, they take this schedule and spatial data and figure out where you're starting, when you're starting on a particular day and time, and then using the graph, figure out how to connect
05:24
that route to get you to where you want to go. But what else can we do with it? There's, you know, this is a spatio temporal model of a transportation system. So we can ask additional questions. You know, we are working on this international project
05:44
where we're building software to help transportation agencies plan and optimize their systems. So from accessibility and performance perspectives, these are the questions we want to ask. And a lot of them can be answered
06:01
with traditional GIS methods. You know, so we have buffers of stops for the T versus the rail. We have, you know, we can count the population that lives within a certain distance of the stops, consider them served, look at subpopulations like lower income folks. And then we can also sort of look at the frequency
06:26
and weight that by the population at different times of day and different times of the week. And, you know, ask some questions about how the system's actually performing and then compare that to some scenarios of, you know, how to, you know, ideas on how to improve it.
06:44
We can also look at performance metrics like the speed or the number of vehicles per route, the distance between stops, the amount of time it takes to between stops, and then look at which routes are doing well, which ones are sort of outliers or slow.
07:01
And, you know, these are also answers hopefully that could help transportation planners, you know, make a better system. But we have, so the software goes and calculates a number of different metrics and then provides, you know, feedback to the people using it. The, there's one of these metrics
07:22
is a lot more complicated and we couldn't use traditional GIS approaches. You know, a lot of this is being done in, it could even be done in SQL, like directly in PostGIS. That's, some of these I roughed out that way. And to talk about that more complicated indicator,
07:42
I'm gonna pass it over to Rob. Thank you. Yeah, so I wanna talk about how we're calculating travel sheds and statistics based off of travel sheds using some software that we created called GeoShalls Transit. So once we have GTFS data,
08:01
we might wanna combine it with other data such as OpenStreetMap data and say population data to answer questions such as how many people can arrive at an area by 9 a.m. traveling for an hour or less by public transit? And that might be a good indication of how well the public transit system is serving. People being able to get to a job on time at that area.
08:25
And then once I can answer that question for a range of areas, how do I compare those different values with respect to that transit access? So to begin answering that question, we need some data. So if we had a population data set,
08:41
perhaps in a raster form where there's a population count per raster cell, well, we can eventually do a zonal summary of that raster. And what a zonal summary is is just taking the values under a zone and aggregating them in some way. The picture you see here
09:00
is a zonal summary created using GeoTrailis for another application. And I'm not gonna go into what GeoTrailis is right now, but if you're interested, I'm doing a talk on it tomorrow during session two during the invited talk. But in this zonal summary, we actually have a polygon that is drawn by the user on the map, and then there's a summary of the zone.
09:23
But we don't want the user to draw a map, we actually wanna create a travel shed and then do a zonal summary on that. So a travel shed is a description of the areas that can be traveled from and to, or traveled to or from a point in a given amount of time.
09:40
So what we see here is a travel shed in Philadelphia, specifically, it's from leaving from the point of the marker, traveling a maximum of one hour at 6.30 p.m. on a weekday. So the colored area is the travel shed, and there's different colored areas that represent the travel sheds for 60 minutes,
10:03
50 minutes, 40 minutes, and less. This was actually generated by a demo application of the GeoTrailis transit capability at transit.geotrails.com. You can go around, it's interactive, you can play with it, it's kind of fun. So how do we generate these travel sheds? Well, we will get into sort of the graph theory
10:24
behind actually generating the travel shed. So first off, we need to look at the OpenStreetMap data and the GTFS data in a graph structure. And it's actually a time-dependent weighted multi-diagraph, which is a lot of prefixes on graphs. So I'm gonna cut that up
10:41
and sort of explain what that means piecemeal. So first off, I'll explain what a graph is. This is graph theory 101, the quickest version of such a thing. A graph is a composition of a set of vertices or nodes and edges between those nodes. So the graph we see here, there's six nodes that are labeled one through six.
11:02
There's edges between them. There's an edge connecting six and four, four and three. What a digraph is, or a directed graph, is a graph in which the edges are directional. So there is an edge connecting three and 10, but there's no edge going from 10 to three. The arrow points three to 10.
11:21
So you can think of it sort of as a one-way street. A weighted graph is a graph in which there's weights assigned to each of the edges of the graph. So if we look at this graph and think of A as representing an intersection and C is representing an intersection, there's an edge connecting them that has a value five.
11:40
Perhaps that's five minutes. So if you walk from A to C, it would take you five minutes. And then a multigraph is a graph which allows more than one edge to connect nodes. So in this example, if we look at vertex five, you can see it and say that's an intersection, vertex one. There's two edges that connect. There's one with weight six, which could perhaps mean
12:03
if I walked, it would take me six minutes. And then an edge connecting that has three. So maybe that's, if I take the bus, then it would only take me three minutes. But that's really dependent on when the bus actually leaves and when I'm at the node. So that's where the time-dependent nature
12:21
of a transit graph comes in. And that's really an interesting specific aspect of transit graphs that makes the computations on them a little more difficult. So if I had a Main Street station and a Broadway station, and I had GTFS specify that there was a bus that ran from 9 a.m. to 10 a.m. every 20 minutes
12:43
and took 10 minutes, you might ask me how long does it take to get from Main Street to Broadway? And I would not be able to answer your question because I don't have enough information. But if you say starting from 915, how long does it take? I can actually give you an answer. It's gonna take the 10 minutes of the bus ride
13:01
plus the five minutes you have to wait for the 920 bus. So now that we have this structure in which to view our transit system, what we'll wanna create off of it is a shortest path tree. And a shortest path tree has a specific point, a starting node, or an ending node, and answers questions like,
13:21
what is the quickest way to get from A to B? So there's two types of shortest path trees. There's departure time shortest path trees, saying starting at time T, how long does it take to get to the other nodes in the graph, starting at time T and starting at this node? And arrival time shortest path tree would say,
13:43
I wanna get to this node at this time, so at what points would I leave the other nodes to arrive at this single node? And there's a lot of algorithms in graph theory that generate shortest path trees. This is an animation taken from Wikipedia
14:01
about Dijkstra's algorithm. And this is an algorithm that can tell you what's the shortest path from A to B, but inside that algorithm it actually generates a shortest path tree that tells you what's the quickest way to get from the start node to the other nodes. And we can modify Dijkstra's algorithm to generate shortest path trees
14:22
on these time-dependent weighted multi-die graphs and create transit system shortest path trees. And this visualization of a shortest path tree is of a transit system, and the thicker lines mean the shortness of arriving at a point on the transit system.
14:44
So this is a visualization that was actually created by Graph Server, and it's featured on the Graph Server page. And Graph Server is an open source multi-modal trip planning engine. The last commit on this project, on the GitHub's master, is early 2011. And the reason for that is that all the developments
15:02
sort of shifted to another project called Open Trip Planner, which we heard about, which is a really great, largely actively developed transit routing engine software that uses OpenStreetMap data, uses GTFS for journey planning.
15:21
It has a batch analysis mode for doing transit network analysis like we're talking about and has some sophisticated graph theory shortest path algorithms, including A star and an optimization called contraction hierarchies. So this sort of begs the question, why create GeoTrails Transit when there's already this amazing trip planning's open source software package?
15:43
And the reason is performance. When we were tasked with creating travel sheds, we had a couple spikes that use Open Trip Planner through its RESTful endpoints, and then also just as an imported library. And it just wasn't getting the performance that we needed.
16:02
So we created GeoTrails Transit and it gave us the performance we needed. So what makes GeoTrails Transit fast? So it's the way it represents the graph in memory. Open Trip Planner represents its edge weights dynamically, which means as it traverses the graph,
16:20
it calculates the edge weight based off of a state object. It looks up specific user options and has a lot of conditionals based off the edge type that compute the weight, which makes it really, really flexible and able to answer a wide range of transit questions such as like, what bus line should I take to get from point A to point B?
16:41
GeoTrails Transit, on the other hand, loses a lot of that information and does pre-processing to pack the weights of the edges into an index data structure that has a really, really fast edge weight lookup for incoming edges and outgoing edges. So we actually packaged our graph into this tight data structure that has the vertices indexed base zero to N minus one
17:07
that indexes into one array, that indexes into another array, that really just gives information about incoming or outgoing edge and the start time as seconds from midnight that the edge becomes valid,
17:21
the weight, which is the duration in seconds of traversing that edge, and then the destination node. So we lose some flexibility in what we can ask the transit graph, but we gain a lot of performance. And we can still answer a wide range of questions,
17:41
including generating travel sheds. So to get back to the original question, how many people can arrive at an area by 9 a.m., the solution to that is to generate the arrival time travel shed for that area and then do a zonal summary of that travel shed on top of that population layer.
18:01
And to answer how we would compare those relative areas, we would generate that travel shed zonal summary statistic for each cell of a raster and then color the raster based on the values, based on some color ramp, and then paint that on a map. And now we have sort of a heat map
18:21
of what areas are better served by this statistic than others. And because rasters have a lot of cells, it's a lot of computation, you can kind of see why we need the travel shed generation to be as fast as possible. So in losing some information about the transit graph,
18:43
being able to generate these statistics like these travel shed statistics, very quickly we're hoping to give people doing transit network planning or schedule optimization a wide range of travel shed statistics that they could quickly iterate over if they're modifying the GTFS
19:01
and trying out different schedules so that they can make our transit systems better serve the public. That's it, thanks. Any questions?
19:26
Punchline. It's in development. What if you have data that's updating continuously?
19:42
Continuously, can you take that in? Like streaming GTFS updates? Well, so the preliminary benchmarks have kind of placed the generation of a statistic like that with the population statistic. Around five minutes,
20:01
we're optimizing currently to get that lower. But so the streaming would probably have to be like every five minutes. And that's after the graph generation. So there'd have to be a process of streaming the transit graph changes and then generating the new statistics from it. So it's possible, but I don't think that there's,
20:22
we're not on a path to generate that sort of stuff in real time to actually just see it change on a map, which would be awesome. But maybe on a big enough cluster, a big enough cluster.
20:43
All right, thanks.