Public Transport Data in KDE Itinerary

Formal Metadata

Title
Public Transport Data in KDE Itinerary
Subtitle
Querying realtime journey data and dissecting ticket barcodes
Title of Series
Number of Parts
542
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
KDE's digital travel assistance app Itinerary consumes public transport data in various ways, from journey queries over realtime disruption information and coach layouts to tickets. In this talk we'll look at what has been implemented for this and what is still missing. KDE Itinerary supports you on the road by presenting all relevant travel dates and documents in a timeline, with all its content being automatically extracted from flight, train and bus tickets, hotel reservations or event tickets. Unlike with proprietary alternatives, all of this happens on the user's device and under the user's control. Itinerary can then augment this with realtime information about disruptions and suggest public transport options to get from the station to the hotel for example. For connections in complex train stations Itinerary also provides OSM-based indoor maps including the realtime operational status of elevators where available. In order to support the travel data extraction from tickets, decoding several standard and proprietary ticket barcode formats has been implemented. Besides UIC 918.3 and ERA SSB/TLB this as of recently also includes the rather complex new European international ticket standard "Flexible Content Barcode" (FCB). For a few proprietary barcodes we are still struggling with reverse engineering and/or finding the corresponding documentation though. For querying public transport journeys and disruption information several open and proprietary backends are supported, such as Navitia, OpenTripPlanner, OpenJourneyPlanner/TRIAS, Hafas and EFA. The focus here is on unified access to common information rather than to support every possible detail and journey customization option. One still missing but particularly difficult to model piece of information however are prices and tariffs. With over 80 currently supported online services, automating discovering and managing information about those also becomes relevant. 
We will look at the Transport API Repository project as well as Itinerary's approach of adding line and product metadata from OSM and Wikidata for this. Realtime data of the operational status of elevators and escalators or the train coach layout at a given platform are also supported, but unfortunately not as widely available. This tends to be particularly relevant information for users with mobility restrictions though.
Transcript: English (auto-generated)
Okay, please have a seat, we have to begin. Do as you can.
All right, hi everybody again. We have Volker Krause here, who will explain to us what he's doing at KDE. He has come from Germany, and we're very pleased to welcome him. Go on. Thank you.
Okay, so hello. Yeah, I'll talk a bit about how we use public transport information in KDE Itinerary. So what is this? KDE is a big open source community, so not a transport operator for once here.
We do all kinds of stuff. You can find us in the K building on the second level to look at a few things we do. And one of the things we do is a transport assistance app called Itinerary.
So in that you can import any kind of travel-related things like flights, train trips, bus trips, hotel reservations, event tickets, etc. And that is then grouped together and put into a timeline so you have all the relevant information at hand when you need it.
And we augment that with whatever might be helpful along the way, like the weather forecast as the obvious example. Since we don't really have a lot of time, I'll have to dive right into what we do with public transport data. You'll see some of the features along the way then.
So the first problem is we need to actually understand where you want to go. Ideally without you having to enter that manually, but by reusing documents or material you already have. In the best case scenario, that material has machine-readable annotations about your trip.
That's something that Gmail has been promoting, but outside of airlines, I think in Europe at least, we have only seen that for Flixbus and Trainline. So none of the major railway operators, for example, have that.
But there is a second best thing, and that is the ticket barcodes. Most, not all of them, but luckily most contain some information about the trip and especially in international use, they are somewhat standardized. So we actually have a chance to understand what's in them.
Probably the most well-known one is the one from airline boarding passes. That is a single standard that works globally. So that is the absolute best case scenario. Only one thing we have to implement. For railways, we don't have that luxury.
But the European Railway Agency has at least defined a few standards that are in use in Europe for international travel and in some countries also domestically. The complexity of those standards varies greatly. The airline boarding passes, for example,
that is a simple ASCII string that is almost human-readable. That's as easy as it gets. The latest iteration from the European Railway Agency for the international tickets here, the Flexible Content Barcode, that is 2000 lines of ASN.1 specification
defining 300 or so mostly optional fields with some unaligned packed encoding representation. So awesome to debug, but extremely powerful. That's the ultimate other end of complexity then.
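To illustrate the "simple ASCII string" end of that complexity range, here is a minimal sketch of splitting the mandatory part of a single-leg airline boarding pass barcode into fields. The field names and widths follow the fixed-width layout as commonly documented for the IATA boarding pass barcode and should be checked against the current specification; the sample string is fabricated, and this is not how Itinerary itself is implemented.

```python
# Assumed fixed-width layout of the mandatory part of a single-leg boarding
# pass barcode: (field name, width in characters). Verify against the spec.
MANDATORY_FIELDS = [
    ("format_code", 1),
    ("number_of_legs", 1),
    ("passenger_name", 20),
    ("electronic_ticket_indicator", 1),
    ("booking_reference", 7),
    ("from_airport", 3),
    ("to_airport", 3),
    ("operating_carrier", 3),
    ("flight_number", 5),
    ("day_of_year", 3),
    ("compartment_code", 1),
    ("seat_number", 4),
    ("checkin_sequence", 5),
    ("passenger_status", 1),
    ("conditional_size_hex", 2),
]
MANDATORY_LENGTH = sum(width for _, width in MANDATORY_FIELDS)


def parse_boarding_pass(code: str) -> dict:
    """Split the fixed-width mandatory block into named, stripped fields."""
    if len(code) < MANDATORY_LENGTH:
        raise ValueError("barcode content is too short")
    if code[0] != "M":
        raise ValueError("unexpected format code")
    fields = {}
    pos = 0
    for name, width in MANDATORY_FIELDS:
        fields[name] = code[pos:pos + width].strip()
        pos += width
    return fields


# Fabricated sample, assembled field by field so the widths are explicit.
SAMPLE = (
    "M1" + "DOE/JANE".ljust(20) + "E" + "ABC123".ljust(7)
    + "BER" + "BRU" + "XX".ljust(3) + "0123".ljust(5)
    + "035" + "Y" + "012A" + "0001".ljust(5) + "1" + "00"
)
trip = parse_boarding_pass(SAMPLE)
print(trip["from_airport"], "->", trip["to_airport"], "on day", trip["day_of_year"])
```

The date is only a day of the year, so a real extractor still needs context (such as the booking email's date) to resolve the year.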
Just because it is standardized doesn't automatically mean this is also all openly available. Again, the European railway agency is the good example here. They have that on the website. If something is missing, you ask them, they put it on the website. Perfect. Some of the other organizations ask you for unreasonable amounts of money to get a PDF
or require you to be a member. And for that, you need to be an airline or railway agency, which we are not. Some of those systems have cryptographic signatures, which we usually don't care about
because we only care where you travel, not if the ticket is actually valid. But in one case, the VDV eTicket used in some areas in Germany and Luxembourg, the signature and the content are somewhat intermixed, so we actually need to decode that.
And just because something is called a public key doesn't mean it's actually public on the website. In this case, we got lucky. Extensive internet search found a 100-page PDF in a location where it probably shouldn't have been, containing a screenshot in which we found a URL pointing to an LDAP server on which we found the keys.
So, it can be quite messy to work with this stuff. Most of the standards have operator-specific extensions. Those, of course, are not documented. For the final point, is there anyone from Trenitalia here?
Too bad, I have questions for them. Then, of course, there's also a set of proprietary codes where our only option is reverse engineering. For that, we rely on donations of sample tickets because, I mean,
everything we do is very much focused on privacy. So, it all runs on your own device, and we never get your actual tickets. So, we need them donated, right, to work with them. There are the ones listed here.
For those, we have more or less understanding. Some, we get enough out of it to work already. For some, we can barely prove that there is actually travel-relevant data in there, but we have no way of decoding that. For me, the most frustrating one is SBB, because that is a fairly comprehensive format.
We understand most of it, apart from the date/time fields. And, without that, it is pretty much useless, right? So, if there's anyone here from SBB who has hints or information on how those tickets work, I would be very interested. Then, once we actually know where you're going,
and we have that in the timeline, we augment that with real-time public transport information. The most obvious example is delays and disruptions, cancellations, platform changes, that kind of stuff, right?
So, we notify you about that. Another thing we do is filling gaps in the itinerary, right? So, to get here, I booked a train from Berlin to Brussels, but I actually need to go from my home to the station, then take the train, and then, in Brussels,
somehow get from the station to my hotel using the respective local public transport. So, that is something we can fill in automatically. And then, the third thing is when you miss a connection, right, we offer to find alternatives for getting to the same destination.
In order to implement that kind of stuff, we kind of need to get to that data, and there is, unfortunately, not a single global service that gives that to us, right? So, we need to query many, many different sources,
depending on where we currently are and which backend can actually provide us this information. So, we have a bit of an abstraction layer over all those sources, which basically offers three basic operations: searching for locations
by name or coordinate, searching for arrivals and departures at a specific stop, and searching for journeys from A to B. And on top of that, we then build the higher-level features. In terms of supported backends, that is basically three different categories.
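The three basic operations can be sketched as a small backend interface. All class, method and field names below are hypothetical and chosen for illustration only; the in-memory backend uses hard-coded toy data, and the actual library's API may look quite different.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Location:
    name: str
    latitude: float
    longitude: float


@dataclass(frozen=True)
class Departure:
    line: str
    scheduled: datetime
    delay_minutes: int = 0


@dataclass(frozen=True)
class Journey:
    origin: Location
    destination: Location
    departure: datetime


class Backend(ABC):
    """The three basic operations every backend has to provide."""

    @abstractmethod
    def find_locations(self, name: str) -> list: ...

    @abstractmethod
    def departures(self, stop: Location, when: datetime) -> list: ...

    @abstractmethod
    def journeys(self, origin: Location, destination: Location, when: datetime) -> list: ...


class InMemoryBackend(Backend):
    """Toy backend with hard-coded data, standing in for a real service."""

    def __init__(self, stops, departures_by_stop):
        self._stops = stops
        self._departures = departures_by_stop

    def find_locations(self, name):
        return [s for s in self._stops if name.lower() in s.name.lower()]

    def departures(self, stop, when):
        return [d for d in self._departures.get(stop.name, []) if d.scheduled >= when]

    def journeys(self, origin, destination, when):
        # A real backend would run a routing query; this returns one direct leg.
        return [Journey(origin, destination, when)]


def plan_transfer(backend: Backend, from_name: str, to_name: str, when: datetime) -> list:
    """Higher-level feature built only on the basic operations."""
    origins = backend.find_locations(from_name)
    destinations = backend.find_locations(to_name)
    if not origins or not destinations:
        return []
    return backend.journeys(origins[0], destinations[0], when)


stops = [Location("Station", 50.8357, 4.3353), Location("Hotel", 50.8120, 4.3817)]
backend = InMemoryBackend(
    stops, {"Station": [Departure("71", datetime(2023, 2, 4, 9, 0), 2)]}
)
transfer = plan_transfer(backend, "station", "hotel", datetime(2023, 2, 4, 8, 30))
```

Because `plan_transfer` only talks to the abstract interface, the same gap-filling logic works regardless of which protocol a concrete backend speaks.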
The fully open-source ones, those are the easiest ones to work with, like Navitia and OpenTripPlanner. MOTIS is still missing on that list, simply because there is currently no production deployment we have access to.
As soon as there is one, we'll add that as well. Second category is things where the protocol is at least documented, like the OpenJourneyPlanner used in Switzerland. And the third one, the most annoying ones to work with, is the proprietary legacy backends.
But just having the protocols, of course, is not enough. We also need to know where exactly the respective services are. For that, there is the Transport API Repository.
That's a collaboration with others having that same problem, like Jannis. And that is basically a collection of machine-readable information about those services: where exactly do I need to connect, which protocol do they use, which specific parameters do I need to use,
but also information like the coverage area. Because for most of those services, that is kind of implied. If I have the Belgian transport app, the scope of that is implicit.
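Choosing a backend from such machine-readable service descriptions could look roughly like the sketch below. The entry fields, service names and endpoints are invented for illustration, and coverage is reduced to a rectangle here as a simplifying assumption; real coverage data is likely more precise than that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon

    def area(self) -> float:
        # Square degrees: only used to rank boxes against each other.
        return (self.max_lat - self.min_lat) * (self.max_lon - self.min_lon)


@dataclass(frozen=True)
class ServiceEntry:
    name: str
    protocol: str
    endpoint: str
    coverage: BoundingBox


def pick_backends(services, lat: float, lon: float) -> list:
    """Return services covering the point, most local (smallest area) first."""
    matches = [s for s in services if s.coverage.contains(lat, lon)]
    return sorted(matches, key=lambda s: s.coverage.area())


# Hypothetical entries; names and endpoints are placeholders.
SERVICES = [
    ServiceEntry("example-europe", "otp", "https://europe.example.org/api",
                 BoundingBox(35.0, -10.0, 71.0, 40.0)),
    ServiceEntry("example-belgium", "navitia", "https://belgium.example.org/api",
                 BoundingBox(49.5, 2.5, 51.5, 6.4)),
]

in_brussels = [s.name for s in pick_backends(SERVICES, 50.85, 4.35)]
in_berlin = [s.name for s in pick_backends(SERVICES, 52.52, 13.40)]
```

Preferring the smallest matching coverage area is one possible heuristic: a regional service usually knows more local detail than a continent-wide one.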
Navitia is the exception that actually has an API for querying this. But if I want to pick the right backend, I, of course, need that information. Very similar problem. All of what you see here is what a journey query would describe as metro line one.
But the signage is very, very different depending on where you are. And the signage is something that is very prominent locally. So I should show the right thing in the app in order to help the user find the right thing.
But this isn't really unique, right? So finding the right logo is somewhat tricky. What we do there is we get the logo and the colors and all of that information from Wikidata. The Wikidata entry is linked to an OpenStreetMap route relation.
From that, we get the geographic bounding box. And the combination of geographic bounding area, name and mode of transport is mostly unique. And that is then good enough to find the right logos. Okay, then a few more things we integrate.
One is available rental vehicles. So rental bikes, electric kick scooters, that kind of stuff. What you maybe can see in the screenshot here is a few available kick scooters, some shown in green, some shown in yellow.
The yellow ones are those with a remaining range of less than five kilometers. All of this is coming from GBFS. That is a nicely developing open standard for that kind of information. And it is very actively evolving.
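The green/yellow distinction can be sketched from a GBFS vehicle feed. The sketch assumes the GBFS 2.x `free_bike_status` layout (`data.bikes[]` with `bike_id`, `is_disabled`, `is_reserved` and the optional `current_range_meters`); the feed content is fabricated, and the "unknown" bucket for vehicles without range data is an assumption of this example rather than something stated in the talk.

```python
import json

LOW_RANGE_METERS = 5_000  # threshold mentioned in the talk: less than five kilometers


def classify_vehicles(feed: dict) -> dict:
    """Map each available vehicle id to 'green', 'yellow' or 'unknown'."""
    result = {}
    for bike in feed["data"]["bikes"]:
        if bike.get("is_disabled") or bike.get("is_reserved"):
            continue  # not available for rental right now
        remaining = bike.get("current_range_meters")
        if remaining is None:
            result[bike["bike_id"]] = "unknown"
        elif remaining < LOW_RANGE_METERS:
            result[bike["bike_id"]] = "yellow"
        else:
            result[bike["bike_id"]] = "green"
    return result


# Fabricated feed excerpt for illustration.
SAMPLE_FEED = json.loads("""
{
  "last_updated": 1675500000,
  "ttl": 60,
  "data": {
    "bikes": [
      {"bike_id": "a", "lat": 50.84, "lon": 4.35, "is_reserved": false,
       "is_disabled": false, "current_range_meters": 12000},
      {"bike_id": "b", "lat": 50.85, "lon": 4.36, "is_reserved": false,
       "is_disabled": false, "current_range_meters": 3200},
      {"bike_id": "c", "lat": 50.85, "lon": 4.37, "is_reserved": false,
       "is_disabled": true, "current_range_meters": 9000},
      {"bike_id": "d", "lat": 50.86, "lon": 4.37, "is_reserved": false,
       "is_disabled": false}
    ]
  }
}
""")

colours = classify_vehicles(SAMPLE_FEED)
```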
Just one or two years ago, we wouldn't have that level of detail available. So that's a very nice example of open standards and open source in that field. Coverage for that is somewhat biased towards Europe and North America though.
I know that those systems exist in Asia as well, but I have no idea if they use GBFS as well or if there are any other systems. So again, something where I would be interested in information. Another thing we integrate on the train station maps
is the real-time status of elevators and escalators. So I think in this case, they're all shown in green. So they are actually functional. This is, of course, something very relevant if you're traveling, say, with heavy luggage,
a stroller, or in a wheelchair. The data source for that is Accessibility Cloud. That is the backend behind wheelmap.org. That's also free software, and they aggregate this kind of information from many different sources.
There is a bit of a coverage bias towards Germany. So similar data from other countries would be more than welcome. Another thing where we have a coverage problem
is train coach layouts. I think there are currently two or three countries where we are getting this. Still, those have wildly different data models. So it's not quite clear yet how we best abstract that. And that is also somewhat relevant,
especially on the long distance trains, which can get up to 400 meters. So you want to know where exactly you need to go on a platform, especially if you're in a hurry. One challenge there is that, especially in the countries
where we have that, OpenStreetMap doesn't contain much of the platform section information. And that is the key to match those two data sets together to have the proper train layout displayed correctly on the actual station map.
If you think further towards indoor navigation in a train station, that is kind of relevant. Pushing this topic even further would be to also show the insides of the train. At least Deutsche Bahn has very detailed PDFs for human consumption of the interior.
But there is currently, to my knowledge, no machine readable format, say like OSM for trains. And that is, again, relevant for accessibility, for example. So I need to know which parts I can go to and which parts I can't go to.
And then the last part that is very, very recent, a lot of work on that happened just yesterday, is using the onboard APIs on trains. So if you connect to the onboard Wi-Fi, there is often some kind of portal page
showing you information about the current trip. That's powered by some API that we can use as well. And typically, this gives you current position, speed, and heading, and information about the journey with delays on each stop.
Just showing that is, of course, the easiest way to integrate that. But the real value comes when we use that for higher level features again. For example, checking if you're on the right train. It might seem obvious, but if you're traveling in a country where you don't speak the local language,
or in case of a multi-set train that splits up along the way, Zugteilung in Hamm, as we say in German, it's quite helpful if the software double checks that. Same for detecting if we have arrived yet.
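Both checks can be sketched on top of the position, speed and stop list that an onboard API typically reports. The function names, the 500 m radius and the 5 km/h speed limit are assumptions made for this example, not values from the talk, and the stop names are only illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def has_arrived(train_lat, train_lon, speed_kmh, stop_lat, stop_lon,
                radius_m=500.0, max_speed_kmh=5.0) -> bool:
    """Arrived if the train is close to the stop and (almost) standing still."""
    close = distance_m(train_lat, train_lon, stop_lat, stop_lon) <= radius_m
    return close and speed_kmh <= max_speed_kmh


def is_expected_train(reported_stops, booked_stops) -> bool:
    """True if all booked stops appear, in order, among the train's reported stops."""
    remaining = iter(reported_stops)
    # 'stop in remaining' advances the iterator, which enforces the ordering.
    return all(stop in remaining for stop in booked_stops)


full_route = ["Berlin", "Hannover", "Hamm", "Aachen", "Brussels"]
other_portion = ["Berlin", "Hannover", "Hamm", "Duesseldorf"]
right_train = is_expected_train(full_route, ["Berlin", "Brussels"])
wrong_portion = is_expected_train(other_portion, ["Berlin", "Brussels"])
```

Checking the reported stop list against the booked stops catches the split-train case: the wrong portion of the train simply never reports the booked destination.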
That is something very, very easy to realize for the human, but it's actually surprisingly tricky for the software to know. Yeah, so all of these things I've shown you are not tied to the app specifically, but are available as reusable libraries.
And for example, Nextcloud is using the ticket data extraction in their email client, so you can automatically add calendar entries for your tickets when you get them by email. And I think there is much more
that can be built on top of all this. The itinerary app is basically for the irregular, explicitly booked kind of travel, but doesn't touch the commute use case at all. If you happen to know about any kind of relevant APIs or datasets, or have the documentation for those,
or for ticket formats, we would be very, very much interested. Same if you have travel documents past, present, or future that you are willing to donate to develop the extractor on that, we are happy to take those as well.
Yeah, thank you. You're talking about getting live train data from the train, from the vehicle you're traveling.
Can you do it from the location of the phone, like you just look at the GPS, and then wait a second or two, look again, and see where you're going and match it up with the... Right, the position information we get on the train is essentially GPS, just with a GPS receiver on the train.
In theory, you could do that from the phone. The problem is that reception inside a metal train is somewhat limited, so you usually get better results by using the API for that. But it is essentially GPS data you get there, so it's the same.
Yeah? Yeah, so how do you handle sometimes, like, if there are conflicting labels or certain modalities, you mentioned before that you've got the icons, you've got the different transport, and I think in Germany, like Stuttgart, for example, like, some services return the modalities like tram, but all the drivers are somewhere,
you're on your list. Yeah, that is an annoyingly complicated topic. The modality is awfully undefined. I mean, there's neither a technical nor, like, a product level definition on what is a subway
or a metro or a tram, and it can be all kinds of hybrid things. Um, that is one of the metadata we carry from Wikidata alongside the logos and so on, so if in doubt, we use that. But even that is, there is some loss in there, yeah.
I mean, there's some cities where you have trams that go on long distance railway outside of the city. And yeah, I mean, we will never be able to capture these extreme special cases that a region or operator-specific app can capture.
So, I mean, that is the price we pay for that abstraction, right, and the one app that works everywhere approach. One question, you showed the data about the scooters and how it has, you said it's getting better
and the coverage is better. What is driving this improvement? Is that regulation or why it's getting better? That is a good question. I don't know for sure. I know that in some cities it is regulation. So if you want a permit to operate your rental system
in that city, right, you are required to publish your feeds as GBFS and we then happily consume that. And I think another part is that somehow started very early
by some US cities requiring that to give out the permits for those systems and then that kind of became the standard mode of operation for those services, right? So if you get in very early, that works. I don't think there is like national or EU regulation.
So this is usually something that differs from city to city. Regarding on-demand traffic, some of the routing engines
have that in their results. So we can show that, but we currently have nothing regarding actively booking things on demand or otherwise. Because that is something where there is practically
no API available for external users. I don't think the railway operators, or especially even worse with the private operators, they give that to smaller users like us if they give it to anyone at all.
And you mentioned commuting. What about the case when I don't yet have a ticket, but I want to make the journey? So for instance, in Germany, I had a BahnCard 100. Is there a possibility already to enter that somehow? We also have a general route search. So you just specify where you want to go
and it offers you, depending on where you start, the options from Deutsche Bahn or SNCF or wherever you are. And then you can add that to the timeline as well. So there is the ability to do manual entry for that scenario. But that would be quite cumbersome to do this
every day for your commute. So there you would want something that I know I usually go to the office between 8 and 9 in the morning. So inform me if there's any deviation on my usual route, but not necessarily make me enter this.
How's the checking done? Is it a push from the server? Or do you do this locally in the app? So because I think if the app is continuously doing the checks in the background, then is it energy draining, like battery draining? You mean the checks for delays? Yeah, that is polling. None of the services we use
has a push service that we could use. OK.