
Use of FOSS4G Technologies in the Management of Railway Infrastructure Data


Formal Metadata

Title: Use of FOSS4G Technologies in the Management of Railway Infrastructure Data
Number of Parts: 156
License: CC Attribution 3.0 Unported. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Railways have been regarded as the best public transport option ever since their invention. Even a single freight trip, together with the surrounding railway environment, produces a huge amount of data: routing data, train schedules, on-board sensor data, wayside field unit data, and so on. Such data are normally temporally and spatially referenced, and they help to route trains correctly, to maintain and monitor the condition of the infrastructure, to expand the existing infrastructure, and for many other purposes. The use of free and open-source geospatial software greatly helps us with the management and processing of these datasets. With digitalization and the rise of the Internet of Things (IoT), which is built on a sensor ecosystem, we are looking at data that is generated at a very high rate and is crucial for analysis in both the short and the long term. The background digital infrastructure that handles such data should be state-of-the-art, fault-tolerant, scalable and easy to operate. This talk explains how we use FOSS4G technologies to build our digital infrastructure platform.

We at the Institute of Transportation Systems (TS) of the German Aerospace Center (DLR) started with this idea in mind and developed an infrastructure platform called the Transportation Infrastructure Data Platform (TRIDAP). It is provisionally operational and is being developed further. DLR-TS conducts research into technologies for the intermodal, connected and automated transport of the future on road and rail. Research into new systems in the rail and road transport domain requires digital twins. The digital twin structure helps to draw a holistic picture of the road and rail infrastructure in connection with the vehicles, people and goods moving within it. This is realized using distributed system architectures and artificial-intelligence methods. The TRIDAP platform is developed using various FOSS4G technologies and is capable of making the data available to researchers within DLR as well as to project partners and other stakeholders over a long period of time for analysis and visualization. The platform development is part of the DLR-funded cross-domain project "Digitaler Atlas 2.0".

The datasets handled in TRIDAP vary greatly in size, nature and format (numerical sensor measurements, images from visual sensors, streams of data from a single geo-location, and many other variations). TRIDAP stores structured datasets in a PostGIS database and unstructured data in file folders. An extensive data model has been developed to accommodate the different datasets in the databases, along with the possibility to track changes. Provision is also made to store unstructured data in a hierarchy of storage space on a NetApp basis. TRIDAP supports the analysis and sharing of georeferenced as well as non-georeferenced datasets. For condition-monitoring applications, information on changes in the railway infrastructure and on management activities carried out in the past (such as repair and improvement of existing infrastructure) is also stored in the platform. In order to make these datasets Findable, Accessible, Interoperable and Reusable (FAIR), the system stores sufficient metadata and supports the publication of datasets through the open-source software GeoServer and GeoNetwork.

Most of the data are georeferenced and are stored in a common space and time reference frame: the World Geodetic System (WGS84) and Coordinated Universal Time (UTC). The platform contains instances of various open-source big-data software, such as Apache Kafka, Apache Flink and Apache Spark, to process and analyze the data through stream- and batch-processing applications. To fuse measurement and weather datasets, we are currently developing a Python-based tool that downloads data from the Deutscher Wetterdienst (DWD) for a user-defined region and time period directly into the data-processing application. Weather data from other internal and external sources are planned to be integrated in the future. In order to provide a high-quality service to the researchers at DLR-TS as well as to our project partners, it is essential to ensure high availability and optimal performance of the platform. To achieve this, we are integrating all components of TRIDAP into a monitoring framework that uses Prometheus for monitoring and Grafana for visualization.

TRIDAP also has a Python-based tool in development to validate the data being stored in the system. For this purpose, we define a set of validation rules together with the team of researchers, data owners and data generators. The validation tool deals with dynamic live data received from railway locomotives and wagons in the field as well as with infrastructure data stored in databases. When validation errors are identified, the team of data owners and generators is immediately informed so that further action can be taken. The geo-datasets stored in TRIDAP are shared with stakeholders in standardized data formats through GeoServer. GeoNetwork is used to set up a geodata catalog that enables easy search of, and access to, the datasets stored in the platform. GeoNetwork uses metadata standards such as Dublin Core and ISO/TS 19139 to document metadata. It is also planned to connect GeoNetwork with the research data repository (FDR) of DLR to obtain a persistent identifier (PID) for datasets on demand. Certain datasets stored in the platform are confidential and have restricted access; this is currently being implemented through the definition of multiple users, roles and data-security rules in GeoServer as well as in the data storage layers.
Transcript: English (auto-generated)
Good morning to everyone. I'm Akhil from DLR, the German Aerospace Center. We are located across many different places in Germany, but I work for the Institute of Transportation Systems, also called TS, and we are located in Braunschweig, in central Germany.
And I have a couple of my colleagues with me here, so feel free to reach out to them later on, if not to me. So what am I here for today? Today's topic is how we at the Institute of Transportation Systems use different FOSS4G tools and software to create a tool,
a platform that can handle the data management for infrastructure data. This team includes a lot of people: the core team plus a user base that is quite big, but as of now it is internal to DLR.
So, moving on. The first thing I need to share here is what were the problems that we faced prior to introducing and developing this solution. So, we all know about FAIR. Surprisingly, I didn't hear about FAIR much in this conference,
but that's a big deal. And we had this kind of problem that we were not able to find data related to the previous projects in our institute. And that led to, okay, we should push for this principle across the institute. And we were like the kind of leaders for it, my group.
So FAIR stands for findable, accessible, interoperable, and reusable data. Whenever a project creates its data, it should be stored in such a way that people, even outside the project, can easily find it through different means.
Then it should also be accessible. Interoperable means that the data generated by one project can be used, or at least there should be a possibility to use it, for a different use case. And reusable goes a step beyond interoperable: we can reuse and reproduce the data
so that the scientists who work on the datasets can reproduce their results as well, which is essential in the research community: when they publish in a journal, the results have to be reproducible. We also wanted some tool that can be a one-stop solution
for all the data storage needs of the institute. And finally, we wanted to produce something where a new joiner or an old hand of the institute can find whatever data and metadata they want
quite easily, without asking anybody concerned. So we wanted to remove this human factor from the retrieval of data. So what kind of data do we deal with? In our institute, which is basically a transportation institute, we work with road and railways. My team focuses mostly on the railway infrastructure
and the maintenance of that infrastructure, along with the different kinds of data that can be retrieved from the railway network. That includes things like the topology of the network and the infrastructure elements. These are the physical elements, as you can see in the bottom corner.
So these are all the infrastructure elements that one can find in the railway network, which can be switches, bridges, the rail segments, the mile posts, the signals. So there are many elements that one can understand only when you deal with such data.
On the image, you can see the map of the Hafen Braunschweig, which is the port of Braunschweig. It's a tiny port, but we have units that run across this small rail network
and these units are mounted with different kinds of sensors, as seen here. So what types of data do we deal with? Mostly, if it is not the network data, then it is generally sensor data. Every unit that we work on has loads of sensors on it that measure different things.
As you can see in this point, we have GNSS sensors, the IMU accelerometer, and in some cases we also have some static or unmounted weather sensors. We also have visual sensors like cameras and lidars and laser scanners.
So in this particular image, you see this special vehicle, which has tires to run on the road. And it also has steel wheels, as you can see here, with which it can actually travel on the rail. This particular feature enables it to act as a railway wagon, and it has a ton of sensors.
So my colleagues can drive it and also observe it internally through the dashboards and the monitors there, actually when the vehicle is moving on the lines. And this particular data that is captured across different campaigns can be stored in the database
or the data store that we have provided in our solution. On the right side you see the same image as earlier, but a different background. And this is the HDF5 format that we are pushing to use.
So we have created a template for this HDF5 format through which data that is acquired from different projects in different formats can be converted into one particular in-house format that will help us in streamlining and standardizing the datasets.
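As a rough illustration of what such a conversion involves, here is a minimal sketch that packs a campaign CSV into a common HDF5 layout; the group, field and file names are made up for illustration and are not the actual DLR template.

```python
# Hypothetical sketch: pack one campaign's CSV measurements into a shared HDF5 layout.
# Group names, column names and file names are placeholders, not the real DLR template.
import h5py
import numpy as np
import pandas as pd

def csv_to_hdf5(csv_path: str, h5_path: str, campaign: str, sensor: str) -> None:
    df = pd.read_csv(csv_path, parse_dates=["timestamp"])
    with h5py.File(h5_path, "a") as f:
        grp = f.require_group(f"{campaign}/{sensor}")
        # One common time base for every campaign: UTC epoch seconds.
        grp.create_dataset("time_utc", data=(df["timestamp"].astype("int64") // 10**9).to_numpy())
        grp.create_dataset("lat", data=df["lat"].to_numpy(np.float64))
        grp.create_dataset("lon", data=df["lon"].to_numpy(np.float64))
        grp.create_dataset("value", data=df["value"].to_numpy(np.float64))
        # Minimal metadata stored right next to the data.
        grp.attrs["crs"] = "EPSG:4326"        # WGS84, as used platform-wide
        grp.attrs["source_file"] = csv_path

csv_to_hdf5("campaign_gnss.csv", "measurements.h5", "hafen_braunschweig_2023", "gnss")
```

The point is simply that every campaign ends up with the same group structure, time base and coordinate reference, whatever format it originally arrived in.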
So what we are trying to achieve is to overcome the requirements and problems we were facing. And as I said earlier, we created this platform, which is called TRIDAP, the Transportation Infrastructure Data Platform. What it does is manage the datasets in a FAIR-compliant way
and then distribute them across the department and the institute so that the researchers can use them. The datasets of the railway network and the condition of that network can be stored in a way that is accessible and findable for the scientists.
And then there is sharing of the data in a standardized way and a standardized format so that it can be reusable and interoperable. The other goal that we wanted to achieve, or have kind of achieved, is the use of FOSS4G. Now, you see all these tools, and the audience here knows everything about them.
So I will not talk about them in detail, but we use them. We have collected them in this platform, so it is quite dependent on FOSS4G tools. And of course, we are a research institute that has been running for many years, so we already had different kinds of infrastructure in place that was used prior to TRIDAP.
So we didn't want to reinvent the wheel altogether. That's why we kind of use some non-open-source tools like NetApp for storage purposes. And we also use Grafana, which is, like, FOSS.
So jumping into the architecture of the tool. So you see here the various layers through which we have achieved or tried to achieve the goals. I will talk about each layer in detail and try to explain how they serve the purposes.
So starting with the data storage layer, we have data coming in from live streams via Kafka, and it also comes in bulk file formats, which can be structured or unstructured. Unstructured data files we generally dump into the NetApp-based storage, which we call LDS; I'll talk about that in detail later.
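To make the live-stream side a bit more concrete, here is a minimal, hypothetical sketch of a consumer that reads JSON sensor messages from Kafka and writes them into a PostGIS table; the topic name, message schema, connection settings and table layout are assumptions, not the actual TRIDAP setup.

```python
# Hypothetical sketch: consume JSON sensor messages from Kafka and insert into PostGIS.
# Topic name, message fields, connection settings and table layout are assumptions.
import json
from kafka import KafkaConsumer   # kafka-python
import psycopg2

consumer = KafkaConsumer(
    "sensor-measurements",                        # assumed topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
conn = psycopg2.connect("dbname=tridap user=tridap")   # assumed connection string

with conn, conn.cursor() as cur:
    for msg in consumer:                           # blocks and iterates forever
        m = msg.value
        cur.execute(
            """
            INSERT INTO measurements (recorded_at, sensor_id, value, geom)
            VALUES (%s, %s, %s, ST_SetSRID(ST_MakePoint(%s, %s), 4326))
            """,
            (m["timestamp"], m["sensor_id"], m["value"], m["lon"], m["lat"]),
        )
        conn.commit()   # per-message commit; real code would batch
```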
And yeah, going to the next layer, the service layer: here are the general tools we deal with in our geo-informatics domain, which are GeoServer, GeoNetwork, and GeoHealthCheck.
We also want to have MapStore from GeoSolutions in the service layer as well as the application layer, so that we do not have to create our own web applications for visualization of the geodata. On the right side you see this particular repository,
which is something being developed by a different institute within DLR, the institute responsible for research data management. My colleagues are collaborating with them to create an interface between the GeoNetwork in our institute and the bigger repository and catalog of data across DLR.
Next is the application layer. Here we have applications from different backgrounds: Apache Spark and Flink jobs, and also some custom-made Python applications, ranging from a few lines of code to hundreds of lines of code.
Together they form the big data cluster that we have assembled, and there is also a special downloader that I'm personally working on. The idea of that application is that it's Python-based, and it will be a wrapper for the data that can be fetched mostly from the DWD,
which is the German weather service, the Deutscher Wetterdienst. They have open FTP servers where we can fetch data, but we want to fetch it in a more understandable and readable way. So let's see how it goes.
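As a rough idea of what the downloader has to do, a hypothetical minimal version could pull an observation archive from DWD's open-data server and unpack it into a dataframe; the directory layout and file naming on opendata.dwd.de shown here are assumptions and differ per dataset and resolution.

```python
# Hypothetical sketch of a DWD open-data fetch; the directory layout and file naming
# on opendata.dwd.de are assumptions here and vary per dataset and resolution.
import io
import zipfile
import requests
import pandas as pd

BASE = "https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate"

def fetch_hourly_precipitation(station_id: str) -> pd.DataFrame:
    url = f"{BASE}/hourly/precipitation/recent/stundenwerte_RR_{station_id}_akt.zip"
    resp = requests.get(url, timeout=60)
    resp.raise_for_status()
    with zipfile.ZipFile(io.BytesIO(resp.content)) as zf:
        # The actual observations sit in a semicolon-separated "produkt_*" text file.
        data_name = next(n for n in zf.namelist() if n.startswith("produkt"))
        with zf.open(data_name) as fh:
            return pd.read_csv(fh, sep=";", skipinitialspace=True, encoding="latin-1")

df = fetch_hourly_precipitation("00662")   # hypothetical station id
```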
And the next layer is the monitoring layer, which mostly consists of Grafana and Prometheus. The idea is that we monitor the different services we provide through this infrastructure platform, and try to monitor them as well as possible so that the services stay available and no problems occur.
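To illustrate the kind of check this enables, the following hypothetical snippet asks the Prometheus HTTP API for the recent request rate of a GeoServer instance; the Prometheus URL and the metric name are placeholders and depend on which exporter is actually configured.

```python
# Hypothetical sketch: ask Prometheus for the GeoServer request rate over the last hour.
# The Prometheus URL and the metric name depend on the actual exporter configuration.
import requests

PROMETHEUS = "http://prometheus.example.internal:9090"   # placeholder host

def request_rate_last_hour(metric: str = "geoserver_http_requests_total") -> list:
    query = f"sum(rate({metric}[1h]))"
    resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": query}, timeout=10)
    resp.raise_for_status()
    payload = resp.json()
    if payload["status"] != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    # Each result item carries its labels and a [timestamp, value] pair.
    return payload["data"]["result"]

for series in request_rate_last_hour():
    print(series["metric"], series["value"])
```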
Next up, I'll talk briefly about how the infrastructure data is combined and fused with other types of data. Here you see that we try to categorize the data we work on in four different ways.
I'll give examples of what this data is on the next slide, but broadly speaking, it starts with static data, which is immovable data, such as the topography, the rail network, and so on. Then there is semi-static or quasi-static data,
which is things like vegetation and buildings, which generally don't move, but there is a chance they might move or be removed from their existing place. Then there is semi-dynamic or quasi-dynamic data, which generally tends to change over time,
not so frequently, but it does. And then there is dynamic data, which is the trajectory data of vehicles, pedestrians, and other moving things, and which changes much more often. The multilayer map is something we earlier used to call the LDM, which is again a confusing term.
We all know what an LDM is, but we wanted to avoid mixing it up with that core concept, so we are calling it the multilayer map now. It can act as a data hub and database for the georeferenced data, and there is a good possibility to create web applications
by fusing this kind of data, which we have done in one of the applications, in one of the projects in our institute. Then the interfaces between these kinds of data can be standardized through, of course, OGC and some other standards that are very well known in our domain.
And there is a provision for pushing and sharing the data through APIs, and also, as I said, web apps. The examples I mentioned earlier are given here. There is some mixing of German and English in some places, but that's what I meant by these types of data.
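As one concrete, hypothetical example of such standardized sharing, a client can pull a layer from the GeoServer over OGC WFS with OWSLib; the endpoint URL and layer name below are placeholders.

```python
# Hypothetical sketch: fetch an infrastructure layer from GeoServer via OGC WFS.
# The endpoint URL and layer name are placeholders, not the real services.
from owslib.wfs import WebFeatureService

wfs = WebFeatureService(url="https://geoserver.example.internal/geoserver/wfs",
                        version="2.0.0")
response = wfs.getfeature(
    typename=["railway:infrastructure_elements"],   # assumed layer name
    bbox=(10.40, 52.23, 10.60, 52.35),              # roughly the Braunschweig area
    outputFormat="application/json",
)
with open("infrastructure_elements.geojson", "wb") as f:
    f.write(response.read())
```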
Now, this is a quick glance. I'll not go into detail of the collaboration diagram that we have established, because in this platform, we tend to work with many groups and many teams. And it is not possible for a small core team of four to five people to manage everything.
So this is something that we came up with quite late in our development phase, when we wanted to establish a standard procedure for incorporating new projects along with their new geodata, covering how and where the data can be stored, accessed, used, reused
and also published. So what kinds of things do we need to do across the platform? Starting when an actual new dataset arrives, we have to decide whether it is structured or unstructured, whether it can be stored in a database or not, and how it can be retrieved,
as well as what the other aspects of data management are, like archival, duplication and backup. One important thing is that across this platform we try not to have duplicates of data, because some datasets can grow up to terabytes in size,
and it's not sensible to copy the same dataset again somewhere else. For the database and data storage aspect of TRIDAP, we have the TDP instance. TDP is the Transport Data Platform.
It's an older concept, but we are trying to renew it by extending it towards different types of standards. Mainly that is railML, version 2.5, which provides an extensive vocabulary of railway domain concepts, and we try to adhere to this kind of standard
so that it is understandable and reusable across the railway community. Then, of course, we have the OSM importers to fetch data about the railway networks; a small sketch of what such a fetch boils down to follows below. There is also INSPIRE, which is quite widely used, or at least pushed for, in the EU, and we have colleagues working on an importer for INSPIRE,
and, of course, Python. So TDP is a database and therefore works with the structured data, but for unstructured data we have the LDS, which is based on the NetApp storage I mentioned earlier. You can manually or programmatically copy data into this kind of storage space.
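Coming back to the OSM importers mentioned above, at their core they have to do something like the following hypothetical Overpass API query for railway ways in a bounding box (here roughly the Braunschweig area):

```python
# Hypothetical sketch of the core of an OSM importer: query the Overpass API for
# railway ways inside a bounding box (south, west, north, east), roughly Braunschweig.
import requests

OVERPASS = "https://overpass-api.de/api/interpreter"
query = """
[out:json][timeout:60];
way["railway"="rail"](52.23,10.40,52.35,10.60);
out geom;
"""
resp = requests.post(OVERPASS, data={"data": query}, timeout=90)
resp.raise_for_status()
ways = resp.json()["elements"]
print(f"fetched {len(ways)} railway ways")
```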
For retrieving data: TDP is linked with GeoServer layers, and the data behind a GeoServer layer comes from the database, so we all know how that works; I'll not go into the details of that. And then there is direct database access, of course, where you can simply query the database and fetch the data.
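For that direct database access, a minimal sketch could look like this, assuming a hypothetical table of infrastructure elements with a PostGIS geometry column (the table and column names are not the actual TDP schema):

```python
# Hypothetical sketch of direct database access: all switches inside a bounding box.
# Table and column names are placeholders, not the actual TDP schema.
import psycopg2

conn = psycopg2.connect("dbname=tdp user=readonly")   # assumed connection string
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT element_id, element_type, ST_AsGeoJSON(geom)
        FROM infrastructure_elements
        WHERE element_type = %s
          AND geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326)
        """,
        ("switch", 10.40, 52.23, 10.60, 52.35),
    )
    for element_id, element_type, geojson in cur.fetchall():
        print(element_id, element_type, geojson)
```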
For the LDS, as it holds file-based data, you can export it by creating different kinds of services, or just go to that particular storage and retrieve it manually. Yeah, so here is the same diagram again.
I'll just focus on what you can see here. The pink curves are the railway network, and the small dots are the railway crossings that you can see in this particular map. And then there is a bridge, which I think is hard to recognize, but it's here.
So just to highlight that we can see different kinds of data here. This is part of the BAN server map viewer, which is currently used extensively in our institute, but we want to change over to MapStore, another FOSS4G tool.
So this is a quick look at what TDP looks like. It has a lot of things in it. My colleague here has spent many days creating it and cross-checking it against the railML format and all those things. So, yeah.
Next is another example, a screenshot of GeoNetwork. We all know GeoNetwork can be a good catalog, but we are also trying to use it extensively for storage of metadata, because we are part of the HMC, the Helmholtz Metadata Collaboration. It's a big thing for metadata in Germany, and we are quite active in that consortium.
In GeoNetwork, we can store a lot of metadata, including the contact person and all the provenance information of a dataset, and there is much more metadata that we can explore there.
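Because GeoNetwork also exposes a standard CSW endpoint, the catalog can be searched programmatically; a hypothetical minimal query with OWSLib might look like this (the endpoint URL and search term are placeholders):

```python
# Hypothetical sketch: search the GeoNetwork catalog over CSW with OWSLib.
# The endpoint URL and the search term are placeholders.
from owslib.csw import CatalogueServiceWeb
from owslib.fes import PropertyIsLike

csw = CatalogueServiceWeb("https://geonetwork.example.internal/geonetwork/srv/eng/csw")
anytext = PropertyIsLike("csw:AnyText", "%railway%")
csw.getrecords2(constraints=[anytext], maxrecords=10)
for rec_id, rec in csw.records.items():
    print(rec.identifier, rec.title)
```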
This is something we used to do more often earlier: use QGIS. Here you can see the same railway network, but with all the attributes of the infrastructure elements in this particular map. All of these are the infrastructure details
that one needs for the maintenance and predictive maintenance of the infrastructure. This is just a screenshot of the Prometheus dashboard that we have developed for monitoring of the GeoServer. So here you can see it's a good tool to have, along with GeoServer, if you have hundreds of GeoServer layers,
and they are being used extensively, not just by a lot of human users but also by different tools and services. So we try to keep an overview of them through this particular dashboard and see which layers are frequently used,
or which layers are out of use, so that we can remove them from the GeoServer instance and free up some space whenever possible. Yeah, as I'm running out of time, I'll just quickly tell you: this is the FAIRness matrix. You must be thinking I'm talking too much about FAIR,
but yeah, it is kind of a big deal. We know the principles of FAIR, and there are certain aspects we have already covered, like finding the dataset through metadata, controlled access, interoperability through the converters and importer-exporters, and reusability through community standards and open file formats,
and there are many things that we have planned, and many of them we have actually started just a few months ago this year. Yeah, so rounding it up, what we have seen today is that FOSS4G is huge, but there are also tools in it that can be sufficient for your needs
if you try to go through them in detail. And then metadata is quite important, because metadata actually is the core of the data set, so yeah, try to have as much metadata as possible
to enrich your data. Then, yeah, my colleague already gave a talk two days ago about the monitoring plugin for GeoServer, so you can check that out. And as for what we want to do later: we are already bringing the road infrastructure data into our platform
so that we can have one single stop for road and rail data, and we also want to have a cloud-native stack alongside the existing setup so that it becomes more modern, as we have seen in the presentation yesterday. So this is part of DLR's Digitaler Atlas 2.0 project.
It's a big, cross-domain project involving different institutes of DLR, and we are part of it in that way. So yeah, that's it from my side. Thanks a lot. Thank you very much, Akhil.
So we have time for a couple of questions. You're still processing the social bytes. Yeah, this is quite an application-oriented use case, so yeah, I will ask everyone to go for it,
try to have something in your institute as well. So you are interested in the raw infrastructure? This looks amazing, and if it contains all the data, like all the railways, that's perfect.
If there are some other projects going on, they might be interested in it. So my question here would be: since you have GeoServer running there, can it be made accessible as a STAC catalog to get some of the data? I didn't answer that clearly, but this is kind of like
still an internal thing. So because we work for a government-funded institute, we cannot give our data openly to outside players. So that's kind of an obstacle, but yeah. Oh, right, so the data is not open.
Yeah, the tools are, but not the data. Will it be open, maybe eventually? No. I personally think it's hard, no, because all the data is governed with SLAs and contractual agreements, but maybe if somebody has a noble heart,
like people in Deutsche Bahn or somewhere, they can, otherwise, yeah. Yeah, as he said, a lot of data sets are collected within certain projects and can be shared only with the project partners,
but there are some subsets which will be made in the future. I mean, they'll be made open source in the future. The GeoServers that we are using are openly accessible, and yeah, maybe in the future, some data sets will be accessible, but not all of them. But yeah, they can be reused for projects in the future.
When the project partner is involved, but yeah. When the project partners forget about their project, we can share it. Yeah, it would be very nice to have it as an extension of or an alternative to OpenStreetMap, since it really looks well curated.
Yeah, we are also working on improving the accuracy of the data that we fetch from OSM, especially for the railways. Yeah, thank you very much for this intense presentation. Thanks. We learned a lot, and this is the end
of the first presentation of this block.