
SDIs to open data platforms, the geOrchestra way

Formal Metadata

Title
SDIs to open data platforms, the geOrchestra way
Title of Series
Number of Parts
156
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
geOrchestra is a long-established open-source Spatial Data Infrastructure (SDI), grounded in the pillars of OSGeo:
- GeoNetwork
- GeoServer
- MapStore
- OpenLayers
This solution has proven to be exceptionally robust, having been deployed at various levels including national, regional, institutional, academic, and research centers. As the landscape of metadata management transitions, embracing open data catalogs, data-centric usages, and modern applications, SDIs must evolve and adapt to this new paradigm. In our presentation, we will explore how the geOrchestra community, with support from the GeoNetwork community, has modernized its technology stack and offerings. This includes:
- A comprehensive system for data ingestion and preparation.
- A collaborative editor for open and geo-metadata.
- A unified portal for accessing both open data and geo-data.
- A versatile API that addresses a range of data use cases, including searching, paging, processing, analyzing, and aggregating datasets.
- Enhanced capabilities for data visualization.
These advancements collectively contribute to the development of a sophisticated open-source data platform, incorporating a streamlined data ingestion system and more.
Transcript: English (auto-generated)
So, today my talk will be about the challenge of moving from a spatial data infrastructure to an open data platform. And I will introduce how we managed to do that within the geOrchestra project, which is free and open source software that provides all the SDI components for
a region or for whoever needs that. So, to introduce myself, I'm Florent Gravin, I'm the head of technology at Camptocamp. I've been working in this ecosystem for maybe 15 years now.
Camptocamp is a service provider and we are really involved in the geospatial open source ecosystem. So, we are contributors to many OSGeo projects like GeoNetwork, GeoServer, OpenLayers.
We try to be in the steering committees to be able to drive the projects, and our role as a service company is more to integrate open source solutions for customer needs. We don't have any product, we don't sell product licenses, so our business model
is more to promote innovation and try to pull the customer in the direction where we want to innovate. So, I will briefly introduce what the landscape is today between SDIs and open data platforms.
Then I will introduce what geOrchestra is and how we made the move to embrace the open data world coming from geospatial data. Then I will briefly give some figures about Métropole Européenne de Lille, which is a big city in France which has made this move, and we'll see how it goes
for them. It's kind of a success story, and then I will conclude and open a discussion. So, this is the current landscape. For a while, because of INSPIRE and different needs, people have been building
geospatial data infrastructures, so geospatial data platforms, where there is only geospatial data. So, you can recognize some bricks: basically you need a geospatial server like GeoServer or MapServer or QGIS Server to publish your data sets on the web and to deliver different OGC services to be INSPIRE
compliant. You also need a metadata catalog, so for instance the OSGeo GeoNetwork or pycsw or others, which provides the metadata through the discovery and CSW entry points. And I show OpenLayers, which is an OSGeo component as well,
to display the map, et cetera. So, this is a basic geospatial infrastructure stack, and a lot of administrative units have one, especially in Europe, because it was mandatory under the INSPIRE directive. And we see that at state level, at region level, department level, city level,
but also at university and research level, and even in private companies. So, this is all what we know. And for about a decade, I would say, there has been a new movement, which is open data, where everybody is able to push their data,
where we want all the people and the cities and the companies to open their data, to share their data, so everything is transparent. And this movement was completely split from the geospatial movement. So, the goal of my talk is how we can unify these two movements
and how we can make sure that all these regions that have a spatial infrastructure can embrace and host open data without having two platforms or two catalogs. So, what is geOrchestra?
geOrchestra has been free and open source software since 2009. And actually, it came out of INSPIRE, when a big region in France, Brittany, needed a geospatial infrastructure. So, basically, they decided to take FOSS software, so OSGeo bricks,
and just bring them and aggregate them together. So, geOrchestra is nothing else but one GeoServer, one GeoNetwork, one MapStore to visualize the data, plus some other applications like mviewer, which is a data visualization viewer,
GeoContrib, which is a way for people to report what is happening on the street. If there is a hole in the street, they can report that, for instance. So, it's a constellation of open source software,
mostly OSGeo software, and geOrchestra brings the glue around that, like a proxy or a gateway on top of it. So, all these bricks are connected together through the same authentication and authorization system, and the same way to handle the roles,
the groups, and the credentials. So, actually, if you want a small start with an SDI, to publish your geospatial data, and you want a GeoServer, a GeoNetwork, and a MapStore, it's preferable to go with geOrchestra because you have all this synchronization across the whole stack,
which is done already. And you can remove one thing if you don't need it. For instance, if you don't need GeoNetwork, you can go for GeoServer and MapStore. So, this is what geOrchestra is. And it's distributed in different ways, so it's easy to install.
You have a Docker composition, Kubernetes and Helm charts, Ansible, Maven packages, and you have a lot of service providers that can help you on the way. It's really focused on the community, and what is great in this community, compared to other communities that I belong to, is that all the users, the customers, know each other well,
and they really work together to make improvements to the solution. So, they are co-creators, co-partners, and we all work together to bring fresh air into the community. And it's really pleasant to work in this manner.
So, it's a very healthy community. And what proves that is that they have written a manifesto, which is really clear about what it is, how it has been designed, how you can contribute to it, et cetera. So, it's really transparent, it's really healthy,
and it's a great project and community to dive into. At the moment, there is an incubation process to get the OSGeo label. So, maybe, Tom, you are going to be our parent. I don't know how to say that. So, hopefully, next year, it's going to be an OSGeo project.
Let's see. But that's the philosophy. What is the audience? So, I said that there are states, regions, countries, departments, cities, universities, research centers, and so on. So, it's very widespread, mostly in France,
but also in Europe and worldwide. And we really want it to reach other communities and other countries. So, the challenge here is to embrace the open data world. So, as I described it, it's a pure geospatial infrastructure, and it's only that.
But, for some years, people who were administering geOrchestra were also meant to administer the open data platform. But usually, it could be different departments or different offices, like geospatial comes from geography, open data could come from statistics.
So, it's a huge challenge. So, how can we make this shift, and how can we move toward this goal? So, this is the workflow, how we want geOrchestra to be now, and we want to simplify this workflow. Actually, if you think about it, it already exists,
but it's not easy to do. So, for instance, connecting to data sources. I mean, through GeoServer or pygeoapi, you can connect different data sources. But is there any administration console for that? Do you need to learn to use GeoServer, to use GeoNetwork, to use MapServer, et cetera?
So, the aim here is really to ease the administration of the data workflow. So, we connect different data sources, then we prepare the data: we can change the attribute names, we can apply a filter. So, we really extract from the raw database
the data the way we want to share it with the public. Then, we publish it through different services, metadata, but also visualization. And then, we want to promote discovery by providing an API, and promote sharing so that people can reuse our data,
because this is the purpose of the data. It's not just to say, okay, I have a portfolio of data sets, I am INSPIRE compliant. Now, people want the data to be used in an efficient way. So, what do we need to do to move to that?
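The prepare step of this workflow (change the attribute names, apply a filter, then publish) can be sketched in a few lines of Python. This is only an illustration and not geOrchestra code: the function `prepare`, the column names and the sample rows are all hypothetical.

```python
from typing import Callable

Row = dict[str, object]


def prepare(rows: list[Row],
            rename: dict[str, str],
            keep: Callable[[Row], bool]) -> list[Row]:
    """Filter raw rows, then rename their attributes for publication."""
    published = []
    for row in rows:
        if not keep(row):
            continue  # this row is not meant to be shared
        # Attributes missing from the mapping keep their original name.
        published.append({rename.get(name, name): value
                          for name, value in row.items()})
    return published


# Hypothetical raw table, as it might sit in an internal database.
raw_rows = [
    {"nm_zone": "North", "val_01": 12, "status": "validated"},
    {"nm_zone": "South", "val_01": 7, "status": "draft"},
]

public_rows = prepare(
    raw_rows,
    rename={"nm_zone": "zone_name", "val_01": "bin_count"},
    keep=lambda row: row["status"] == "validated",
)
print(public_rows)
```

The point of the sketch is that the published shape is decided by a small declarative mapping and a filter, rather than by hand-editing the source table.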
So, first, when we have a geOrchestra platform, so GeoServer, GeoNetwork, et cetera, we need to be able to host open data too, and not just geospatial data. So, this was the first goal: toward a unique catalog. As I described, there can be Opendatasoft,
CKAN, or another solution on one website, and a geospatial catalog on another website, both for the same city or the same region. It's hard to maintain, it's hard to synchronize, because some geospatial data are open, and some open data are geo.
So, you need to cross-synchronize things, and it's really hard to maintain. So, the goal here is just to have one entry point. So, here is the first thing that we did. It's not a big step, but it allowed our users, through GeoNetwork, to provide open data within their catalog.
How? Just by harvesting open data catalogs. So, it's just the first step. I have a constellation of catalogs within my institution. Maybe I have one geo catalog, one GeoNetwork, another GeoNetwork, an Opendatasoft, a CKAN. And then we can harvest them all through GeoNetwork,
so we have just one API and one entry point. And then we designed the DataHub, which is a new user interface based on the GeoNetwork API, but which targets open data needs and was inspired by open data catalogs. So, for instance, CKAN and Opendatasoft,
they provided a great new experience around the data, and we got inspired by that to build the DataHub. So, this was the first step. And it was done by Géo2France, and it has been in production for two or three years now. And it is based on geOrchestra.
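The idea of harvesting several catalogs into a single entry point can be pictured with the sketch below. It is not how GeoNetwork harvesters are implemented: the function `merge_harvested`, the record fields and the rule of deduplicating on a shared identifier are assumptions made for the illustration.

```python
def merge_harvested(sources: dict[str, list[dict]]) -> list[dict]:
    """Merge records from several catalogs into one list, one entry per id."""
    merged: dict[str, dict] = {}
    for source_name, records in sources.items():
        for record in records:
            uid = record["id"]
            if uid not in merged:
                # First time this dataset is seen: keep it and note its origin.
                merged[uid] = {**record, "harvested_from": [source_name]}
            else:
                # Same dataset exposed by another catalog: only note the origin.
                merged[uid]["harvested_from"].append(source_name)
    return list(merged.values())


# Hypothetical catalogs of one institution.
catalogs = {
    "geo_catalog": [{"id": "trees", "title": "Street trees"}],
    "open_data_portal": [
        {"id": "trees", "title": "Street trees"},
        {"id": "budget", "title": "City budget"},
    ],
}

for entry in merge_harvested(catalogs):
    print(entry["id"], entry["harvested_from"])
```

A dataset that is both geospatial and open then shows up once in the unified catalog instead of once per source.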
Behind that, it's a GeoNetwork and a GeoServer. And you have your open data platform, your geo open data platform, where you can find both open data and geo data. But that's not all. We also need our user experience to be data-centric.
If we think of a classic SDI, there is the metadata and there is the data. And when we speak about a catalog, it's a metadata catalog. But actually, in the open data world, we focus on data sets. And that's quite important. We got inspired by who you know,
and we try to provide the same user experience. So, it means that now, in the catalog, when you click on a data set, you don't only get a huge description, all these metadata fields, which could be quite boring or off-putting,
but we try to focus directly on the data and let you see the data. So, this is the data-centric approach, where it's slash dataset and the UUID and not slash metadata, for instance. And if there is WMS, WFS, vector, GeoJSON, we can see them on the map.
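As a rough picture of this data-centric behaviour, a dataset page can decide which previews to offer from the formats a dataset is distributed in: a map for WMS, WFS or GeoJSON, a table for GeoJSON, CSV or Excel, as the talk describes. The sketch is not DataHub code; `available_views` and the format labels are hypothetical.

```python
# Formats that can be drawn on a map or listed in a table (per the talk).
MAP_FORMATS = {"wms", "wfs", "geojson"}
TABLE_FORMATS = {"geojson", "csv", "excel"}


def available_views(formats: list[str]) -> list[str]:
    """Return the previews a dataset page could offer for these formats."""
    normalized = {fmt.lower() for fmt in formats}
    views = []
    if normalized & MAP_FORMATS:
        views.append("map")
    if normalized & TABLE_FORMATS:
        views.append("table")
    return views


print(available_views(["WMS", "CSV"]))
print(available_views(["PDF"]))
```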
If there are GeoJSON features, CSV, Excel, or whatever, we can see them in a table. And we can see them in a dataviz component, and we can start to use the data and get value out of it. All these components are within the DataHub,
and here you see that you can export them and embed them as a web component or iframe in a third-party website, so you can easily refer to a data set. We should be able to download the data directly, which is not really the focus, for instance,
of GeoNetwork, but it is the focus of an open data catalog. I need this data in this format, so we changed the experience a bit toward this. And we need to reuse the data through an API. Reusing the data means that, for instance, a third-party website building an application will open my data, but maybe just a subset
with a filter, et cetera. So, we focus on the different APIs, and we provide a form to generate an API query that can be used elsewhere. So, we host open data, we promote data,
we are data-centric, and now, how to promote the data. So, I talked about an API. In the GeoServer world, there are WFS and OGC API that could help us, but that's not enough to really promote our data, because what we want,
for instance, is data visualization charts, and we can't do that with OGC API Features if the file is very big, because you have to fetch all the data and then compute the chart on the client side. If you want this chart to be generated directly by the server, you need a different API
or different services. It could be processes, but we wanted to stick with OGC API Features. So, we wanted to extend that. Open data catalogs like CKAN and Opendatasoft come with their own APIs. So, we need to come with an API
that is OGC API Features compliant, but brings new abilities for data visualization, for geospatial aggregation, statistics, and computing, but also for searching within the data, which is very important.
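The argument for aggregating on the server can be illustrated with a small sketch: instead of shipping every feature to the client to build a chart, the server returns one summary per group. This is plain Python, not the geOrchestra API and not part of OGC API Features; `aggregate`, the field names and the sample values are hypothetical.

```python
from collections import defaultdict


def aggregate(features: list[dict], group_by: str, value_field: str) -> dict:
    """Compute count, sum and mean of value_field for each group_by value."""
    sums: dict = defaultdict(float)
    counts: dict = defaultdict(int)
    for feature in features:
        key = feature[group_by]
        sums[key] += feature[value_field]
        counts[key] += 1
    return {
        key: {"count": counts[key],
              "sum": sums[key],
              "mean": sums[key] / counts[key]}
        for key in sums
    }


# Hypothetical measurements; a real dataset could hold millions of rows.
measurements = [
    {"station": "A", "value": 1.0},
    {"station": "A", "value": 2.0},
    {"station": "B", "value": 3.0},
]

summary = aggregate(measurements, group_by="station", value_field="value")
print(summary)
```

However large the input, the response a chart needs stays as small as the number of groups.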
At the moment, on the platform, you cannot search within the data, you just search in the metadata catalog. So, all these points help us on the way to being more of an open data platform. Then, open data authoring. I introduced this diagram where we are not really doing open data
because we are harvesting open data catalogs, and the goal here is to remove everything on the left and be standalone and self-sufficient. So, we are introducing a new application, just as the DataHub is the fancy search interface.
We are introducing a new editor within GeoNetwork, built with the same technology, which will be a very simple editor for filling in open data. And without any trouble with the schema, with ISO, with INSPIRE: you just define your form
and you create your data, you push your data, and it's very easy. And that way, we will be able to remove all our dependencies and host the open data directly within the catalog. So, a huge UX campaign has been carried out
and we are developing this new application, which is part of GeoNetwork and will be part of geOrchestra. So, a completely new editor to handle our open data catalog with a much simpler user experience for authoring metadata. And then, the loop is almost closed.
We are autonomous in handling the creation and update of geospatial datasets, but also of open datasets which are not geo. We provide all the data through an API, which allows us to do data visualization,
aggregation, analytics, and everything. We have two new tools, one for the maintainers and one for the users to get value out of the data, and we are almost done. All this work has been driven by some geOrchestra customers,
and I will briefly introduce Métropole Européenne de Lille, which is almost the first geOrchestra user to present itself as an open data platform. For instance, they have removed GeoServer from geOrchestra. They don't need it anymore. And it's data.lillemetropole.fr, which looks just like that.
And the landing page is a DataHub, a custom DataHub, and there is no CMS or anything anymore. It's just that, so it's just a geOrchestra, and all the links lead you to the other bricks of geOrchestra. So, this is the open data catalog
of Lille Métropole based on geOrchestra, and we provide a great experience with it. So, it's in production, it has been in production for two or three weeks maybe. It allowed them to cancel two subscriptions, one to Opendatasoft and one to Isogeo. They have removed everything.
The service transition went well, which means that they kept the old APIs from Opendatasoft, and we opened our new APIs, and, yes, it works, there is continuity of service.
And it brings best practices in data publishing, because actually they figured out that with Opendatasoft, many people were pushing data without any quality control, and here the whole process has been cleaned up, and for them, that's a very good point. Some figures: Métropole Européenne de Lille is 11 cities
which publish data and data sets. They have very big data sets, like rainfall every quarter in different places, waste collection with more than a million entries, and the service that they provide is really performant and stable.
For instance, they have all the transportation data refreshed every minute, and everything passes through geOrchestra. So it means that there is a provider for the transportation, the buses, et cetera. When there is an issue, they send every minute the position, the problems, et cetera,
it passes through the geOrchestra API, and we can watch the data through the DataHub, and we can use the data through the API from another application. So for instance, this API is consumed by worldwide services like Google for transportation data.
So it works, and for them, it's a very big move. The perspectives: it's not all done, and we haven't really replaced the open data catalog in all regards. So for instance, they want to have a security layer. It means that they want to protect some data, et cetera,
so it's different from open data, but actually even in open data, you want to protect some data. Then what is missing is the data ingestion system, where we want to provide an experience like in Opendatasoft, where it's really easy to connect to some databases, to upload some files, to prepare your data set,
change the attribute names, change the extent, et cetera. So the user experience is great for the manager of the platform, who can really decide what they want to publish instead of creating a SQL view in PostGIS or something like that. And generative AI, we are investigating that
at Camptocamp, yes, to talk with geOrchestra in natural language. So for instance, I have something running, where you just type your sentence and you can find the data on your platform and you can take actions on your platform from this.
So to conclude, geOrchestra, the premise is that it's open and free. It's modular, it's accessible. And everything that we design within this project, we try to be transparent, to communicate the right way, to do it in an open source way.
So it's usually harder and it costs more, but we think that it's beneficial for everybody. So it's what we like to do and want to do at Camptocamp and within this community. Some links, if you get the slides. And that's it, thank you.
If anyone wants to, yep. Hi, Florent, thanks for that presentation. So, related to the aggregation statistics,
the faceted statistics: we're bringing that now to OGC API Records. To me, it makes a lot of sense to also add that to OGC API Features, so you would be able to do that kind of thing. Are you implementing it in a similar way, or how do you implement that?
Yes, so we are introducing the facets within OGC API Records because we think that it's very useful for searching metadata. So we are working with Tom and other people. The question is, do we need that for Features?
Yes, of course. And actually it was one of the main goals of pushing that in Records, because we know that what is in Records will land in Features. And it's very important to have the full-text search, which is in Records, and to have the facets in OGC API Features. So the experience while browsing the data
is really better than it is with WFS. Any more questions? Yeah, I'm curious to understand if that would solve all your needs
for creating these nice diagrams, or if you need more. If what I presented answers the issue, you mean? Yes, so these nice overview diagrams with statistics, so many features of this type, so many features of that type, you could manage that via facet statistics.
This? Yes. Which one, this one? This one, yeah. So this one is more like you have data, for instance, about the rain. And every quarter, you have one figure per city,
not per city, but per sensor, with the amount of rain. And what do you want to do with that? So it's just that we want to provide something that helps you distribute the values over a city, over a month, for instance. So it's just aggregation, like...
Like a facet? Yes, but complex facets, maybe more complex. So yeah, that's where I'm heading, yeah? So what complexity do we need to add to facets to support this use case? I don't know, because we didn't... No, no, no, just take some time to think about it.
There are different levels of aggregation in there, so... Yeah, yeah, yeah, yeah. Yeah, but it's a need from a project. Yes, right. Okay, there's a question over there. Thank you. My organization currently has GeoServer and GeoNetwork. We are planning to start an open data catalog
with, next to spatial data, also BI data and document data. What would be the approach to, like... Now we have, if I understand correctly, geOrchestra has components
of GeoNetwork and GeoServer in it. Yes. And how would we start replacing GeoNetwork and GeoServer with geOrchestra? Okay, good question. I think it's not that hard. It means that you will need to install geOrchestra and then synchronize your databases,
your data directory, et cetera. So for instance, GeoServer is almost vanilla inside geOrchestra. There is just the header for the authorization and the credentials. So you could install geOrchestra, use its GeoServer, copy-paste your data directory there,
and it's going to work, honestly. GeoNetwork is a bit harder because you will need to copy-paste the databases, et cetera. And maybe you will need to check that, if you have roles and groups,
these will be erased and replaced by the top-level group management in geOrchestra. So this is the only thing you need to care about. We can talk about that afterwards if you want to. Okay, I think we need to move on to the following presentation. You can grab Florent during the rest of the conference
and talk to him. So let's thank the speaker again. Thank you, thank you.