Datahub: the confluence of open data and geo data
Formal Metadata
Title | Datahub: the confluence of open data and geo data
Number of Parts | 351
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/68981 (DOI)
Production Year | 2022
Transcript: English (auto-generated)
00:01
OK. Hi, everyone, and welcome. Welcome to this presentation about the DataHub. Just a quick word from our sponsors before beginning. I just want to make clear that most of this project was funded by the geOrchestra community. You can see here the logos of the different organizations
00:21
that funded this project, Géo2France, DataGrandEst, GéoBretagne, and CRAIG. Thank you very much for your trust. So about Camptocamp, we are a big open source company. We've been around for more than 20 years. We're a service provider. We are working in France, Switzerland, and Germany
00:41
with more than 150 employees, and it's a great place to work. So my name is Olivia Guyot. I'm an architect at Camptocamp. Hi, I'm Florent Gravin, tech lead at Camptocamp. And it's better to use this one.
01:03
And we are both contributors to GeoNetwork. I'm a member of the PSC, and we will talk about the DataHub, which comes from a new need around GeoNetwork to serve new use cases. So first, I will start the presentation
01:21
with how we ended up creating the DataHub.
01:45
OK, I will use the mouse. So yeah, why did we end up thinking of the DataHub and having this vision to create something new? So first, some history. Let's drop some context.
02:01
You may all recognize this. It really changed GeoNetwork's life and many geospatial users' lives. In 2007, there was the INSPIRE directive, which is a framework led by the European Commission for sharing geographical information.
02:23
So it pushed countries and states to establish laws making it mandatory to keep an inventory of data sets and to provide services to search, visualize, and download the data sets. They also tried to give some recommendations on standards to promote interoperability between the systems.
02:42
So after this directive, so it's random, OK, we saw lots of pineapples pop up in Europe. So it's not an exhaustive list, but the European countries had the obligation to implement the directive.
03:01
So we saw metadata catalogs pop up in several countries. In the example of France, it was also mandatory to have INSPIRE catalogs at the national level and the regional level. So we saw, again, metadata catalogs popping up in all regions of France.
03:21
The typical architecture in the early days was that the region took the responsibility of implementing the directive and provided a metadata catalog. All the organizations within the region could register to this catalog and maintain their own metadata.
03:43
Then we started to see that more organizations wanted to come into play, and these organizations were able to just publish their own catalog, and it ended up with the region catalog still maintaining its own data sets but also harvesting other catalogs.
04:01
For sure, GeoNetwork was the solution available for implementing the INSPIRE directive, because I think it's still the best open source solution to catalog your ISO metadata. So the reality was mostly, not all,
04:23
but mostly all the catalogs were GeoNetwork everywhere, at the state level, region level, other levels, harvesting each other. So this was quite the situation. Then a new paradigm came, the open data world.
04:40
So open data was a new way, with new catalogs that were really more accessible, based on simple principles: findable, accessible, interoperable and reusable. So it was a trend, we want the data to be open and we want to make it easy to publish this open data.
05:01
So there is no real restriction, no constraints, easy formats, custom formats, and it's very quick to publish an open data set. It was really trendy. We saw many, many open data catalogs popping up. It brought new tools, new use cases, new catalogs,
05:21
and what was great is that many people could publish their data sets and people were really happy to use a light website to search, to visualize, with new use cases like data visualization, so it was quite trendy. Unfortunately, or not unfortunately,
05:42
but GeoNetwork just didn't really take the open data turn because it was just meant to provide geo data. So we didn't really take that direction to say that GeoNetwork can also be an open data catalog because it was not our concern at all.
06:03
So in the end, this is the situation today: still all the INSPIRE catalogs, but you can also have open data catalogs at the region level, open data catalogs at the city level, and we have a constellation, a patchwork constellation of very heterogeneous catalogs.
06:22
The truth is that maybe today there are too many catalogs. It's catalog fatigue, metadata fatigue maybe. So everybody has their own catalog, open data, geo data, the data can be duplicated, where do I find my data set, and the interoperability is not great.
06:43
So at the beginning, what we tried was to make all the systems coexist, so we just tried to have an INSPIRE catalog, a GeoNetwork, and a CKAN, and we wanted to harvest in both directions. So the INSPIRE catalog harvests the open data
07:00
which are geo, and the open data catalog harvests the geo data which are open. But it was really complex, a complex architecture, the interoperability was bad because actually the open data world is great, but it's not meant to be interoperable. Metadata were duplicated, there were synchronization issues,
07:20
and it was a real pain to maintain. So for us it's not the ideal solution. Why? Just because all the schemas are complex, and interoperability is very hard because it's hard to put circles in boxes, and you always lose information. So the situation is not ideal,
07:42
that's why we started thinking about a new way to solve the problem. Okay, so then we realized that there are many problems, and then we tried to take a step back and think of another approach to handle this correctly.
08:05
So we tried to, where is it? Okay, so we tried to simplify the whole problem and go down to the very core of what we want to do, and what we wanted to do was
08:20
provide a single entry point to discover and visualize the data sets of your territory, and that's really all, there's no geo or anything, it's just that. And from that we decided to create the DataHub project. Okay, so the DataHub is built on three foundations.
08:43
The most important one is user experience, because we figured that we were doing all this for users, to let users find data and navigate through all this data everywhere. So user experience would be the main point that we wanted to address. And then there was also developer experience, because it was something that we were struggling with
09:02
for a long time in the GeoNetwork community: keeping contributors and bringing in new contributors, because we were working with complex solutions and it was hard and the entry cost was quite high. So we figured, let's try to create something new that would be much simpler and easier for developers.
09:24
And we also wanted to have a less-is-more approach, because we figured that for years we've been trying to add features and solve problems one after the other, and solve problems from a technical point of view, and then we realized maybe this is just too much
09:41
and we will never finish all this, and we might just try and remove things on the other hand, and try and remove as much as possible and see how to keep it usable. So a quick word on the users, because I think it's worth mentioning that we have to remind ourselves as software developers
10:02
that software is made for users. Software, catalogs, any kind of application is pointless without users, right? So sometimes we try and fix or address things in a technical way or find technical solutions, but really the most important metric
10:21
is whether users are happy and are able to use our software. So users are really everything. So from there, we started shopping for ideas in other solutions to get some inspiration. And so here you can see, for example,
10:40
this is a screenshot from Opendatasoft. Opendatasoft has an interesting approach where it's very data-centric. When you're in the catalog, you already have a view of the data sets, formats. You see here, quickly, you have a map visualization of the data and also a table view. So we thought it would be really interesting.
11:01
It's a very simple database as well. In other solutions, for example, here at udata, you have more of a social aspect. The data publisher will be presented, and you can follow, you can see the reuses. You have another solution here. This is Data Grand Lyon, and it's a very interesting platform
11:21
because you have a lot of all these features kind of coming together in very user-friendly ways. So we thought that would be really interesting as well. Then with all these ideas, we realized we needed something to actually start from. And then we realized that maybe the best people that could help us on this quest
11:41
would be the users themselves. So we started a UX campaign. So there were interviews of users and GeoNetwork users, and we tried to listen to everything they had to say, everything they wanted to see coming, and everything they hoped for.
12:00
There were several pain points while using GeoNetwork. So we tried to list them and address them. And with this, we could produce the first mock-ups for the project. So you see here, that's a mock-up. It doesn't really look like the other catalog solutions.
12:23
It gives a really big place to the search input, which is pretty much the only entry point for the users. And then you can see here, that would be a record view: when you click on a record, you go to this page, you have the shortcuts on top.
12:42
And then right away, you have map visualization and a table view. And all this was built with the idea that we shouldn't be showing too much information to the user. Every time, at every moment, the user shouldn't be drowning in information.
13:01
So the reception for this mock-up was really good. Customers were really happy, they were like, wow, it's great, it's gonna change our lives. From then, we did a technical proposal. So for us, oh, that's broken. For us, it was really important to not add another system, another brick in the whole system,
13:25
because we figured it would be not very smart to try and solve the problem of having too many moving parts by adding another one in there. So we really wanted to use GeoNetwork for this.
13:41
GeoNetwork would be our central source of truth, and it would harvest all these other platforms. And we could actually build harvesters for these platforms pretty easily. It's just JSON APIs, and it wasn't too hard, and it works really well. And then we had to choose a storage format
14:01
for all this metadata, because GeoNetwork is not able internally to use JSON documents. So we tried different things, we made experiments, and it was not always successful, and so eventually we chose this one. But really, it's not that important, and we realized that this was more
14:20
of an implementation detail. This was the internal storage format of the metadata, and this is well supported by GeoNetwork, but it doesn't really matter for the users. It just has many fields, and we use them, that's all. Then for the UI, we relied on the GeoNetwork UI project.
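As a rough illustration of what one harvesting step might look like, here is a minimal Python sketch that maps a single JSON document from an open data portal onto a simplified common record. The `CatalogRecord` fields, the input keys and the `from_portal_json` helper are all assumptions made for this example; they are not GeoNetwork's harvester API, nor the internal storage format mentioned in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogRecord:
    """Simplified, hypothetical common record (not GeoNetwork's real schema)."""
    identifier: str
    title: str
    abstract: str = ""
    keywords: list[str] = field(default_factory=list)
    download_urls: list[str] = field(default_factory=list)
    source_catalog: str = ""


def from_portal_json(doc: dict, source_catalog: str) -> CatalogRecord:
    """Map one portal JSON document to the common record.

    The keys "id", "title", "description", "tags" and "resources"
    are assumed for this sketch; real portals use their own layouts.
    """
    resources = doc.get("resources") or []
    tags = doc.get("tags") or []
    return CatalogRecord(
        # Prefix the identifier so records from different portals cannot collide.
        identifier=f"{source_catalog}:{doc['id']}",
        title=(doc.get("title") or "").strip(),
        abstract=(doc.get("description") or "").strip(),
        # Normalize and de-duplicate keywords.
        keywords=sorted({t.strip().lower() for t in tags if t.strip()}),
        # Keep only resources that actually expose a URL.
        download_urls=[r["url"] for r in resources if r.get("url")],
        source_catalog=source_catalog,
    )


sample = {
    "id": "abc-123",
    "title": "  Street trees ",
    "description": "Inventory of street trees.",
    "tags": ["Trees", "parks", "trees "],
    "resources": [{"url": "https://example.org/trees.csv"}, {"name": "no url"}],
}
record = from_portal_json(sample, "city-portal")
print(record)
```

The point of the sketch is only that each source format needs one small mapping function, while everything downstream works on a single record shape.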
14:42
So for us, it's an important project, because we're really betting on it for the future of GeoNetwork. It's a toolkit for building applications about metadata or other things. And so it's not meant to be a full replacement of the current GeoNetwork UI,
15:01
because replacing the GeoNetwork UI, it has so many features that we probably wouldn't be able to do it in a lifetime. So it's a different approach. We built this toolkit, and with this, you have components that you can assemble, and you assemble components and you put a style on it,
15:20
and then you have your application, which is more focused on one use case. It relies on modern technologies. That's also a big change. It works with Angular, it uses CSS variables, so it means you can easily theme it. There's a theme system, which is pretty efficient and very simple. It uses web workers, web components,
15:42
there are all these new things that we weren't able to do before. And there's a big focus on code quality and test coverage. So this is part of the developer experience I was mentioning earlier: just making the life of developers easier. So we feel this project is really a safe investment of our time, and we want to have a good investment
16:04
of our time, right? Let's do a quick example of things. So it's very simple to use, not much information.
16:21
You just have a single entry point, infinite scroll. So it's just little details that make the difference for the user experience. Yeah, then you click on a dataset, you have the dataset view, you can filter on the contact.
16:43
It's version one of the DataHub, remove the filters, et cetera. Another thing, which seems quite great: when you type something and pick an entry in the list, you will directly jump into the dataset instead of triggering a new search and then clicking.
17:02
So it's just small details actually which make the difference. In the view, you see that there are many things, a lot of information. You can just jump from one section to another, you have the data visualization, so you can view all the resources, the WMS, but also the WFS resources on the map.
17:21
You can click to have the information. You can see tabular data. The download links are obvious to find with the color, with the things. You can filter on them. So okay, it's just a UI, but it's pretty simple and straightforward.
17:46
So what's coming up? We keep implementing the mockups. We want to have an entry point from the organizations of the catalog, so this is the dev instance.
18:01
We also want to have a newsfeed, so we want it to be like a portal with figures, with nice lists, nice information, with filters on organization. So the newsfeed, the publisher entry and filtering, so for one publisher, what metadata does it have?
18:21
Social, so you can subscribe to metadata, your favorites, talk to other people, comments, better filtering. And what is very important that we want to do is we want to tailor the search according to user info. So when I connect, for instance, if I work for the city of Paris,
18:41
I will have as a priority all the data sets on Paris. If I work at the region level, I will have the data sets that are at the region level, not at the city level. So we really want to implement this kind of thing. Then what's next? Maybe support for other backends,
19:00
so it's really bound to the GeoNetwork API, but it could actually be bound to another catalog. Who knows? Depend on the Nix or OGC API or other catalog. One very important thing is that DataHub is two things. It's a way to harvest, through their native APIs, all the open data catalogs in GeoNetwork,
19:22
but it's also the UI. And the UI we presented to you could be used on any GeoNetwork. Okay, so once you have a GeoNetwork at home, you can use the DataHub as a facade for the search. So this is the one great point, I think.
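The idea mentioned a moment ago, tailoring the search to the connected user, could be sketched as a simple re-ranking step on top of the search results. This is only one possible reading of it: the `territory` field, the `rank_for_user` helper and the sample records are hypothetical and do not come from GeoNetwork or the DataHub code.

```python
def rank_for_user(records: list[dict], user_territory: str) -> list[dict]:
    """Put records matching the user's territory first.

    sorted() is stable, so within each group the original order
    (for example the search engine's relevance order) is preserved.
    """
    # False sorts before True, so matching records come first.
    return sorted(records, key=lambda r: r.get("territory") != user_territory)


records = [
    {"title": "Regional roads", "territory": "Île-de-France"},
    {"title": "Paris bike lanes", "territory": "Paris"},
    {"title": "National parks", "territory": "France"},
    {"title": "Paris street trees", "territory": "Paris"},
]

ranked = rank_for_user(records, "Paris")
for r in ranked:
    print(r["title"])
```

A production search would more likely express this as a boost inside the search engine query rather than as a post-processing step, but the effect on the result order is the same in spirit.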
19:41
Then more data visualization. Opendatasoft, they have an API to access the data sets to make data visualization. So we want to implement in GeoNetwork a data API to be able to search in the API, to paginate, to filter, and to be able to get the data in tables, in charts, and so on.
20:01
One thing which is very important is that we covered the search usage for the open data world, but it's still hard to edit, okay, because we are just harvesting. And there is a new challenge for us: to be able to create open data. Okay, so this is something also, yeah,
20:22
that we wanted to mention here, because it's very important for us. So we thought that the approach that we chose for the DataHub was pretty successful, and we figured maybe it could be used to solve other issues that people have with inputting metadata in their catalogs, and then we could create a new metadata editor
20:43
for GeoNetwork using this same kind of approach. So we want to do that following three points, which are: make the metadata accessible for everyone, so not just experts, not just knowledgeable people; hide the metadata schema complexity;
21:00
and then make your own editor, make it completely customizable. So there was a proposal on the GeoNetwork wiki, and then, so we want to, you know, everyone who is listening to us, if you're interested and you want to help with this,
21:20
there's a very simple solution, and it's participating in the crowdfunding campaign that we are doing. You can see here, there's a blog post on our website about this crowdfunding. It's called Help Fund the New GeoNetwork Editor. So it's very important for us, because of course, if we want to work on this,
21:40
it's not a small project, so we'll need some funding, but it's also a way for us to show that users are actually interested, and this is an approach that users are interested in. And in a way, it's kind of a vote also, and you're also saying, I think this is a good approach, and I want to push in this direction.
22:00
And the more people do that, the more we will be able to go further, and yeah, so if you agree with us and if you feel like that's the right way, please take a look and please let us know. Yeah, okay, it's finished. Finished, okay.