Transition from one INSPIRE metadata standard to another and move from Geonetwork 3.x to 4.x – lessons learnt
Formal Metadata
Number of Parts | 156
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/68558 (DOI)
Transcript: English (auto-generated)
00:00
Well, I'm a geographer who has worked on a lot of different tasks, for example INSPIRE implementation and data modelling, and as a data project manager and so on. For the last three or four years I've been working at the National Land Survey of Finland,
00:22
helping data providers and users get smooth access to the spatial data sets and services that have been published in Finland. My talk today focuses on metadata and the ecosystem around it,
00:44
how we work to improve the metadata, and the version update that is taking place. So, first a few words about our metadata catalogue.
01:02
It's a service we provide for all data providers in Finland to describe their spatial data sets and services, and it's the official INSPIRE discovery service in Finland. We have several metadata templates for our users, in three different languages,
01:26
and also multilingual ones. The catalogue has been based on GeoNetwork for over 10 years. Here you can see that we have over 150 organizations using it,
01:46
and the size of each square shows how many metadata records they have published with the service. Of these, around 650 are INSPIRE metadata, and a slightly larger number of records describe open data.
02:10
In some cases they are the same records, in some cases different ones. So, it's widely used. The national system works like this.
02:21
We at the National Land Survey harvest from the existing metadata catalogues that the Finnish Environment Institute, the Finnish Meteorological Institute and HELCOM have. They have some metadata that they want us to have.
02:41
For example, all the INSPIRE metadata, and then they have their own systems. From us, the INSPIRE part of the metadata goes further on to the INSPIRE Geoportal, and we use the whole metadata asset in our Oskari-based portals, such as the national geoportal.
03:08
So, there you can search across everything we have in these 1800 metadata records. We also use it in-house, for example.
03:21
So, there is a link: we use the CSW to embed the product descriptions on our website, so that is also based on the CSW. And anyone is free to use the open CSW; one big user is the open data portal.
03:53
Let's take a closer look at our cooperation with the open data portal. We pre-filter our CSW with a keyword, so that we have virtual services providing only the metadata with that keyword.
04:15
Then they filter further based on the license, whether it is CC BY 4.0 or public domain.
04:23
Only those with the correct license go further. This is the work we have done in previous years, focused on improving this process and its quality. From there the metadata goes further on to the European version of the open data portal.
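The two-stage filtering described here can be sketched roughly as follows. This is a minimal illustration, not the portals' actual code: the `Record` class, the function names, the default keyword and the exact license labels are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed license labels; the talk only names "CC BY 4.0" and "public domain".
ACCEPTED_LICENSES = {"cc by 4.0", "public domain"}


@dataclass
class Record:
    """Hypothetical, simplified stand-in for one metadata record."""
    identifier: str
    keywords: tuple = ()
    license: str = ""


def prefilter_by_keyword(records, keyword):
    """Stage 1 (catalogue side): keep only records tagged with the keyword."""
    wanted = keyword.casefold()
    return [r for r in records if wanted in {k.casefold() for k in r.keywords}]


def filter_by_license(records, accepted=ACCEPTED_LICENSES):
    """Stage 2 (portal side): keep only records with an accepted license."""
    return [r for r in records if r.license.strip().casefold() in accepted]


def records_for_open_data_portal(records, keyword="open data"):
    """Apply both stages in the order described in the talk."""
    return filter_by_license(prefilter_by_keyword(records, keyword))
```

Keeping the two stages as separate functions mirrors the split of responsibility in the talk: the catalogue pre-filters by keyword, and the open data portal then decides on the license.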
04:44
So, on to my topic of today. I am going to talk about two goals that we have set for our work in recent years. The first goal was the transition to the newer version of the INSPIRE metadata standard.
05:06
We wanted to reach technical compliance. It was also required by INSPIRE monitoring, which has been based on metadata since 2019.
05:22
But we also really wanted to improve the quality of the metadata. We didn't want to focus only on technical compliance, but also on quality and reusability. For this, as you see there, we used XSLT scripts in GeoNetwork
05:44
for everything that we at the National Land Survey could do in advance for our data providers. Then, to reach technical compliance, we also needed to do some manual editing, add some new required elements and create keywords.
06:01
Some elements had to be expressed in a new way, for example the license information, the use and access constraints. The new standard expresses these differently, so they had to be redone. The big data providers could do this themselves: we sent them the validation reports and asked them to do the checks.
06:28
With a bit of iteration, it worked very well. In addition, we really wanted them to check the links, so that the links wouldn't go to the website of the
06:41
organization but to a possible WMS or WFS, to the services where you can either download or use the data. And we put a lot of focus on getting the license information correct, as we knew that the Open Data Directive was coming up.
07:01
We also asked them to look at the keywords and add the open data keywords when needed, and to mark metadata as non-INSPIRE if, under the new interpretation, they did not fall within the INSPIRE Directive.
07:22
So, these kinds of quality things. I could talk about quality for 10 minutes, but I don't really have time for that now. But then, to be honest, even if it is easier for me, it is quite complicated to move from one standard to another.
07:43
Especially for the small data providers. So we thought, how are we really going to meet this goal? We are not going to meet it if we just publish a news item on our website saying that you should go and update your metadata. So we thought, how could we help them?
08:02
Then, in late 2021, we developed a process for that. We also used the CSW for it.
08:21
Using GetRecords, we made a process where, for the small municipalities and regions, we took one region or municipality at a time, took a backup copy and did all the semi-automatic processing in GeoNetwork with the XSLT scripts.
08:41
We did that for them. Then we compiled an Excel file; it was produced with XSLT, from XML to a CSV file. We included only the elements that we really wanted them to focus on, so there was a maximum of seven columns in the Excel file.
09:05
Check these, or add these missing elements. We tried to make it as simple as possible for them. Then they returned the Excel file to us, and we did the editing and the validation for them.
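The step of turning a provider's metadata into a small table can be sketched as below. Note the swap: the talk describes doing this with XSLT, while this sketch uses Python's standard library for the same idea. The column selection and the function name are assumptions, and the XPath expressions assume plain ISO 19139 records with `gco:CharacterString` values, which multilingual or anchor-encoded records would not match.

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical column selection; the talk only says "a maximum of seven columns".
COLUMNS = {
    "identifier": "gmd:fileIdentifier/gco:CharacterString",
    "title": ".//gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString",
    "use_limitation": ".//gmd:useLimitation/gco:CharacterString",
}


def records_to_csv(xml_text):
    """Return CSV text with one row per gmd:MD_Metadata element found."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([*COLUMNS, "keywords"])
    for md in root.iter("{%s}MD_Metadata" % NS["gmd"]):
        # Missing elements become empty cells for the provider to fill in.
        row = [
            (md.findtext(path, default="", namespaces=NS) or "").strip()
            for path in COLUMNS.values()
        ]
        keywords = [
            k.text.strip()
            for k in md.findall(".//gmd:keyword/gco:CharacterString", NS)
            if k.text
        ]
        writer.writerow(row + ["; ".join(keywords)])
    return out.getvalue()
```

Empty cells are the point of the exercise: they show the data provider exactly which of the few selected elements still need to be checked or added.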
09:21
We were in contact with them if there were problems with anything other than these five to seven elements. But this worked really nicely. As for the team we had here, I more or less developed the process. And then I had a coder, a good colleague of mine, Jari Reini,
09:41
whom many of you probably know. Once I had made the XSLT scripts and the GetRecords queries, he made a small program for us, because we thought that with a team like this, I can't just ask them to work in XMLSpy.
10:02
So, he made a small program for us, so that the first and the third box could be done just by entering the name of the data provider, the name of the municipality. And then we had three others.
10:20
We had a trainee with us for four months, and two of my colleagues were involved. We didn't really have a way to check how we were doing, but we then got the monitoring results from December 2022.
10:42
So, you see a good improvement, but of course it's not 100%, so we are not happy. The trend is there. For a long time we had been aware of a bulk validation tool that we really wanted to use, but it was impossible to use it on our National Land Survey laptops.
11:05
But then somehow we got our managers convinced that we could get a more powerful, separate computer for installing the bulk validation tool.
11:23
And yes, my colleagues in NOSIN, and Jari again, did some fine-tuning of that, and we set up another process that we used last year. We used the same kind of elements as in the previous process.
11:41
Here we offered the data providers, including those in the central government, the possibility of having us check all their metadata. The lower box shows that before the INSPIRE monitoring date, we also checked all INSPIRE metadata ourselves
12:03
and could then find out which INSPIRE metadata still had problems and were not according to the standard. We gave the data provider the opportunity to correct them, or corrected them ourselves
12:21
if the corrections were purely technical. For this it was myself, and Jari set up the computer for me. We were then able to identify the errors, which you see there in column G, and in the first column, A, you have the data provider name,
12:46
so we could identify where the problems are. And now, thanks to this tool, we are up to these good figures for last year. So we are happy with this.
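Reading such a validation report per data provider could look roughly like the sketch below. It assumes the report has been exported as CSV with the provider name in column A and the error text in column G, as shown on the slide; the function name and the handling of header and short rows are assumptions, not a description of the bulk validation tool itself.

```python
import csv
import io
from collections import defaultdict

PROVIDER_COL = 0  # column A in the spreadsheet shown in the talk
ERROR_COL = 6     # column G in the spreadsheet shown in the talk


def errors_by_provider(csv_text, has_header=True):
    """Map each data provider to the list of its non-empty error messages."""
    rows = csv.reader(io.StringIO(csv_text))
    if has_header:
        next(rows, None)  # skip the header row if there is one
    grouped = defaultdict(list)
    for row in rows:
        if len(row) <= ERROR_COL:
            continue  # short row: there is no error column to read
        provider = row[PROVIDER_COL].strip()
        error = row[ERROR_COL].strip()
        if provider and error:
            grouped[provider].append(error)
    return dict(grouped)
```

Grouping by provider makes it easy to decide, per organization, whether to send the list back to them or to fix purely technical errors centrally.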
13:02
So let's go to the second goal. I'll sum up all the lessons learnt at the end. The second goal was to make the transition from GeoNetwork 3 to 4 by June this year. We chose to go with version 4.2, and we started last year.
13:24
We had a lot in place already in May, but then things came up for the IT team, so they had to focus on other things, and we restarted last December with a new version of the Postgres database and a new installation.
13:44
So we lost all the configurations, and then we continued to work on the issues that we had not yet investigated last year. Such as: how do we really replace the virtual services that we have in GeoNetwork 3,
14:02
because it is no longer possible to build virtual services, and our cooperation with both the open data portal and our website was based on them? Can we get all the 50-plus harvesters that we have to work when we harvest from other portals? And do we face issues with the new CSW,
14:25
or does it work as we hope? So, first a few words about replacing virtual services with sub-portals.
14:42
That was really nice, no problems whatsoever. A sub-portal is meant to pre-filter the metadata, for example by keyword, and you can decide to publish a subset of your metadata as its own portal,
15:02
and here in the picture you have only the INSPIRE metadata. So it could also be a useful tool for our users to check whether they really have all their INSPIRE metadata there. Of course we tested it with our key partners,
15:25
and it was easy to implement. It took me a while to figure out the Lucene expression, as there are not so many examples out there. The idea was to set up one sub-portal that showed only
15:46
the dataset metadata of the National Land Survey of Finland, filtering out all the service metadata. We succeeded in that, and we succeeded in getting it onto our web page.
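A filter of this kind can be sketched as a Lucene query string in which `+` marks a clause that must match and `-` one that must not. The helper below only assembles such a string. The field names `orgName` and `resourceType` are placeholders chosen for the example; the actual index field names depend on the catalogue's configuration and would need to be checked there.

```python
def lucene_phrase(value):
    """Quote a value as a Lucene phrase, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def portal_filter(required, excluded=None):
    """Build '+field:"value"' and '-field:"value"' clauses from two dicts."""
    clauses = [f"+{name}:{lucene_phrase(value)}" for name, value in required.items()]
    clauses += [
        f"-{name}:{lucene_phrase(value)}" for name, value in (excluded or {}).items()
    ]
    return " ".join(clauses)


# Hypothetical field names; only datasets of one organization, no services.
NLS_DATASETS = portal_filter(
    {"orgName": "National Land Survey of Finland", "resourceType": "dataset"},
    {"resourceType": "service"},
)
```

With these placeholder names, `NLS_DATASETS` is the string `+orgName:"National Land Survey of Finland" +resourceType:"dataset" -resourceType:"service"`.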
16:04
What we then faced, which was not so nice and which we were not aware of in advance, was that the CSW GetDomain request is not supported in GeoNetwork version 4 with Elasticsearch,
16:20
and we have really made use of that in the metadata search in our Oskari product, for example in the national geoportal. The Oskari code is in our own hands, so we will deal with it somehow,
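For context, a CSW 2.0.2 GetDomain request asks the catalogue for the range of values of one property, which a search form can use to fill its drop-down lists. The sketch below shows the key-value form of such a request, and one conceivable client-side stand-in that collects distinct values from records that have already been fetched. The stand-in is purely illustrative and is not the solution chosen by the team, which the talk leaves open; the endpoint URL and the dictionary shape of the records are assumptions.

```python
from urllib.parse import urlencode


def get_domain_url(endpoint, property_name):
    """Key-value form of a CSW 2.0.2 GetDomain request for one property."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetDomain",
        "PropertyName": property_name,
    }
    return f"{endpoint}?{urlencode(params)}"


def distinct_values(records, property_name):
    """Client-side stand-in: sorted distinct values of one property.

    `records` is assumed to be a list of dicts whose values are either a
    string or a list of strings.
    """
    values = set()
    for record in records:
        value = record.get(property_name)
        if isinstance(value, str):
            value = [value]
        values.update(v for v in (value or []) if v)
    return sorted(values)
```

The trade-off of a client-side approach is that it needs the records themselves, so it is only practical for a catalogue small enough to page through.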
16:41
and the question mark there is how to work around the lack of support. I have colleagues in the IT team thinking about it at the moment. So here we are: we didn't reach the goal, but we have come a long way. We have the GetDomain issue to solve,
17:01
so as not to lose the metadata search in our geoportal, and then we have some small National Land Survey specific configurations to put in place. So, what's the next step towards the update?
17:23
Of course we hope to do it after the summer break, but we need a plan, or a replacement, for the missing GetDomain request. And of course the grass is always greener on the other side,
17:40
but we realised on the way that there were things we thought would be in version 4.2 that are actually only in version 4.4. For example, OGC API Records as a microservice is, to my understanding, only in version 4.4,
18:02
and there is also work on GeoDCAT-AP, and the security has been improved. So, quickly about the lessons learnt.
18:22
I think this is not the latest version of my presentation; I had collected the lessons learnt from both goals. But I guess you have to have patience: if you have a lot of data providers and a lot of dependencies in your systems, changes take time,
18:44
but it is possible to make the changes. Our team working with metadata and validation all now know a lot, so we don't have the dependencies on individual people that we had three years ago.
19:01
So we are in a very good position, and we also have better templates and user guidelines, so we can support users in a much better way. That relates to goal 1. When it comes to goal 2, well, it is slow to take a new version into use,
19:25
mainly because of the dependencies, but we have also made a lot of configurations to our templates and to the user interface. It was also challenging that the same IT team
19:40
is working with a lot of services, so even though we had a really good plan for the autumn, there were always surprises and priorities about which service they were focusing on. The help of the community was very useful, especially the NOSIN group
20:00
that we met with once a month this spring. And of course, being wise after the event: could we have used the network more? Would we have reached our goal already? Of course the GetDomain issue would still be there,
20:21
but I don't think we have been so active with the Transifex project, for example, or in using the mailing lists that are available; we have relied more or less on our own documentation
20:40
and on this NOSIN work. Thanks, that was my presentation. Thank you, thank you so much. I don't know if I can say this, but on behalf of all Estonians,
21:01
the Finnish open data activities have always been an inspiration, in how this has been managed and how much data was already available. So thank you. Questions?
21:29
Thanks for the presentation, really interesting. I was wondering: when you apply this XSLT to take a metadata record from 1.3 to 2.0,
21:42
don't you then step into the role of the data provider, who should be doing that? I mean, how is this governance managed? We asked them whether they wanted to do this job themselves
22:00
or wanted us to do it. For the small municipalities, and there were around 100 of them, we got a kind of mandate from them. And the benefit for us was that we got a lot of users, we got awareness of the portal, and we also had meetings
22:23
for those that really wanted to learn it themselves. So it was not always us doing it; sometimes they took the ball and did it. So we did think about it, but we had our goal, and we improved our documentation and so on.
22:44
So in a way we did step in, but it was a helping hand, and many took it. Anybody else?
23:01
Okay, thank you very much.