A fun way to do spatial cataloguing and publishing using pygeometa and mdme
Formal Metadata

Title of Series: FOSS4G Europe 2024 Tartu (part 7 of 156)
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifier: 10.5446/68424 (DOI)
Transcript: English (auto-generated)
00:00
So welcome, good afternoon everybody. So a bit about us: I'm Paul van Genuchten, I work at ISRIC - World Soil Information in Wageningen in the Netherlands. And over there we maintain some ten catalogues, mostly project-oriented. I'm Tom Kralidis, a senior geospatial architect with the Meteorological Service of Canada,
00:24
a long-time contributor to open source and open standards, and currently serving on the board of directors. So that's us. What if? So this is the typical record you find in a lot of catalogues. And then you're reading it and you say: hey, wow, that name, that's not correct.
00:43
That guy went out of service a while ago. How can I notify anybody of this change in the metadata? That's a thing I notice a lot: for example, a service not being available anymore, and I want to notify somebody. It would be really nice if there were this "edit me on Git" link,
01:05
which you typically find on these MkDocs-type documentation websites. And you would click that and you would go into Git, and you would be able to make a pull request or create an issue about the thing.
01:20
And somebody would go in and change it and review it and it would be published to the catalogue again. It would also give you this nice overview, like who changed what in my catalogue over the last two years, over the last 20 years, because a lot of these datasets stay in that catalogue for a long time.
01:43
So that was kind of the original goal of this exercise, to have a workflow like this. So let's start with the tools that we're using here. So, pygeometa. Tom, step in.
02:01
pygeometa is a Python package to generate and manage metadata for geospatial datasets. We all know geospatial metadata is hard and complex, and pygeometa is "metadata for the rest of us", as we call it, which allows you to do metadata in sort of a configuration-based way, and you can generate whatever formats you wish from that.
02:23
And it's driven by this metadata control file, or MCF, which is basically a YAML grammar to be able to document your dataset. So I asked Tom to step in here because he started an initiative in 2008,
02:44
when was that, 2009, was that 15, 17 years ago? And that lived silently somewhere on the internet and then recently got a lot of attention. Because this YAML format, we all know it,
03:00
it versions really well in Git, as opposed to the JSONs and the XMLs. So if we store a metadata file in Git, YAML is a perfect format. So how do we get from that MCF to the catalog? So we use this pygeometa library to convert that MCF file to ISO
03:21
or whatever other format, because pygeometa has a number of output schemas, such as DCAT and some American ones, quite a range of output schemas. And then we load it into the catalog. It can be GeoNetwork like we just saw. It can also be CKAN or any of the other ones.
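As a rough sketch of that conversion step, assuming a local MCF file, the pygeometa Python API can be used like this:

    from pygeometa.core import read_mcf
    from pygeometa.schemas.iso19139 import ISO19139OutputSchema

    # parse the YAML metadata control file into a Python dict
    mcf_dict = read_mcf('metadata/my-dataset.yml')

    # render the MCF as an ISO 19139 XML record, ready to load into the catalog
    iso_xml = ISO19139OutputSchema().write(mcf_dict)

    with open('my-dataset.xml', 'w') as fh:
        fh.write(iso_xml)

The same conversion is also available on the command line, for example: pygeometa metadata generate metadata/my-dataset.yml --schema=iso19139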
03:42
We chose pycsw because for us it's really easy to maintain and very flexible, and it has a strong focus on OGC standards, which is really important for us. And through its OGC API - Records API,
04:01
it also supports an HTML front end because you can request every record in an HTML format. And having this identifier for every record is also very useful for your search engines because the search engines require one web page per record.
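To give an idea of that behaviour, here is a sketch of fetching a single record from a pycsw OGC API - Records endpoint; the URL and identifier are hypothetical, and metadata:main is pycsw's default collection:

    import requests

    # one stable URL per record
    url = 'https://demo.example.org/pycsw/collections/metadata:main/items/my-dataset-id'

    # machine-readable GeoJSON for clients...
    record = requests.get(url, headers={'Accept': 'application/geo+json'}).json()
    print(record['properties']['title'])

    # ...and the same record as HTML for browsers and search engines
    html = requests.get(url, headers={'Accept': 'text/html'}).text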
04:24
So do my colleagues like this MCF? YAML is quite a challenge for some, especially if they're used to fancy web forms and LinkedIn and Facebook. So at first the answer is no. They're a bit hesitant to create these YAML files in Git.
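For reference, the kind of file they are asked to write looks roughly like this (an abridged, single-language sketch; the MCF reference in pygeometa documents the full grammar):

    mcf:
        version: 1.0

    metadata:
        identifier: my-soil-dataset
        language: en

    identification:
        title: Soil organic carbon map
        abstract: Predicted soil organic carbon content, 0-20 cm depth.
        keywords:
            default:
                keywords:
                    - soil
                    - carbon

    contact:
        pointOfContact:
            organization: Example Institute
            email: metadata@example.org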
04:46
And that's where we came up with this model-driven metadata editor, mdme, which is a standalone metadata editor that lives out there on GitHub. It's a fully client-side thing, so you have a web form based on the JSON schema of the MCF,
05:05
and you populate the fields, and at some point you say save, save as MCF file, and you then upload that to Git or send by email to the administrator. Do my tech colleagues like the MCF?
05:23
Yes, they do, very much. It's optimal for Git version control and offers a fully traceable catalog: what changed and what didn't. And file-based is actually quite easy to modify: you just do a search and replace, and you have the whole catalog updated.
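That kind of bulk change can also be a few lines of Python; a minimal sketch, assuming the MCFs live in a metadata/ folder and an outdated contact email needs fixing:

    from pathlib import Path

    import yaml

    # walk all MCFs in the repository and fix an outdated contact email
    for path in Path('metadata').glob('*.yml'):
        mcf = yaml.safe_load(path.read_text())
        contact = mcf.get('contact', {}).get('pointOfContact', {})
        if contact.get('email') == 'old@example.org':
            contact['email'] = 'new@example.org'
            path.write_text(yaml.safe_dump(mcf, sort_keys=False))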
05:46
And then you can also do a lot of things in memory using Python scripts, so you don't always need to go changing things file by file. So now I want to show the benefits of this workflow in two use cases.
06:06
And let me first go to this one. This Land Soil Crop Hubs is a European-funded research project that we run in some countries in East Africa around land, soil and crop data. And we try to spark that community with this GitHub repository
06:26
where they can contribute their datasets. Initially they provided metadata in Excel, where each column is a property of a dataset and each row is a dataset. So we have an import function that can import this Excel into the MCF format,
06:44
and we share that then on GitHub, and then they can start contributing. So you have an initial population of the catalog. But they also learn to register issues: hey, record number five is not correct because it changed.
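The Excel-to-MCF import boils down to mapping spreadsheet columns onto MCF fields. A minimal sketch with openpyxl, assuming identifier, title and abstract in the first three columns (the project's actual importer handles many more properties):

    import yaml
    from openpyxl import load_workbook

    # one dataset per row: identifier, title, abstract in columns A-C
    sheet = load_workbook('datasets.xlsx').active
    for identifier, title, abstract in sheet.iter_rows(min_row=2, max_col=3, values_only=True):
        mcf = {
            'mcf': {'version': 1.0},
            'metadata': {'identifier': identifier},
            'identification': {'title': title, 'abstract': abstract},
        }
        with open(f'metadata/{identifier}.yml', 'w') as fh:
            yaml.safe_dump(mcf, fh, sort_keys=False)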
07:02
And then we use this metadata to create a MapServer configuration. So we also have OGC services on top of the TIFFs and shapefiles that were described. So we generate a MapServer mapfile based on the metadata, using the title and the abstract elements from the metadata in the MapServer configuration.
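That generation step is essentially template rendering. A sketch with Jinja2, heavily abridged (real mapfiles also carry projection, data paths and layer definitions), assuming a single-language MCF like the one sketched earlier:

    import yaml
    from jinja2 import Template

    MAPFILE = Template('''MAP
      NAME "{{ id }}"
      WEB
        METADATA
          "wms_title" "{{ title }}"
          "wms_abstract" "{{ abstract }}"
          "wms_enable_request" "*"
        END
      END
    END
    ''')

    with open('metadata/my-dataset.yml') as fh:
        mcf = yaml.safe_load(fh)

    print(MAPFILE.render(id=mcf['metadata']['identifier'],
                         title=mcf['identification']['title'],
                         abstract=mcf['identification']['abstract']))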
07:21
That WMS endpoint, which is then generated, is pushed back into the metadata in Git so users can also find the service from the metadata. And so this keeps the metadata in the WMS capabilities
07:43
and always aligned with the metadata in the catalog. And then on top of that we have the TerriaJS viewer framework, which is a client-side open source project from Australia. It's a really nice, full-featured web-based GIS system,
08:03
which has CSW search embedded. So you can do a CSW search from TerriaJS to find those records, and vice versa, you can go back to the catalog via the link which exists in the WMS capabilities.
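The search TerriaJS performs can be reproduced from Python with OWSLib (the endpoint is hypothetical):

    from owslib.csw import CatalogueServiceWeb
    from owslib.fes import PropertyIsLike

    # query the catalog the way a CSW client such as TerriaJS would
    csw = CatalogueServiceWeb('https://demo.example.org/pycsw/csw')
    query = PropertyIsLike('csw:AnyText', '%soil%')
    csw.getrecords2(constraints=[query], maxrecords=10)

    for record in csw.records.values():
        print(record.title)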
08:21
And then we added an extra integration in pycsw to say: okay, load this layer that you find in the catalog in TerriaJS. So that's the linkage from Terria to the catalog. So how does that look? For example, this is TerriaJS. So that's the dataset that was found in the catalog
08:43
and then opened in TerriaJS. And you have this whole range of options and a lot of the tools that you find in typical web GIS viewers. So let me now hand over to Tom for the other use case. You want this one? Great, thanks. Thanks, Paul. So up until now we've gone over basically using pygeometa
09:08
in support of a simple configuration for geospatial metadata. We've talked about pycsw as a cataloguing capability to publish these metadata records to, although it could be any standards-based catalog. We've also talked about this mdme user interface web application,
09:24
which is basically driven by a schema. So the technology underneath is that you can configure it with a JSON schema from a standard, and it will automatically populate a web form based on that schema. And Paul gave his use case using all those three tools
09:41
and those interactions and workflows. I'm going to do the same thing on my side with regard to the WMO Information System version 2, WIS2. WIS2 is a next-generation data exchange platform from WMO. It's 194 countries wide, and it's based on open standards
10:01
and a lot of open source tooling there as well. But basically it's the exchange of earth system data: weather, climate, water data. It could be real-time data, it could be archive data, and anything in between. There's a big focus on event-driven workflows in WIS2, which means PubSub: publish and subscribe.
10:21
The MQTT specification is a requirement of WIS2, so basically all of the data publications and all of the metadata publications are done with PubSub. There is no harvesting, as we may have seen in previous and current attempts; there's no CSW harvesting or catalog harvesting.
10:43
Everything is pushed. So you don't have to poll a catalog server every time to look for new updates; you subscribe to that catalog server and it tells you when it pushes an update to you. So it's very different from the traditional polling case. So we apply that same principle that we do for data to metadata as well.
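A sketch of what subscribing to those metadata notifications looks like with paho-mqtt; the broker host is a placeholder and the exact topic is an assumption (WIS2 defines a standard topic hierarchy):

    import json

    import paho.mqtt.client as mqtt

    def on_message(client, userdata, msg):
        # each notification is a small JSON message announcing new data or metadata
        notification = json.loads(msg.payload)
        print(msg.topic, notification.get('id'))

    client = mqtt.Client()  # paho-mqtt 1.x style constructor
    client.on_message = on_message
    client.connect('globalbroker.example.org', 1883)

    # metadata notifications under the WIS2 topic hierarchy
    client.subscribe('origin/a/wis2/+/metadata/#')
    client.loop_forever()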
11:04
So in that spirit we've also developed a number of tools to help weather agencies think about their metadata. Because as much as we do data in weather, climate and water in WMO, all of that needs to be backed by dataset metadata.
11:22
So we need to define what a dataset is, provide a geospatial metadata record that goes into a catalog and that catalog allows somebody to discover that dataset, obviously, and find the appropriate linkages so that you can subscribe to that dataset. So now users need to be able to create and manage metadata
11:41
and be able to publish it. So one tool that we created, a prototype tool, is basically no-code. The way it works is that it helps you manage, verify and publish metadata, using GitHub as a content management platform.
12:02
So no catalog, no metadata editor, nothing like that. That doesn't mean those things are not useful, but for this case we use this approach. So basically we're using GitHub to manage the metadata.
12:22
We send out GitHub links to the metadata stewards or custodians and data providers, and they edit the actual YAML right on GitHub. The records are managed in this MCF, or metadata control file, format. What happens after that is that as soon as that metadata is saved on GitHub,
12:43
we also take advantage of GitHub Actions. So once we save some metadata, the GitHub Action triggers, and what it does is take that MCF format and convert it to the metadata format that we require in WMO, because we've defined something called the WMO Core Metadata Profile,
13:05
which is based on the OGC API - Records core record model, and it creates this WCMP2 metadata record on the fly. And then what it does is take that record, and the GitHub Action is actually connected to an MQTT broker.
13:21
So it sends that metadata record to a broker as a new message. And that's how everything propagates throughout the system. So there are no transactions, no pushing, pulling or harvesting; there's an MQTT event-driven publication. So as soon as it gets onto a broker,
13:41
this circulates throughout the WIS2 ecosystem, and there's a catalog somewhere in that pipeline which takes that MQTT message of the metadata which was pushed through a notification, validates it, does quality assessment and provides a validation report.
14:01
It does some KPI, key performance indicator, checks, all these kinds of things, and then it sends back a report through, you guessed it, PubSub, in an event-driven way. And then the WIS2 catalog itself, our Global Discovery Catalogue, is based on OGC API - Records, and you can use something like MetaSearch to connect to it and discover these metadata.
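Condensed, the conversion-and-publish step performed by the GitHub Action could look roughly like this. This is a sketch: the schema module name, topic and broker are assumptions, and in WIS2 the broker actually carries a notification message that references the record:

    import paho.mqtt.publish as publish
    from pygeometa.core import read_mcf
    # module/class name is an assumption; pygeometa's CLI exposes this schema as 'wmo-wcmp2'
    from pygeometa.schemas.wmo_wcmp2 import WMOWCMP2OutputSchema

    # MCF edited on GitHub -> WCMP2 record (a GeoJSON-based record model)
    record_json = WMOWCMP2OutputSchema().write(read_mcf('metadata/my-dataset.yml'))

    # announce it via PubSub rather than waiting to be harvested
    publish.single('origin/a/wis2/my-centre/metadata',
                   payload=record_json, hostname='broker.example.org')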
14:22
So the basic workflow... Oops, let me go back. I keep going forward. Okay, let's rewind. And we have a broken link.
14:41
That's weird. Does that work? Maybe not. No problem. So the basic idea here is that we have... That would have been a diagram of everything I just described
15:00
and if you go online, you'll find that diagram. But it's a bunch of boxes and arrows, okay? So having said that, what is the end game here? The end game is simplifying the management of the content, and we've stripped it down
15:21
right down to basically editing YAML files. And from there, we let all this machinery and all these GitHub Actions capabilities take over. And at no point are we actually writing a file anywhere. This is all in memory, and this is using all these pipelines
15:41
because all of these tools can be used standalone, on the command line, or as libraries for you to glue inside your application and your pipeline. So this is a super powerful and super flexible approach. Again, these are powerful approaches for these use cases.
16:06
We're not downplaying the need for full-blown metadata editors. They're obviously useful and needed. But in terms of having metadata as a composable and a reproducible pipeline, I think these are really strong examples of trying to move this stuff forward.
16:22
Paul, over to you. Thanks. You see two very motivated people here who work with metadata every day. It seems a boring job, but actually these tools make it fun. So, some takeaways. This MCF format is actually a really interesting format
16:45
for managing metadata in local repositories and then bringing it to the outside. Git storage and CI/CD workflows, the GitHub Actions, are a very traceable, reproducible and participatory approach to metadata management.
17:02
And OGC API - Records implementations, such as pycsw, offer a very clean, machine-readable and human-friendly interface to metadata. And then I have some references for if you're checking this online. Some of these libraries are to be released very soon.
17:21
So MapServer 8.2 has OGC API - Features, and SLD support is coming soon, but we already depend on it very heavily. For pycsw, we're waiting for OGC API - Records to be approved by the OGC, and then the 3.0 release will come.
17:41
But already there's large adoption of the 3.0 version. pyGeoDataCrawler is the Python tool that we have developed that stands on the shoulders of all these great libraries, but does the crawling of the drive. So it scans a drive of datasets for any metadata
18:00
to be extracted to go into the catalog. And yeah, there's a lot going on in this environment. So, any questions? I hope so, so we can continue the discussion. Thank you. Any questions?
18:27
Thanks. Very interesting. I saw in the readme that it also supports STAC as an export. Could you talk a bit about how STAC could be involved? Sure, it's basically another format output.
18:42
So pygeometa supports the STAC item output, the STAC item specification, and basically your configuration would end up going out as a STAC item. So that's how that would basically work. Yeah, and pycsw as a service has STAC API search capabilities, so you can use STAC Browser to browse the pycsw catalog.
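Generating that STAC flavour from the same MCF is, again, just another output schema; a sketch (the class name is an assumption; the CLI equivalent is: pygeometa metadata generate dataset.yml --schema=stac-item):

    from pygeometa.core import read_mcf
    from pygeometa.schemas.stac import STACItemOutputSchema  # class name is an assumption

    # same MCF, different output: a STAC item instead of ISO XML
    stac_item = STACItemOutputSchema().write(read_mcf('metadata/my-dataset.yml'))
    print(stac_item)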
19:08
Thank you. I have a question about this WMO standard. Is it related only to WMO data, or in general to meteorological and climate data
19:20
like ERA5, for example, or other data provided by ECMWF? I guess the answer is yes. So both from all the member states and from ECMWF and EUMETSAT and so on and so forth. There are also federated activities with the Ocean Information Hub and the Earth System Grid Federation and different partners,
19:45
but it's basically all weather, climate and water data, which includes all the countries and specialized centers such as ECMWF. So yeah. Okay, thank you. Thank you too. More?
20:05
Assuming you have a distributed setup, how fast would the updates to the metadata be, so that everyone would see the latest updates?
20:20
Is it the distributed setup with GitHub into the catalog, or the one with MQTT? The GitHub one. The GitHub one? Usually it takes tens of seconds, because the CI/CD sometimes takes some time to kick in, as it's queued, and then it more or less depends on the size of your catalog
20:42
because it goes through the records. But it's usually minutes, which sometimes is too long, because the people doing the actual edits want to see instant results, to see if they did a good job. Yes, looking at the data.
21:01
Thank you. Thank you too. I'm still in your room. Okay, so many thanks. One more time. Thanks.