
pycsw project status 2021


Formal Metadata

Title
pycsw project status 2021
Title of Series
Number of Parts
237
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
pycsw is an OGC CSW server implementation written in Python and is an official OSGeo Project. pycsw implements clause 10 HTTP protocol binding - Catalogue Services for the Web, CSW of the OpenGIS Catalogue Service Implementation Specification, version 3.0.0 and 2.0.2. pycsw allows for the publishing and discovery of geospatial metadata, providing a standards-based metadata and catalogue component of spatial data infrastructures. The project is certified OGC Compliant, and is an OGC Reference Implementation. The project currently powers numerous high profile catalogues such as US data.gov, geoplatform.gov, IOOS, NGDS, NOAA, US Department of State, US Department of Interior, geodata.gov.gr, Met Norway and WMO WOUDC. This session starts with a status report of the project, followed by an open question answer session to give a chance to users to interact with members of the pycsw project team. This session will cover how the project PSC operates, what is the current project roadmap, and recent enhancements focused on ESA's Earth Observation Exploitation Platform Common Architecture (EOEPCA) and OGC API - Records.
Transcript: English (auto-generated)
the first session. So, we have the next session, about the pycsw project status.
Both gentlemen are here, Angelos Tzotsos and Tom Kralidis. Yeah, I already introduced you in the first talk, but some people may have switched, so I'll go quickly. So, Angelos
Tzotsos is a remote sensing researcher and software developer at the National Technical University of Athens, and OSGeo president. So, ask him anything about OSGeo if you meet him. He's an OGC member, a contributor to various projects, and also the UbuntuGIS maintainer. And then we have Tom here
from Canada, and he's a senior systems scientist for the Meteorological Service of Canada. And Tom is very active in various organizations like the OGC and OSGeo, serving on the board as well.
And last but not least, he works on various open source geospatial projects implementing international standards, like pycsw, which you will hear about, and pygeoapi, MapServer, GeoNode, QGIS, PyWPS, OWSLib, too many to mention.
Well, I'll give the floor to you, and you even have hosting rights. So, I saw a screen popping up for sharing. Yes, but yeah, I'm going to share again. I'm not sure if you... Yeah, yeah. Oh, yeah, you share it yourself. So,
that's good. And I leave the floor to you to bring us up to date with the pycsw project status. Thank you. Thank you very much. So, hello, everyone. We are happy to be here,
and we are going to present the pycsw project status for this year. And we are going to start right away. The outline of the presentation: we are going to make an introduction, and then we're going to discuss the features of pycsw and what is new in the latest stable version. Then we're going to discuss architecture, installation,
downstream projects, and roadmap. So, let's get started with the introduction. So, it's an... Initially, it was an OGC CSW server implementation in Python, but recently,
we added that we implement OGC API - Records as well. It's an open source project released under the MIT license, and it runs on all major platforms. It's Python, and it works everywhere. It has been an OSGeo project since 2015. So, pycsw fully implements the OpenGIS
Catalogue Service Implementation Specification, also known as Catalogue Service for the Web. We are now also implementing OGC API - Records. We are implementing OpenSearch, as well as many other standards for catalogs out there. pycsw basically allows for the publishing
and discovery of geospatial metadata. So, if you have data and you want to publish it online, you create metadata, and then you give the metadata to pycsw to serve using all these standards and specifications. The project has been certified as OGC compliant. It has been an
OGC reference implementation for both CSW 2.0.2 and, recently, version 3. It's been an official OSGeo project for some years now, and we are constantly trying to implement what is new in OGC standards
regarding catalogs. So, a bit of history for the project. Tom started the project in 2010. He was coding alone for a year, and then he announced the project in 2011. It's been 10 years
now. Then I joined him, and we started working on the first official release, 0.1, passing all the CITE tests for OGC CSW. Then, shortly after we released version 1, we included the software in OSGeoLive, and then we moved on. We powered data.gov at some point. We graduated from incubation
in 2015. Then, we did a reference implementation of OGC CSW 3. Last December, we had the 10th birthday, and shortly after, we released the latest stable version, 2.6. Tom is going to talk about what is new in there in a
while. The recent development is that we landed the support for OGC API - Records and STAC this July. So, we have new things in master that are not yet released in a stable version. The goals of the project: we want to have a lightweight and easy-to-use setup.
It's a standalone catalog, but it doesn't have a UI. Well, it didn't have a UI until now that we have OGC API - Records. There's no metadata editing front end; it is designed as a microservice. It serves the use case of exposing ready-to-go metadata. If you have files or an existing
database, you can serve this through a CSW interface. It's very extensible. It's easy to add metadata formats and mappings, and it has a very easy-to-extend architecture. It is always
OGC compliant. We make sure that we always pass the CITE tests, and they are run on a daily basis. So, a bit of discussion about the features. As I already said, we fully implement, and we are the reference implementation for, all recent versions of CSW. We support harvesting
of WMS, WFS, WCS, and many other OGC standards. We implement the ISO application profile and the FGDC CSDGM application profile, as well as the OpenSearch Geo and Time extensions and, recently, the EO extension, the Earth Observation extension for OGC OpenSearch. OGC API - Records
core: we implement that, but it's still in development. The standard is not officially out yet. We are implementing the current STAC API, which is version 1 beta 2,
and various other standards like Dublin Core, DIF, Atom, and many more. We also support transactional capability, so you can do transactions with the catalog, and we have a flexible repository configuration. It works with SQLite, Postgres, PostGIS, MySQL, anything that can plug into
SQLAlchemy, or even into Django in some cases, like GeoNode. It supports federated catalog distributed searching, so you can make a network of pycsw instances, or you can plug it in and harvest
from other catalogs and it will work. It's very simple to configure. It has an extensible plugin architecture. You can create your own plugins or add new backends to it. Because it's basically a library, it can integrate with Python environments. For example, GeoNode uses pycsw as the catalog component, along with some other projects.
It has been integrated with CKAN in the past, and we have many more features: we implement full-text search, and we have real-time XML validation. This is a list of standards that
we currently support. I won't go through all of them, but it's all about metadata, and metadata can have too many standards, but yeah, here we are. So I'm going to hand it over to Tom to talk about the latest pycsw. Tom.
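Illustration (not part of the talk): to make the CSW interface described above a little more concrete, here is a minimal sketch of how a client might assemble a CSW 2.0.2 GetRecords request as key-value pairs. The endpoint URL and the search term are assumptions for the example, and a real deployment may need different parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; replace with the URL of an actual catalogue.
CSW_ENDPOINT = "http://localhost:8000/csw"


def build_getrecords_url(endpoint, cql_text, max_records=10):
    """Build a CSW 2.0.2 GetRecords KVP request filtered with CQL text."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "brief",
        "resultType": "results",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": cql_text,
        "maxRecords": str(max_records),
    }
    # urlencode percent-encodes the CQL expression so it survives in a URL.
    return f"{endpoint}?{urlencode(params)}"


url = build_getrecords_url(CSW_ENDPOINT, "csw:AnyText like '%temperature%'")
print(url)
# Decoding the query string gives back the original CQL expression.
print(parse_qs(urlsplit(url).query)["constraint"][0])
```

The resulting URL can be fetched with any HTTP client; client libraries such as OWSLib wrap this kind of request so that it does not have to be built by hand.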
Great. Thanks, Angelos. Maybe can we just arrow over horizontally to the next slide? Excellent. Next slide. Down. Oh, sorry. Yeah, one more down. Okay, so what's new in version 2.6?
Sorry, I had some network connection issue, but version 2.6, what have we done?
We've added support for the OpenSearch Geo and Time enhancements. The way we've done temporal support in pycsw in the past has always been on a single datetime, but now we're supporting a temporal envelope. So if you're looking for a metadata record with a time extent, the temporal boundary is now supported through the OpenSearch Geo and Time extensions.
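Illustration (not part of the talk): the difference between matching a single datetime and matching a temporal envelope can be sketched as an interval-overlap test. This is not pycsw's actual implementation; the function names and the "start/end" value format with an optional open side are assumptions made for the example.

```python
from datetime import datetime


def parse_time_param(value):
    """Split a 'start/end' time value into two datetimes.

    A single instant is treated as start == end; an empty side is unbounded.
    """
    if "/" in value:
        start_text, end_text = value.split("/", 1)
    else:
        start_text = end_text = value
    start = datetime.fromisoformat(start_text) if start_text else None
    end = datetime.fromisoformat(end_text) if end_text else None
    return start, end


def envelopes_intersect(record_begin, record_end, query_start, query_end):
    """Return True when the record's time extent overlaps the query window.

    None on any side means that side is unbounded.
    """
    begins_in_time = (
        query_end is None or record_begin is None or record_begin <= query_end
    )
    ends_in_time = (
        query_start is None or record_end is None or record_end >= query_start
    )
    return begins_in_time and ends_in_time


query = parse_time_param("2020-01-01/2020-12-31")
record_extent = (datetime(2019, 6, 1), datetime(2020, 3, 1))
overlaps = envelopes_intersect(*record_extent, *query)
print(overlaps)
```

A record whose extent starts before the query window but ends inside it still matches, which a comparison against a single datetime field could miss.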
Next slide. We also support twelve-factor practices, so things like environment variables in the configuration. So if you have, let's say, a Docker setup or a Docker Compose setup or some cloud capability,
you're able to set any pycsw configuration value as an environment variable, and that will make its way through. And that's really great when you're working in environments where there are a lot of different servers and nodes and so on and so forth. So it's flexible that way. We've also added Kubernetes support,
as well as Helm charts, and we finally removed support for Python 2. So if you are using pycsw 2.6, Python 2 support is officially removed. It was deprecated for a while and we turned off all of our CI testing for Python 2, but
we finally removed it in 2.6. Next slide. That's what's happening in 2.6. What's going on in our development branch? So we're working on supporting the OGC OpenSearch Earth Observation extension. So those are specific OpenSearch parameters, such as cloud cover,
and so on, platforms and instruments, to support the EO profile. We've done an early implementation of OGC API - Records Part 1, which is the core. And as Angelos may have mentioned, both Angelos and I are on the OGC API - Records
standards working group. So with that, we've done the implementation in pycsw. We've also done an early implementation of STAC API for discovering items. And with all that, we added an additional WSGI endpoint using Flask. The way
CSW was working before these improvements was through a single WSGI application with a single endpoint, which was sort of the CSW way, where you have a query string taking on most of the work. But with the new approaches, we implemented the endpoint using Flask, with Flask routes
which map back to and work with the existing functionality. So that was a nice enhancement to put on top of our existing functionality to make it work with the new generation of standards. Next slide. In terms of architecture, as Angelos mentioned, it's a microservice architecture,
and it's quite modular. So the sole goal of pycsw is metadata management, whether you want to harvest metadata into pycsw or push metadata into pycsw, and to make that
same metadata available to downstream clients or applications. So downstream clients and applications include desktop GISs like QGIS, client libraries like OWSLib, or web applications like GeoNode. And on the other side of that, we have a plugin mechanism that allows you to
support one to many different metadata formats. The ones that you see there are specific to what we have on board in pycsw, but you can make your own external plugins for, you know, metadata that you may not see there, which is important to your organization or your project. Next slide. And there's another example. We also support a number of different backends.
So we use SQLAlchemy for our database abstraction. So you can abstract databases, and we're also working on support for abstracting backends so we can start supporting NoSQL backends. There is some loose support for Elasticsearch, and I've seen other
projects implement that. We're going to put that into pycsw core, but the capability does exist. We've abstracted things enough so that we can support both NoSQL and a wide range of RDBMSs through SQLAlchemy. Next slide.
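Illustration (not part of the talk): the idea of abstracting the backend so that relational and non-relational stores can sit behind the same catalogue code can be sketched with a small interface and two toy implementations. The class and method names are hypothetical and do not mirror pycsw's repository code, and the standard-library sqlite3 module stands in for SQLAlchemy here.

```python
import sqlite3
from typing import Dict, List, Protocol


class Repository(Protocol):
    """Hypothetical minimal contract a catalogue backend has to satisfy."""

    def insert(self, record: Dict[str, str]) -> None: ...

    def query_title(self, term: str) -> List[Dict[str, str]]: ...


class InMemoryRepository:
    """Toy document-style backend: records are kept as plain dictionaries."""

    def __init__(self) -> None:
        self._records: List[Dict[str, str]] = []

    def insert(self, record: Dict[str, str]) -> None:
        self._records.append(dict(record))

    def query_title(self, term: str) -> List[Dict[str, str]]:
        needle = term.lower()
        return [r for r in self._records if needle in r["title"].lower()]


class SqliteRepository:
    """Toy relational backend using an in-memory SQLite database."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS records "
            "(identifier TEXT PRIMARY KEY, title TEXT NOT NULL)"
        )

    def insert(self, record: Dict[str, str]) -> None:
        self._conn.execute(
            "INSERT INTO records (identifier, title) VALUES (?, ?)",
            (record["identifier"], record["title"]),
        )

    def query_title(self, term: str) -> List[Dict[str, str]]:
        rows = self._conn.execute(
            "SELECT identifier, title FROM records WHERE lower(title) LIKE ?",
            (f"%{term.lower()}%",),
        )
        return [{"identifier": i, "title": t} for i, t in rows]


def search_identifiers(repo: Repository, term: str) -> List[str]:
    """Catalogue-side code that only depends on the Repository contract."""
    return sorted(r["identifier"] for r in repo.query_title(term))


SAMPLE = [
    {"identifier": "rec-1", "title": "Total ozone observations"},
    {"identifier": "rec-2", "title": "Sea surface temperature"},
]

results = {}
for backend in (InMemoryRepository(), SqliteRepository()):
    for record in SAMPLE:
        backend.insert(record)
    results[type(backend).__name__] = search_identifiers(backend, "ozone")
print(results)
```

The search function never learns which store it is talking to, which is the property that lets a new backend be added without touching the service layer.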
All right. Thank you, Tom. So let's see how easy it is to install pycsw. So it's really simple because it's already in PyPI. So if you do a pip install pycsw, you will get pycsw as a library.
It's been included in Debian for many years now. It's in the non-free area due to the OGC schema licensing, but the actual package is completely open source and free. Through Debian, it became available in Ubuntu, and it's also in openSUSE, so many distributions
now ship pycsw. It's available for Windows in the MapServer for Windows packaging, and the source code is available on GitHub. It's really easy to install. It's a four-minute install, and believe me, we have timed this. It really takes four minutes to
install it and run it locally, and it's even easier nowadays with Docker. If you have OSGeoLive, you already have it in there. It's been available since version 5.5, and it's in the latest stable version. So if you have OSGeoLive, you can run pycsw directly there.
There's an overview and a quick start tutorial if you're interested. But recently, we are also doing packaging in Docker, so it's already available on Docker Hub. You can just do a docker pull of the latest version of pycsw or any other release version; then it's less than four
minutes actually to set it up. We recently added some sample Kubernetes configuration. It's available in our GitHub repo, and we have a Helm chart, so if you are creating your own architecture and you want to use pycsw in that architecture, you have all the tools to actually
set up a microservice in Kubernetes or maybe in Docker Swarm or whatever else you're using. It's really easy to set up and very convenient. Let's see a bit about recent projects and deployments. So recently, we have been working on an ESA project, the ESA Earth Observation
Exploitation Platform Common Architecture. It's called EOEPCA. It's an exploitation platform. ESA is funding this project. It's a collaborative virtual work environment providing access to Earth observation data, algorithms, tools, and ICT resources. The goal there is to define a
usable exploitation platform architecture using open interfaces, but basically it's also doing this through open source software. If you go to eoepca.org, you will see that the architecture is 100% open source. It's on GitHub. pycsw is the resource catalog component of this
architecture, and we are building more features into pycsw as requested by ESA in order to accommodate the needs of this common architecture. We are hoping that the next generation of Earth observation platforms in Europe will have pycsw as a catalog. Back to Tom. Thanks, Angelos. A few more example projects. So one is from
the Norwegian Meteorological Institute, who've been extensively using the project for a number of different metadata records, for metadata management in support of their projects with the World
Meteorological Organization, as well as the Norwegian Marine Data Centre. So they have a number of projects which are using pycsw to manage mostly ISO metadata, from what I've seen, but they also make heavy use of pycsw in a cloud environment. So kudos to the Norwegian
Meteorological Institute for bug reports and adding feature enhancements and so on. Next slide. They have some ongoing work on output profiles, and they do continuous deployment, again through Kubernetes and cloud mechanisms. Next slide. In my organization,
we're working on something with the WMO as part of the WMO Information System. So we're migrating our existing data collection and production center catalog into a pycsw instance, which basically will provide the WMO Core Metadata Profile, which is a profile of ISO
19115, and that basically provides discovery and search for weather, climate, and water data of WMO members. So that's using pycsw as well. Next slide. And roadmap. So where are we going? We've done quite a bit, with a lot of focus on Earth observation as well as
on the OGC API - Records trail, to make sure that that's implemented and implemented properly. Our goal is to become a reference implementation, just like we are for CSW. We are going to add support for Common Query Language, or CQL, both as text and as a JSON HTTP
POST payload. So that's in scope. Next slide. STAC. We're extending our STAC support and seeing how some of the mechanisms around STAC will be ratified, particularly around
query and filters. I think there's some convergence between STAC and OGC API in terms of query languages. So we're going to see how that bottoms out, and we continue to test against pystac-client, which is a great tool for testing the specification. Next slide. Coming soon: deeper JSON metadata management. So we mentioned STAC and OGC API - Records.
JSON is one of the first-class outputs that we provide when people ask for metadata, or when they query and we search and present metadata. But we also want to support ingest of these new JSON-based metadata standards, so you can harvest a STAC API into pycsw.
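Illustration (not part of the talk): ingesting JSON-based metadata essentially means mapping a GeoJSON-style document onto the catalogue's internal record fields. The sketch below maps a STAC Item onto a flat dictionary; the sample item and the target field names (identifier, title, bbox, time_begin, time_end) are assumptions for the example and not pycsw's actual schema.

```python
import json

# A small STAC Item, as it might arrive from a harvested STAC API (sample data).
SAMPLE_ITEM_JSON = """
{
  "type": "Feature",
  "stac_version": "1.0.0",
  "id": "scene-001",
  "bbox": [20.0, 35.0, 25.0, 40.0],
  "geometry": null,
  "properties": {
    "title": "Example scene",
    "datetime": null,
    "start_datetime": "2021-07-01T00:00:00Z",
    "end_datetime": "2021-07-01T00:10:00Z",
    "eo:cloud_cover": 12.5
  },
  "links": [],
  "assets": {}
}
"""


def stac_item_to_record(item):
    """Flatten a STAC Item into a hypothetical catalogue record."""
    props = item.get("properties", {})
    # Items carry either a single datetime or a start/end pair.
    begin = props.get("start_datetime") or props.get("datetime")
    end = props.get("end_datetime") or props.get("datetime")
    return {
        "identifier": item["id"],
        "title": props.get("title", item["id"]),
        "bbox": item.get("bbox"),
        "time_begin": begin,
        "time_end": end,
        "cloud_cover": props.get("eo:cloud_cover"),
    }


record = stac_item_to_record(json.loads(SAMPLE_ITEM_JSON))
print(record)
```

An OGC API - Records item is also a GeoJSON feature, so the same kind of mapping function could be written for it with different property names.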
You can harvest another OGC API - Records endpoint into pycsw. So we want to, I mean, that's a critical path. We've traditionally done only XML, but at this point, we're extending that to JSON. And while we're at it, we will probably abstract that enough so that the
underlying support can be for any encoding, so we can future-proof that a little bit more. But we'll start with JSON as the first example. Deeper EO support. So doing queries around more EO sort of queryables and facets, as well as granularity. That's a big piece of work
around doing cataloging for Earth observation, but that's in scope. pygeofilter integration. So pygeofilter is a new project. It's a Python library that is an abstraction around all the various OGC filter and Common Query Language type
specifications. And it will allow your Python project to basically use pygeofilter to parse a filter, put that in an abstract syntax tree, and then bind that to your backend, whether you have a relational database or Elasticsearch or something else. So we want to integrate with pygeofilter. I think what that'll do for
the pycsw project is we'll be able to rip out all the custom code that we've written for filters over the years, which works well. But we want to merge our efforts with pygeofilter to put the focus on that capability from that project. And we work closely with
the pygeofilter folks, and pygeofilter is another GeoPython project. So I encourage folks to take a look at that. And finally, our pycsw-admin.py tool will be updated to use CLI tooling and an updated CLI interface using Python Click. So look forward to that one.
Future releases. So we'll be cutting pycsw 3.0, which will be our long-term release, supporting CSW 2 and 3 over time. So that will be bound to CSW 2 and 3. pycsw 4 will properly support, and may introduce some breaking changes by only supporting, OGC API - Records,
which used to be called Catalogue 4, but it's now called OGC API - Records. And we continue to evolve and work together with the pygeoapi project, which Angelos and I and others work on, to articulate what the relationship is between those. I think there is a small
metadata capability in pygeoapi. In pycsw, we have a more full-featured capability to do actual metadata management. And that continues to be the goal of the project, and to provide these APIs just the same. Next slide. Community. Getting involved. So we're an open community. We have
mailing lists, Gitter channels. We're on GitHub. There's professional support if folks are interested. Next slide. And that is it. So I would like to thank everybody and we'll take any
questions or comments. Angelos, I think you have a demo up there on the screen now. Yeah, I just wanted to show, if we have one minute, what the landing page of pycsw is. And this is what is now in pycsw 3, which is in the master branch. So you have the endpoints listed here. You have CSW 3 here, CSW version 2 here. And this is the landing page for OGC API -
Records. So this is the JSON representation of that. And if you go to collections and you go to items, you can see the actual records in a simple UI. So yeah, we didn't mean to have a UI,
but OGC API - Records now has HTML as default. So we are now in this situation where we support all these features. And this is the new face of pycsw 3. That's it. Okay. Thanks, Angelos and Tom, for bringing us up to date on pycsw. In the meantime, we got some
interesting questions. Can you guess what the first question is? It's very similar to the question we got for pygeoapi. So I'll read it. How do you compare the use cases
of pycsw to GeoNetwork? That was the most upvoted question, so I'll start with that one. Okay, maybe I'll take a shot. So in my opinion, other than the fact that they're written in different programming languages, for use cases, I would say the pycsw use case is more the
composable or headless use case, so that you can build it into your own Python pipelines or other microservice-based pipelines. I know GeoNetwork has a full-blown metadata editor, which is very powerful. That was never in the scope of pycsw. So the
assumption of pycsw is, you know, somewhere along the line, you are composing your metadata, whether it's sort of at your desk or whether it's through some sort of pipeline. And then that feeds its way through pycsw. So pycsw puts your metadata on the shelf, assuming that
you've created that through some upstream process. I think we should compare GeoNetwork more to GeoNode, which is the next talk, from Alessio. And pycsw works as the catalog component there. So if you want a full UI with everything bundled in, I think GeoNode is what you're
looking for in the Python world. Okay, thank you. And there's another question. I'll read it. Can you provide an example of what WSGI, we pronounce it "whiskey", is good for?
And maybe that's more a sort of generic Python question? Yeah, I think it's basically a convention for connecting web servers to web applications or frameworks written in Python. So at least that's probably, maybe the questioner is hinting at sort of the
follow-up, which we are actually using, I think, with pygeoapi, which is called ASGI, the asynchronous things like Starlette. And probably the question is maybe, are you planning
support for that? Because pycsw, is it Flask-based? The new work is based on Flask, and Flask 2.0 supports async just the same, so sure. Oh, okay. So let's see. Oh yeah, there's a longer question.
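Illustration (not part of the talk): as a concrete answer to what WSGI is good for, here is a minimal WSGI callable with a single endpoint that dispatches on the query string, in the spirit of the classic CSW style described earlier. It is a toy, not pycsw's application; the response bodies are placeholders and the `call` helper is a hypothetical test harness.

```python
from urllib.parse import parse_qs


def application(environ, start_response):
    """Minimal WSGI callable: one endpoint, behaviour chosen by query string."""
    raw = parse_qs(environ.get("QUERY_STRING", ""))
    # KVP parameter names are treated case-insensitively in this sketch.
    params = {key.lower(): values[0] for key, values in raw.items()}
    if params.get("request", "").lower() == "getcapabilities":
        status, body = "200 OK", b"<Capabilities/>"
    else:
        status, body = "400 Bad Request", b"<ExceptionReport/>"
    headers = [
        ("Content-Type", "application/xml"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def call(app, query_string):
    """Invoke a WSGI app directly, without a web server, and collect the result."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    environ = {"REQUEST_METHOD": "GET", "QUERY_STRING": query_string}
    body = b"".join(app(environ, start_response))
    return captured["status"], body


print(call(application, "service=CSW&request=GetCapabilities"))
print(call(application, "service=CSW&request=Unknown"))
```

Because the callable only depends on the WSGI contract, any WSGI server can host it unchanged; for local experiments the standard library's `wsgiref.simple_server.make_server` is enough.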
It's about whether you can test the quality of metadata: we develop infrastructure harvesting different CSWs, but want to test our metadata quality against INSPIRE.
I can talk to that one directly because I actually do that in a few of our projects. So I mentioned the project that we're doing for the WMO here in Canada, and we do quality assessment of our metadata, but that is done upstream of pycsw. So imagine we have a pipeline
where we create metadata and the metadata is sent along to a quality assessment utility that is totally outside of pycsw. And then once it's deemed ready to be published and it meets the quality assessment criteria, it's fed through pycsw. So I would see that as sort of an upstream workflow. I'm not sure if Angelos has any comments.
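Illustration (not part of the talk): the upstream workflow described here, where records pass a quality gate before they are handed to the catalogue, might look roughly like the sketch below. The required fields and the bounding-box rule are hypothetical criteria chosen for the example; a real assessment utility would apply the rules of the relevant metadata profile.

```python
# Hypothetical quality criteria; a real pipeline would follow its metadata profile.
REQUIRED_FIELDS = ("identifier", "title", "abstract", "bbox")


def assess(record):
    """Return a list of problems found in one metadata record (empty means OK)."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    bbox = record.get("bbox")
    # A bbox is expected as [minx, miny, maxx, maxy] with min <= max on each axis.
    if bbox and not (len(bbox) == 4 and bbox[0] <= bbox[2] and bbox[1] <= bbox[3]):
        problems.append("invalid bbox")
    return problems


def publishable(records):
    """Keep only the records that pass the assessment and may go to the catalogue."""
    return [record for record in records if not assess(record)]


candidates = [
    {
        "identifier": "rec-1",
        "title": "Surface weather observations",
        "abstract": "Hourly observations from surface stations.",
        "bbox": [-141.0, 41.7, -52.6, 83.1],
    },
    {"identifier": "rec-2", "title": "Untitled draft", "bbox": [10.0, 5.0, -10.0, 8.0]},
]

for record in candidates:
    print(record["identifier"], assess(record))
ready = [record["identifier"] for record in publishable(candidates)]
print(ready)
```

Only the records in `ready` would then be pushed or harvested into the catalogue, so the catalogue itself stays free of assessment logic.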
I just want to add that it's possible to do validation when you do harvesting, and it's possible to validate metadata if you have an XSD to test against. So libxml is
part of what we are using, and lxml, the Python library, is there. So it's possible for somebody to actually implement that. Yeah. Is that maybe also a role for GWL check here?
Sure. That will be in a talk tomorrow. Well, we're just in time here, and our next speaker is waiting backstage. And thanks again, Tom, Angelos,
and also for your work on pycsw and many of the other GeoPython projects. And we'll see you along. Thank you very much.