pycsw project status 2021
Formal Metadata
Number of Parts | 237
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/57233 (DOI)
FOSS4G Argentina 2021 | 23 / 237
Transcript: English (auto-generated)
00:14
the first session. So, we have the next session about the pycsw project status.
00:25
Both gentlemen are here, Angelos Tzotsos and Tom Kralidis. Yeah, I already introduced you in the first talk, but some people may have switched, so I'll go quickly. So, Angelos
00:41
Tzotsos is a remote sensing researcher and software developer at the National Technical University of Athens, and OSGeo president. So, ask him anything about OSGeo if you meet him. He's an OGC member, contributor to various projects, and also UbuntuGIS maintainer. And then we have Tom here
01:04
from Canada, and he's a senior systems scientist for the Meteorological Service of Canada. And Tom is very active in various organizations like the OGC and OSGeo, serving on the board as well.
01:20
And last but not least, he works on various open source geospatial projects implementing international standards, like pycsw, which you will hear about, and pygeoapi, MapServer, GeoNode, QGIS, PyWPS, OWSLib, too many to mention.
01:44
Well, I'll give the floor to you, and you even have hosting rights. So, I saw a screen popping up for sharing. Yes, but yeah, I'm going to share again. I'm not sure if you... Yeah, yeah. Oh, yeah, you share it yourself. So,
02:05
that's good. And I leave the floor to you to bring us up to date with the pycsw project status. Thank you. Thank you very much. So, hello, everyone. We are happy to be here,
02:21
and we are going to present the pycsw project status for this year. And we are going to start right away. The outline of the presentation: we are going to make an introduction, and then we're going to discuss the features of pycsw and what is new in the latest stable version of pycsw. Then we're going to discuss architecture, installation,
02:47
downstream projects, and roadmap. So, let's get started with the introduction. So, initially, it was an OGC CSW server implementation in Python, but recently,
03:01
we added that we implement OGC API Records as well. It's an open source project released under the MIT license, and it runs on all major platforms. It's Python, and it works everywhere. It has been an OSGeo project since 2015. So, pycsw fully implements the OpenGIS
03:23
Catalogue Service Implementation Specification, also known as Catalogue Service for the Web. We are now also implementing OGC API Records. We are implementing OpenSearch, as well as many other standards for catalogs out there. pycsw basically allows for the publishing
03:45
and discovery of geospatial metadata. So, if you have data and you want to publish them online, you create metadata, and then you give the metadata to pycsw to serve using all these standards and specifications. The project has been certified as OGC compliant. It has been an
04:05
OGC reference implementation for both CSW 2.0.2 and, recently, version 3. It's been an official OSGeo project for some years now, and we are constantly trying to implement what is new in OGC standards
04:26
regarding catalogs. So, a bit of history for the project. Tom started the project in 2010. He was coding alone for a year, and then he announced the project in 2011. It's been 10 years
04:41
now. Then I joined him, and we started working on the first official release, 0.1, passing all the CITE tests for OGC CSW. Then, shortly after we released version 1, we included the software in OSGeoLive, and then we moved on. We powered data.gov at some point. We graduated incubation
05:08
in 2015. Then, we did a reference implementation of OGC CSW 3. Last December, we had the 10th birthday, and shortly after, we released the latest stable version, 2.6. Tom is going to talk about
05:24
what is new in there in a while. The recent development is that we landed the support for OGC API Records and STAC this July. So, we have new things in master that are not yet released in a stable version. The goals of the project: we want to have a lightweight and easy to use setup.
05:45
It's a standalone catalog, but it doesn't have a UI. Well, it didn't have a UI until now that we have OGC API Records. There's no metadata editing front end; it is designed as a microservice. It serves the use case of exposing ready-to-go metadata. If you have files or an existing
06:05
database, you can serve this through a CSW interface. It's very extensible. It's easy to add metadata formats and mappings, and it has a very easy to extend architecture. It is always
06:22
OGC compliant. We make sure that it passes the CITE tests on a daily basis. So, a bit of discussion about the features. As I already said, we fully implement, and are the reference implementation for, all recent versions of CSW. We support harvesting
06:43
for WMS, WFS, WCS, and many other OGC standards. We implement the ISO application profile and the FGDC CSDGM application profile, as well as the OpenSearch Geo and Time extensions and, recently, the EO extension, the Earth Observation extension for OGC OpenSearch. OGC API Records
07:08
core: we implement that, but it's still in development. The standard is not officially out yet. We are implementing the current STAC API, which is version 1 beta 2,
07:22
and various other standards like Dublin Core, DIF, Atom, and many more. We also support transactional capability, so you can do transactions with a catalog, and we have a flexible repository configuration. It works with SQLite, Postgres, PostGIS, MySQL, anything that can plug into
07:46
SQLAlchemy or even into Django in some cases, like GeoNode. It supports federated catalog distributed searching, so you can make a network of pycsw instances, or you can plug it in and harvest
08:02
from other catalogs and it will work. It's very simple to configure. It has an extensible plugin architecture. You can create your own plugins or add new backends to it. Because it's basically a library, it can integrate with Python environments. For example, GeoNode uses pycsw as the catalog component, along with some other projects.
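To make the CSW interface mentioned above a little more concrete, here is a minimal sketch of how a client might build a CSW 2.0.2 GetRecords request as a key-value-pair URL, using only the Python standard library. The endpoint URL and the search text are assumptions for illustration; the parameter names follow the CSW 2.0.2 key-value-pair encoding as far as we know it, so check them against the catalog you are talking to.

```python
from urllib.parse import urlencode


def build_getrecords_url(endpoint, cql_text, max_records=10):
    """Build a CSW 2.0.2 GetRecords request as a key-value-pair URL."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "brief",
        "resultType": "results",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": cql_text,
        "maxRecords": str(max_records),
    }
    # urlencode percent-encodes the values, including the CQL text.
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical local endpoint; replace with the URL of a real catalog.
url = build_getrecords_url(
    "http://localhost:8000/csw", "csw:AnyText like '%temperature%'"
)
print(url)
```

In practice a client library such as OWSLib would build and send this request for you; the sketch only shows what travels in the query string.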
08:27
It has been integrated with CKAN in the past, and we have many more features: we implement full text search and we have real-time XML validation. This is a list of standards that
08:43
we currently support. I won't go through all of them, but it's all about metadata, and metadata can have too many standards, but yeah, here we are. So I'm going to hand it over to Tom to talk about the latest pycsw. Tom.
09:04
Great. Thanks, Angelos. Maybe can we just arrow over horizontally to the next slide? Excellent. Next slide. Down. Oh, sorry. Yeah, one more down. Okay, so what's new in version 2.6?
09:36
Sorry, I had some network connection issue, but version 2.6, what have we done?
09:42
We've added support for OpenSearch Geo and Time enhancements. The way we've done temporal support in pycsw in the past has always been on a single date time, but now we're supporting a temporal envelope. So if you're looking for a metadata record with a time extent, the temporal boundary is now supported in the OpenSearch Geo and Time way.
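The difference between a single date time and a temporal envelope can be sketched as an interval overlap test. This is an illustration of the idea only, not pycsw's implementation; the function name and the sample dates are made up.

```python
from datetime import datetime


def overlaps(record_begin, record_end, query_begin, query_end):
    """Return True when the record's time extent intersects the query's."""
    return record_begin <= query_end and query_begin <= record_end


# A record covering all of 2020, and a query from mid-2020 to mid-2021.
record = (datetime(2020, 1, 1), datetime(2020, 12, 31))
query = (datetime(2020, 6, 1), datetime(2021, 6, 1))

# With an envelope the record matches, because the two intervals intersect.
print(overlaps(*record, *query))

# With only a single date time (the record's start), the same record would
# fall outside the query window and be missed.
print(query[0] <= record[0] <= query[1])
```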
10:05
Next slide. We also support 12-factor, so things like environment variables in the configuration. So if you have, let's say, a Docker setup or a Docker Compose setup or some cloud capability,
10:21
you're able to set any pycsw configuration value as an environment variable, and that will make its way through. And that's really great when you're working in environments where there are a lot of different servers and nodes and so on and so forth. So it's flexible that way. We've also added Kubernetes support,
10:43
as well as Helm charts, and we finally removed support for Python 2. So if you are using pycsw 2.6, Python 2 support is officially removed. It was deprecated for a while and we turned off all of our CI testing for Python 2, but
11:03
we finally removed it in 2.6. Next slide. That's what's happening in 2.6; now what's going on in our development branch. So we're working on supporting the OGC OpenSearch Earth Observation extension. So those are specific OpenSearch parameters, such as cloud cover
11:24
and so on, platforms and instruments, to support the EO profile. We've done an early implementation of OGC API Records Part 1, which is the core. And as Angelos may have mentioned, both Angelos and I are on the OGC API
11:41
Records standards working group. So with that, we've done the implementation in pycsw. We've also done an early implementation of the STAC API for discovering items. And with all that, we added an additional WSGI endpoint using Flask. The way
12:03
CSW was working before these improvements was through a single WSGI application with a single endpoint, which was sort of the CSW way, where you have a query string taking on most of the work. But with the new approaches, we implemented the endpoint using Flask with Flask routes,
12:23
which map back to and work with the existing functionality. So that was a nice enhancement to put on top of our existing functionality to make it work with the new generation of standards. Next slide. In terms of architecture, as Angelos mentioned, it's a microservice architecture,
12:44
and it's quite modular. So the sole goal of pycsw is metadata management, whether you want to harvest metadata into pycsw or push metadata into pycsw, and make that
13:02
same metadata available to downstream clients or applications. So downstream clients and applications include desktop GIS like QGIS, client libraries like OWSLib, or web applications like GeoNode. And on the other side of that, we have a plugin mechanism that allows you to
13:22
support one to many different metadata formats. The ones that you see there are specific to what we have on board in pycsw, and you can make your own external plugins for metadata that you may not see there, which is important to your organization or your project. Next slide. And there's another example. We also support a number of different backends.
13:46
So we use SQLAlchemy for our database abstraction. So you can abstract databases, and we're also working on support for abstracting backends so we can start supporting NoSQL backends. There is some loose support for Elasticsearch, and I've seen other
14:02
projects implement that. We're going to put that into pycsw core, but the capability does exist. We've abstracted things enough so that we can support both NoSQL and a wide range of RDBMSs through SQLAlchemy. Next slide.
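The backend abstraction described here can be pictured as one small interface with interchangeable implementations. The sketch below is hypothetical: the class and method names are invented for illustration and are not pycsw's repository API, and a plain dictionary stands in for a document store such as Elasticsearch.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import List


class Repository(ABC):
    """Hypothetical minimal interface a catalog could code against."""

    @abstractmethod
    def insert(self, identifier: str, title: str) -> None:
        ...

    @abstractmethod
    def search(self, text: str) -> List[str]:
        ...


class SQLiteRepository(Repository):
    """Relational backend, here an in-memory SQLite database."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE records (identifier TEXT PRIMARY KEY, title TEXT)"
        )

    def insert(self, identifier, title):
        self.conn.execute(
            "INSERT INTO records VALUES (?, ?)", (identifier, title)
        )

    def search(self, text):
        rows = self.conn.execute(
            "SELECT identifier FROM records WHERE title LIKE ? "
            "ORDER BY identifier",
            (f"%{text}%",),
        )
        return [row[0] for row in rows]


class DictRepository(Repository):
    """Stand-in for a document store: records kept as plain dictionaries."""

    def __init__(self):
        self.docs = {}

    def insert(self, identifier, title):
        self.docs[identifier] = {"title": title}

    def search(self, text):
        return sorted(
            identifier
            for identifier, doc in self.docs.items()
            if text.lower() in doc["title"].lower()
        )


# The calling code is identical for both backends.
for repo in (SQLiteRepository(), DictRepository()):
    repo.insert("rec-1", "Sea surface temperature")
    repo.insert("rec-2", "Road network")
    print(type(repo).__name__, repo.search("temperature"))
```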
14:23
All right. Thank you, Tom. So let's see how easy it is to install pycsw. It's really simple because it's already on PyPI. So if you do a pip install pycsw, you will get pycsw as a library.
14:42
It has been included in Debian for many years now. It's in the non-free area due to the OGC schema licensing, but the actual package is completely open source and free. Through Debian, it became available in Ubuntu and also in openSUSE, so many distributions
15:02
now have pycsw packaged. It's available for Windows in the MapServer for Windows packaging, and the source code is available on GitHub. It's really easy to install. It's a four-minute install, and believe me, we have already timed this. It really takes four minutes to
15:23
install it and run it locally, and it's even easier nowadays with Docker. If you have OSGeoLive, you already have it in there; it has been available since version 5.5, and it's in the latest stable version. So if you have OSGeoLive, you can run pycsw directly there.
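For container setups like the Docker one just mentioned, the environment-variable configuration added in 2.6 can be pictured with a small sketch. This is not pycsw's own loader: the `load_config` helper and the variable names `PYCSW_SERVER_URL` and `PYCSW_REPOSITORY_DATABASE_URI` are assumptions made for illustration, showing one way to expand `${NAME}` placeholders in an INI-style file.

```python
import configparser
import os
import re

# Matches placeholders of the form ${NAME}.
_ENV_PATTERN = re.compile(r"\$\{(\w+)\}")


def load_config(text, environ=os.environ):
    """Parse INI text, replacing ${NAME} with the value of variable NAME."""

    def substitute(match):
        name = match.group(1)
        if name not in environ:
            raise KeyError(f"environment variable {name} is not set")
        return environ[name]

    expanded = _ENV_PATTERN.sub(substitute, text)
    # Interpolation is disabled so '%' in URLs or passwords is left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(expanded)
    return parser


SAMPLE = """
[server]
url = ${PYCSW_SERVER_URL}

[repository]
database = ${PYCSW_REPOSITORY_DATABASE_URI}
"""

# A dictionary stands in for the real process environment here.
env = {
    "PYCSW_SERVER_URL": "http://localhost:8000",
    "PYCSW_REPOSITORY_DATABASE_URI": "sqlite:////tmp/records.db",
}
config = load_config(SAMPLE, environ=env)
print(config["server"]["url"])
print(config["repository"]["database"])
```

The same file can then be shipped unchanged to every node, with each deployment supplying its own values through the environment.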
15:44
There's an overview and a quick start tutorial if you're interested. But recently, we are also doing packaging in Docker, so it's already available on Docker Hub. You can just do a docker pull of the latest version of pycsw or any other release version; then it's less than four
16:05
minutes actually to set it up. We recently added some sample Kubernetes configuration. It's available on our GitHub repo, and we have a Helm chart, so if you are creating your own architecture and you want to use pycsw in that architecture, you have all the tools to actually
16:24
set up a microservice in Kubernetes or maybe in Docker Swarm or whatever else you're using. It's really easy to set up and very convenient. Let's see a bit about recent projects and deployments. So recently, we have been working on an ESA project, the Earth Observation
16:45
Exploitation Platform Common Architecture. It's called EOEPCA. It's an exploitation platform. ESA is funding this project. It's a collaborative virtual work environment providing access to Earth observation data, algorithms, tools, and ICT resources. The goal there is to define a
17:03
usable exploitation platform architecture using open interfaces, but basically it's also doing this through open source software. If you go to eoepca.org, you will see that the architecture is 100% open source. It's on GitHub. pycsw is the resource catalog component of this
17:22
architecture, and we are building more features into pycsw as requested by ESA in order to accommodate the needs of this common architecture. We are hoping that the next generation of Earth observation platforms in Europe will have pycsw as a catalog. Back to Tom. Thanks, Angelos. A few more example projects. So one is from
17:49
the Norwegian Meteorological Institute, who've been extensively using the project for a number of different metadata records and metadata management in support of their projects with the World
18:01
Meteorological Organization, as well as the Norwegian Marine Data Centre. So they have a number of projects which are using pycsw to manage mostly ISO metadata from what I've seen, and they also make heavy use of pycsw in a cloud environment. So kudos to the Norwegian
18:23
Meteorological Institute for bug reports, feature enhancements, and so on. Next slide. They have some ongoing work on output profiles, and they do continuous deployment, again through Kubernetes and cloud mechanisms. Next slide. In my organization,
18:45
we're working on something with the WMO as part of the WMO Information System. So we're migrating our existing data collection and production center catalog into a pycsw instance, which basically will provide the WMO Core Metadata Profile, which is a profile of ISO
19:02
19115, and that basically provides discovery and search for the weather, climate, and water data of WMO members. So that's using pycsw as well. Next slide. And roadmap. So where are we going? We've done quite a bit, with a lot of focus on Earth observation as well as
19:28
on the OGC API Records trail, to make sure that that's implemented and implemented properly. Our goal is to become a reference implementation just like we are for CSW. We are going to add support for Common Query Language, or CQL, both as text and as a JSON HTTP
19:46
POST payload. So that's in scope. Next slide. STAC. We're extending our STAC support and seeing how some of the mechanisms around STAC will be ratified, particularly around
20:01
query and filters. I think there's some convergence between STAC and OGC API in terms of query languages. So we're going to see how that bottoms out, and we continue to test against pystac-client, which is a great tool for testing the specification. Next slide. Coming soon, deeper JSON metadata management. So we mentioned STAC and OGC API Records.
20:24
JSON is one of the first class outputs that we provide when people ask for metadata or query, and we search and present metadata. But we also want to support ingest of these new JSON-based metadata standards. So you can harvest a STAC API into pycsw.
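Ingesting JSON metadata essentially means mapping a document such as a STAC item onto the catalog's own record fields. The sketch below shows one possible mapping using only the standard library. The target field names (`identifier`, `time_begin`, and so on) and the sample item are assumptions for illustration, not pycsw's actual data model.

```python
import json

# A trimmed, made-up STAC item used only for this example.
SAMPLE_ITEM = json.loads("""
{
  "type": "Feature",
  "stac_version": "1.0.0",
  "id": "scene-001",
  "bbox": [-58.6, -34.8, -58.2, -34.4],
  "geometry": {
    "type": "Polygon",
    "coordinates": [[[-58.6, -34.8], [-58.2, -34.8], [-58.2, -34.4],
                     [-58.6, -34.4], [-58.6, -34.8]]]
  },
  "properties": {
    "title": "Example scene",
    "datetime": null,
    "start_datetime": "2021-09-01T00:00:00Z",
    "end_datetime": "2021-09-02T00:00:00Z"
  },
  "links": [],
  "assets": {}
}
""")


def stac_item_to_record(item):
    """Map a STAC item onto a flat, hypothetical catalog record."""
    props = item.get("properties", {})
    return {
        "identifier": item["id"],
        "title": props.get("title", item["id"]),
        # Prefer the explicit range; fall back to the single datetime.
        "time_begin": props.get("start_datetime") or props.get("datetime"),
        "time_end": props.get("end_datetime") or props.get("datetime"),
        "bbox": item.get("bbox"),
    }


record = stac_item_to_record(SAMPLE_ITEM)
print(json.dumps(record, indent=2))
```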
20:43
You can harvest another OGC API Records endpoint into pycsw. So, I mean, that's a critical path. We've traditionally done only XML, but at this point, we're extending that to JSON. And while we're at it, we will probably abstract that enough so that the
21:01
underlying support can be for any encoding, so we can future-proof that a little bit more. But we'll start with JSON as the first example. Deeper EO support. So doing queries around more EO sort of queryables and facets as well as granularity. That's a big piece of work
21:24
around doing cataloging for Earth observation, but that's in scope. pygeofilter integration. So pygeofilter is a new project that is an abstraction. It's a Python library that is an abstraction around all the various OGC filter and Common Query Language type
21:44
specifications. And it will allow your Python project to basically use pygeofilter to parse a filter, put that in an abstract syntax tree, and then bind that to your backend, whether you have a relational database or Elasticsearch or something else. So we want to integrate with pygeofilter. I think what that'll do for
22:04
the pycsw project is we'll be able to rip out all the custom code that we've done for filters over the years, which works well. But we want to merge our efforts with pygeofilter to put the focus on that capability in that project. And we work closely with
22:21
the pygeofilter folks, and pygeofilter is another GeoPython project. So I encourage folks to take a look at that. And finally, our pycsw-admin.py tool will be updated with an updated CLI interface using Python Click. So look forward to that one.
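The "parse a filter, put it in an abstract syntax tree, bind it to your backend" idea from the pygeofilter discussion can be sketched in a few lines. To be clear, this does not use or imitate pygeofilter's API: the node classes and the two binding functions below are invented for illustration, and the parsing step is skipped by building the tree by hand.

```python
import fnmatch
from dataclasses import dataclass
from typing import Union


@dataclass
class Equals:
    field: str
    value: str


@dataclass
class Like:
    field: str
    pattern: str  # uses % as the wildcard, as in SQL and CQL


@dataclass
class And:
    left: "Node"
    right: "Node"


Node = Union[Equals, Like, And]


def to_sql(node):
    """Bind the tree to a relational backend: (WHERE clause, parameters)."""
    # Field names are inserted as-is; real code should check them against
    # a list of known queryables first.
    if isinstance(node, Equals):
        return f"{node.field} = ?", [node.value]
    if isinstance(node, Like):
        return f"{node.field} LIKE ?", [node.pattern]
    if isinstance(node, And):
        left_sql, left_params = to_sql(node.left)
        right_sql, right_params = to_sql(node.right)
        return f"({left_sql} AND {right_sql})", left_params + right_params
    raise TypeError(f"unsupported node: {node!r}")


def matches(node, record):
    """Bind the same tree to an in-memory backend of plain dictionaries."""
    if isinstance(node, Equals):
        return record.get(node.field) == node.value
    if isinstance(node, Like):
        glob = node.pattern.replace("%", "*")
        return fnmatch.fnmatchcase(str(record.get(node.field, "")), glob)
    if isinstance(node, And):
        return matches(node.left, record) and matches(node.right, record)
    raise TypeError(f"unsupported node: {node!r}")


# Tree for: type = 'dataset' AND title LIKE '%temperature%'
tree = And(Equals("type", "dataset"), Like("title", "%temperature%"))
print(to_sql(tree))
print(matches(tree, {"type": "dataset", "title": "Sea surface temperature"}))
```

The point of the shared tree is that the filter is parsed once and each backend only needs its own small binding function.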
22:41
Future releases. So we will be cutting pycsw 3.0; that will be our long-term release, supporting CSW 2 and 3 over time. So that will be bound to CSW 2 and 3. pycsw 4 will properly support, and may introduce some breaking changes by only supporting, OGC API Records,
23:06
which used to be called Catalogue 4, but it's now called OGC API Records. And we continue to evolve and work together with the pygeoapi project, which Angelos and I and others work on, to articulate what the relationship is between those. I think there is a small
23:28
metadata capability in pygeoapi. In pycsw, we have a more full-featured capability to do actual metadata management. And that continues to be the goal of the project, providing these APIs just the same. Next slide. Community. Getting involved. So we're an open community. We have
23:47
mailing lists, Gitter channels. We're on GitHub. There's professional support if folks are interested. Next slide. And that is it. So I would like to thank everybody and we'll take any
24:03
questions or comments. Angelos, I think you have a demo up there on the screen now. Yeah, I just wanted to show, if we have one minute, what the landing page of pycsw is. And this is what is now in pycsw 3, which is in the master branch. So you have the endpoints listed here. You have CSW 3 here, CSW version 2 here. And this is the landing page for OGC API
24:29
Records. So this is the JSON representation of that. And if you go to collections and you go to items, you can see the actual records in a simple UI. So yeah, we didn't mean to have a UI,
24:42
but OGC API Records now has HTML as default. So we are now in this situation where we support all these features. And this is the new face of pycsw 3. That's it. Okay. Thanks, Angelos and Tom, for bringing us up to date on pycsw. In the meantime, we got some
25:06
interesting questions. Can you guess what the first question is? It's very similar to the question we got for pygeoapi. So I'll read it. How do you compare the use cases
25:22
of pycsw to GeoNetwork? That was the most upvoted question. So I'll start with that one. Okay, maybe I'll take a shot. So in my opinion, other than the fact that they're written in different programming languages, for use cases, I would say the pycsw use case is more the
25:45
composable or headless use case, so that you can build it into your own Python pipelines or other microservice-based pipelines. I know GeoNetwork has a full-blown metadata editor, which is very powerful. That was never in the scope of pycsw. So the
26:05
assumption of pycsw is, you know, somewhere along the line, you are composing your metadata, whether it's sort of at your desk or whether it's through some sort of pipeline. And then that feeds its way through pycsw. So pycsw puts your metadata on the shelf, assuming that
26:22
you've created that through some upstream process. I think we should compare GeoNetwork more to GeoNode, which is the next talk from Alessio. And pycsw works as the catalog component there. So if you want a full UI with everything bundled in, I think GeoNode is what you're
26:43
looking for in the Python world. Okay, thank you. And there's another question. I'll read it. Can you provide an example of what, how do we pronounce it, "whiskey", WSGI is good for?
27:02
And maybe that's more a sort of generic Python question? Yeah, I think it's basically a convention for connecting web servers to web applications or frameworks written in Python. So at least that's probably, maybe the questioner is hinting at sort of the
27:27
follow-up, which we are actually using, I think, with pygeoapi, which is called ASGI, the asynchronous things like Starlette. And probably the question is maybe, are you planning
27:43
support for that, because pycsw, is it Flask based? The new work is based on Flask, and Flask 2.0 supports async just the same, so sure. Oh, okay. So let's see. Oh yeah, there's a longer question.
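As a concrete answer to the WSGI question above, here is a minimal sketch of a WSGI application using only the standard library: a single callable that a web server invokes for every request, with the query string doing most of the work, much like the classic single CSW endpoint described in the talk. The dispatch logic and response body are made up for illustration and are not pycsw code.

```python
from urllib.parse import parse_qs


def application(environ, start_response):
    """Minimal WSGI app: one callable, dispatching on the query string."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # Fall back to GetCapabilities when no request parameter is given.
    request = query.get("request", ["GetCapabilities"])[0]
    body = f"request={request}".encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]


def demo_start_response(status, headers):
    """Stand-in for the callback a real WSGI server would supply."""
    print(status)


# Call the app directly with a hand-built environ, as a server would.
result = application(
    {"QUERY_STRING": "service=CSW&request=GetRecords"}, demo_start_response
)
print(b"".join(result).decode("utf-8"))
```

Any WSGI server, for example the standard library's `wsgiref.simple_server`, can host the same `application` callable without changes.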
28:05
It's about, can you test the quality of metadata? We develop infrastructure harvesting different CSWs, but want to test our metadata quality against INSPIRE.
28:21
I can speak to that one directly because I actually do that in a few of our projects. So I mentioned the project that we're doing for the WMO here in Canada, and we do quality assessment of our metadata, but that is done upstream of pycsw. So imagine we have a pipeline
28:40
where we create metadata and the metadata is sent along to a quality assessment utility that is totally outside of pycsw. And then once it's deemed ready to be published and it meets the quality assessment criteria, then it's fed through pycsw. So I would see that as sort of an upstream workflow. I'm not sure if Angelos has any comments.
29:06
I just want to add that it's possible to do validation when you do harvesting, and it's possible to validate metadata if you have an XSD to test against. So libxml is
29:24
part of what we are using, and lxml, the Python library, is there. So it's possible for somebody to actually implement that. Yeah. Is that maybe also a role for GeoHealthCheck here?
29:41
Sure. That will be in a talk tomorrow. Well, we're just in time here, and our next speaker is waiting backstage. And thanks again, Tom, Angelos,
30:03
and also for your work on pycsw and many of the other GeoPython projects. And we'll see you around. Thank you very much.