Open Data Analytics API in GeoNetwork
Formal Metadata
Number of Parts: 266
License: CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifier: 10.5446/66496 (DOI)
Transcript: English(auto-generated)
00:08
So, I'm going to talk about the idea and the work we've done so far towards an open data analytics API. And I mention GeoNetwork here because we are part of GeoNetwork and we see GeoNetwork
00:22
as an entry point to any SDI. It's a metadata catalog. And we try to enlarge its scope to open data. And the open data world brings new ideas and new use cases. And one is the reuse of the data.
00:43
And that's why I'm going to talk about this work today. So, first, I'm Florent Gravin, head of technology at Camptocamp. And I'm also PSC chair of GeoNetwork. And I was supposed to present with my colleague, Olivia, but she has a flight issue, so she
01:02
will arrive a bit later. So, just a very brief introduction, but if you want to know more about GeoNetwork, you can attend the presentation this afternoon at 2 on the outer stage.
01:22
It's just the OSGeo solution for metadata cataloging. So, you can edit your metadata, link it to datasets, and search for metadata. And the point here is that GeoNetwork is really focused on metadata. So, it's identified as a metadata catalog.
01:42
And what we want nowadays is not to look for specific metadata, but to look for data, to value data and reuse the data. So, the data API is part of that move.
02:02
So, first, I will introduce briefly how the data appears and is linked in GeoNetwork and in the metadata overall. Because it's the entry point to any search for data. Then I will talk about our plans within a customer project to enlarge the possibilities
02:20
of GeoNetwork towards this data platform. I will talk about some technical considerations, about what solutions we thought about for achieving that. Then I'll go through a quick example and give a conclusion about this work.
02:41
Okay. So, let's look at how the data works in GeoNetwork so far. So, actually, GeoNetwork really focuses on the metadata more than on the data. There are some options around the data. Like the GeoPublisher, where you can push a shapefile into GeoServer, but
03:02
it's quite limited. There are also WFS harvesters where you can index feature collections into Elasticsearch. But that's mostly used internally for the legacy UI features. It's not meant to deliver output for external organizations or external services.
03:30
Then there are links. In the metadata, there are links to the data. But the problem is they often rely on third-party services like GeoServer, for
03:41
instance, which makes them not reliable all the time. Then the links to the data themselves can be quite complex. And there is no standard, actually, in the ISO format, for instance, to define a data
04:04
or a data link. There are some protocols. But sometimes you have to guess the format of the file when you want to download a CSV or a GeoJSON or something like that. So, it's quite hard. The picture here illustrates that we are looking more at the information about
04:27
the object than the object itself. And that's how GeoNetwork works. And it's really what we want to change: becoming a data catalog instead of a metadata
04:41
catalog. It's all good so far. Because GeoNetwork works with the metadata and it works well with that. But it doesn't really do things about hosting files, hosting data, providing backend services
05:03
to update the data. So, it's really separated from a data service such as MapServer or GeoServer. And it works well because there are kind of best practices for how to define the data links. And thanks to OGC standards and protocols, which allow us to visualize the data in
05:27
some cases. But mostly, if you look at the whole ecosystem, when you harvest many different metadata catalogs, you really see that it's not generic.
05:40
And a lot of metadata doesn't provide the data. And you have to implement specific processing to extract the data from the metadata and see what the links are, what the formats are, et cetera. So, this is basically how it looks in GeoNetwork. So, there is a link, there is a protocol, but you have to guess things.
06:03
What is the MIME type? What is the format? What is the size? What is the number of items? All this information is really what you want when you look at metadata. You want to know these kinds of things. But you don't have them.
06:20
And one thing which is important as well is that it really focuses on geodata, and it doesn't really cover all the use cases that modern catalogs, such as open data catalogs, provide. So, all these OGC standards don't really work well outside the geo ecosystem.
06:43
And then, the API to provide the data. So, when we talk about that data API, it's an API to fetch the data. And mostly in the geo world, it's WFS or OGC API Features. But it's limited. So, in the new Features specification, you have full-text search, but it wasn't
07:01
the case in WFS. You have no aggregation possibilities. You have no processing queries. It's very limited to fetching the data, not analyzing the data. And when we look at the open data ecosystem, they have some kinds of tools, some kinds of
07:20
APIs to make analyses out of the data. So, to address that, we think that GeoNetwork, this catalog, has to provide its own data API on top of that to embrace both geospatial data and open data.
07:43
So, our current plans. We have a big customer in France who is actually working with both geospatial data and open data.
08:03
And it's a very common use case where people have an open data catalog and a metadata catalog. Yes, a geo metadata catalog. So, they want to move away from this system because there are high license costs and they don't like having two separate catalogs and APIs.
08:23
So, they want to rely on GeoNetwork because GeoNetwork is free and open source software and it deals great with indexing and searching. And we just want to extend it to be kind of a data platform instead of just metadata.
08:41
And we would like to have the GeoNetwork UI project as the front end to address a new user experience for the new use cases like data visualization, search, et cetera. So, this is the plan. This project has already started.
09:01
So, what are the target use cases? To provide fast and interactive data visualization. So, we would like to have an API where we are able to dynamically and instantly derive value from the data. Like charts, pie charts, et cetera.
09:21
If you use WFS, you have to fetch all the data before processing it to deliver a chart. But if you go beyond that and deliver a new API, you can just ask the server to process the data and send the result back to the client to serve your charts.
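As a rough illustration of that difference, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for the backing store; the table, field names, and values are invented for the example:

```python
import sqlite3

# Hypothetical mini feature collection standing in for a real dataset.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE features (sensor TEXT, value REAL)")
con.executemany(
    "INSERT INTO features VALUES (?, ?)",
    [("a", 10.0), ("a", 5.0), ("b", 2.5)],
)

# WFS-style: fetch every feature, then aggregate on the client.
rows = con.execute("SELECT sensor, value FROM features").fetchall()
client_side = {}
for sensor, value in rows:
    client_side[sensor] = client_side.get(sensor, 0.0) + value

# Data-API-style: ask the server for the aggregation only; the client
# receives one row per group instead of the whole collection.
server_side = dict(
    con.execute("SELECT sensor, SUM(value) FROM features GROUP BY sensor")
)

assert client_side == server_side  # same result, far less data transferred
```

The same aggregation arrives either way, but the server-side variant moves one row per group over the wire instead of the full feature set.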
09:45
So, we want to have aggregations as well, to be able to have, yes, an up-to-date dashboard, different queries like search, aggregation. We want to have functions, processes like sum, average, et cetera.
10:04
Map things as well. We would like to be able to do some joins. If you have open data with a zip code, to be able to join it to geodata. And then do aggregation, clustering, heat maps.
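The zip-code join could look something like this sketch, again with sqlite3 as a stand-in and entirely invented table names and coordinates:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Open dataset: records keyed by zip code, no geometry at all.
con.execute("CREATE TABLE incidents (zip TEXT, count INTEGER)")
con.executemany("INSERT INTO incidents VALUES (?, ?)",
                [("59000", 12), ("59160", 3), ("59000", 4)])
# Geo dataset: zip code mapped to a (simplified) centroid.
con.execute("CREATE TABLE zones (zip TEXT, lon REAL, lat REAL)")
con.executemany("INSERT INTO zones VALUES (?, ?, ?)",
                [("59000", 3.06, 50.63), ("59160", 3.01, 50.65)])

# Join + aggregate: total incidents per zone, now carrying coordinates,
# so the result is ready for clustering or a heat map.
joined = con.execute("""
    SELECT z.zip, z.lon, z.lat, SUM(i.count)
    FROM incidents i JOIN zones z ON i.zip = z.zip
    GROUP BY z.zip
""").fetchall()
```

In a real deployment the geometry side would be actual polygons in PostGIS rather than centroid columns, but the join-then-aggregate shape is the same.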
10:21
So, this is what we target as use cases. The search as well. When you have a platform, usually you have a search input to search for data. And if you just query the metadata, you won't find everything. So, the point is also to be able to search in the data through this API.
10:45
It's a bit technical here, but this is about how we design such an API. It's hard to tell because actually in the geospatial world, you have OGC, with standards like WFS and the OGC API Features.
11:02
In the open data ecosystem, there is nothing. No real standards. They have their own APIs, their own formats. And it's hard to imagine what an API which unifies them all could look like. For sure, it has to be compatible with OGC API Features.
11:23
So, there will be an implementation. But it's limited. So, we think of proposing an extension to provide the missing parts like the aggregation, the analytics, the clustering, the extraction.
11:43
So, we would like to propose a query extension to the API to be able to achieve all these use cases. But actually, it's quite limited as well. Because for the open data world, people will find it weird that we use this kind
12:03
of API to fetch non-geodata, for instance. So, I would say that our vision is more to abstract the REST entry points. Provide OGC API Features compatibility, but maybe provide other compatibilities.
12:21
We like, for instance, the OpenDataSoft API, which works quite well. And then there will be an abstraction, and processes will be done under the hood. So, we have other ideas to extend the capabilities beyond this API.
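The "abstraction with processing under the hood" idea could be sketched roughly as follows; every class, function, and query-syntax detail here is hypothetical, not part of GeoNetwork:

```python
from abc import ABC, abstractmethod

class DataBackend(ABC):
    """One warehouse hiding behind several public API flavours."""
    @abstractmethod
    def aggregate(self, dataset: str, field: str, func: str) -> float: ...

class InMemoryBackend(DataBackend):
    """Toy backend; a real one could be PostGIS, DuckDB, Elasticsearch..."""
    def __init__(self, datasets):
        self.datasets = datasets

    def aggregate(self, dataset, field, func):
        values = [r[field] for r in self.datasets[dataset]]
        return {"sum": sum, "avg": lambda v: sum(v) / len(v)}[func](values)

# Each public entry point (an OGC API Features extension, an
# OpenDataSoft-like endpoint, ...) translates its own query syntax
# into the same backend calls.
def odsoft_style(backend, dataset, expr):
    func, field = expr.split("(")       # e.g. "sum(count)" -> sum, count)
    return backend.aggregate(dataset, field.rstrip(")"), func)

backend = InMemoryBackend({"traffic": [{"count": 3}, {"count": 7}]})
total = odsoft_style(backend, "traffic", "sum(count)")
```

The point of the design is that swapping the warehouse, or adding another API dialect on top, touches only one side of the abstraction.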
12:42
So, for instance, the API will just return features. And we know that if everything is indexed or stored in a storage, we could imagine other use cases like providing vector tiles of the data based on filtering, analyzing,
13:03
et cetera. Styling for WMS. A streaming API to stream the data instead of fetching it all at once. And so on. So, what we are focusing on in our approach here is, yeah, we don't see the data platform as
13:23
an end, but as part of a bigger chain, which aims to provide more information and applications based on the data. And we want to focus on the reuse as well. Because when you land on a metadata catalog or a catalog, it's always hard to know what
13:44
you can do with the data and how you can really value it. Some technical aspects about the work we have done so far. So, how do we want to set up this kind of API? It's a recap.
14:02
In the geo world, you have WFS. And if you want to process, you have WPS. But I don't know how many of you have run your own WPS or have integrated WPS in your chains; very few customers have this kind of tool available. That's why we want to embed that in a better API.
14:21
In the open data world, it's more of a mess. Custom APIs, custom formats. So, the idea is to extend the data API for non-geodata to do analytics, facets, and geospatial processes. And one goal behind that is to provide a unification of both of these worlds.
14:44
So, these are the use cases we want to address. Browse the data. So, for instance, for a table where you want to have the data information, the attributes. So, full-text search, pagination, sorts, filters, and facets.
15:02
For the data visualization, we want to do operations like data extraction, aggregation on dates, histograms, functions. And on the map, we want to do spatial filters, clustering, heat maps. Because if you want to display features on maps with WFS, you have to fetch them all.
15:24
And if the file is very big, it won't appear. Then there is the question of how we are going to store the data. So, I talked about the API design. And now, on the left part, there is the data itself.
15:40
So, there are different solutions for that. The most obvious could be a transactional database like PostGIS, where you can store open data from CSV, from whatever. And you can store geodata as well. And you will benefit from all the PostGIS functions.
16:03
But if you have big files, it can be slow. So, we have to identify other solutions depending on the use case. Like what kind of data do we want to provide? So, there are columnar formats. You know Parquet and GeoParquet.
16:22
Citus, which is a Postgres extension for columnar storage. OLAP, which is online analytical processing, like DuckDB or ClickHouse. So, it's based on columnar formats, but it provides some APIs, not REST APIs, but APIs to be able to get value from the data.
16:42
And there is another option, actually, which is an index like Elasticsearch, which is already in GeoNetwork and which addresses all the use cases. So, we have to identify what would be the best solution. In the end, it's the same as the API. We need to have an abstraction. And we need to be able to plug in any kind of warehouse that we would like to have.
17:07
In this API, we think that SQL should influence the API design. Because SQL is a query language which makes it possible to get anything from the data, okay?
17:23
And actually, it could work quite well. So, for instance, here is the list of the use cases that we want to address. And there is always, yes, a function or a keyword or something in SQL that addresses that.
17:43
For the facets, for instance, we would have to make one request per field that you want facets for, with some operation and an aggregation, to know how many records and features we have for each value of this field.
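That one-aggregation-per-faceted-field pattern can be sketched like this, with sqlite3 standing in for the warehouse and made-up field names; the field name is interpolated directly into the SQL here purely for illustration, so it must never come from user input:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE records (format TEXT, theme TEXT)")
con.executemany("INSERT INTO records VALUES (?, ?)", [
    ("CSV", "transport"), ("CSV", "energy"), ("GeoJSON", "transport"),
])

def facet(con, field):
    # One GROUP BY aggregation per faceted field: value -> record count.
    return dict(con.execute(
        f"SELECT {field}, COUNT(*) FROM records GROUP BY {field}"))

# Building all facets means one such query per field.
facets = {f: facet(con, f) for f in ("format", "theme")}
```

So a facet panel over N fields costs N aggregation queries, each of which a columnar or indexed store can answer without scanning full rows.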
18:03
So, actually, this could orient the API design. Some optimization. I talked about columnar formats. So, here is a transactional database. So, you see that if you want to do some analytics on column 1,
18:21
you have to browse all the columns of all rows, which is quite slow. Whereas the columnar format just reverses the positions of the columns and the rows. And if you want to perform an analysis on column 1, you can just go straight through it and have an instant result.
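A toy illustration of the two access patterns, in plain Python with invented column names (real engines add compression and vectorized scans on top of this layout difference):

```python
# Row-oriented: each record is stored together; an analytic query on one
# column still walks every field of every row.
rows = [{"col1": i, "col2": "x" * 10, "col3": i * 2.0} for i in range(1000)]
total_row_store = sum(r["col1"] for r in rows)

# Column-oriented: each column is stored contiguously; the same query
# touches only the one array it needs.
columns = {
    "col1": [r["col1"] for r in rows],
    "col2": [r["col2"] for r in rows],
    "col3": [r["col3"] for r in rows],
}
total_col_store = sum(columns["col1"])

assert total_row_store == total_col_store  # same answer, different I/O
```

Both layouts return the same aggregate; the columnar one simply reads a fraction of the data to get there, which is where the DuckDB numbers later in the talk come from.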
18:43
So, this has to be considered. But actually, it depends on your use case. This is a ClickHouse OLAP picture that shows what we want to address. So, we would like to be able to plug in to any kind of data source. And then provide an API on top of that which just makes it possible to reuse the data.
19:07
So, there is an ingestion part that I won't talk about today. One use case to illustrate what we are doing.
19:20
From the open data platform of Metropole de Lille, I took a dataset. And here is what we can do with the open data catalog, which is an OpenDataSoft instance. And I tried to see how we could address that. And actually, it was quite easy.
19:42
So, I took two use cases. So, the sensor name and the sum. And instantly, after importing the data into PostGIS, we can have a SQL query which fetches exactly the data to provide such a chart.
20:02
A more complex one where we break down series by year. So, it means that there are two aggregations. One for the sum and one for the date, where we extract the year of the date. Because it's a datetime. And it's the same. It's quite easy to do it with SQL.
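The year-breakdown query described above can be sketched like this; sqlite3 and its strftime function stand in for PostGIS here, and the sensor names, timestamps, and values are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (sensor TEXT, ts TEXT, value REAL)")
con.executemany("INSERT INTO readings VALUES (?, ?, ?)", [
    ("s1", "2022-03-01T10:00:00", 4.0),
    ("s1", "2022-07-01T10:00:00", 6.0),
    ("s1", "2023-01-15T10:00:00", 3.0),
    ("s2", "2023-02-02T10:00:00", 8.0),
])

# Two aggregations at once: sum the values, grouped by sensor AND by the
# year extracted from the datetime -> one row per (sensor, year) series.
series = con.execute("""
    SELECT sensor, strftime('%Y', ts) AS year, SUM(value)
    FROM readings
    GROUP BY sensor, year
    ORDER BY sensor, year
""").fetchall()
```

In PostGIS the year extraction would be `EXTRACT(YEAR FROM ts)` or `date_trunc`, but the query shape is identical, which is exactly the SQL-oriented-API argument.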
20:25
And what is great with SQL is that, for instance, all the OLAP engines which work on columnar formats use SQL as well. So, actually, if you define a SQL-oriented API design, you are able to play with PostGIS,
20:44
but you are also able to play with DuckDB, which is already able to plug into any kind of data source. So, here I transformed my data from Postgres to Parquet. And then I used Python to just test DuckDB and the columnar format performance.
21:07
So, it's the same queries. It's just that the FROM clause takes the Parquet file. So, it's an implementation detail, I would say. And some metrics about that. The file is 1.5 million rows.
21:22
So, it's big, but it's not huge. The GeoJSON, when I extracted it, is 500 megabytes. In the database, 300. And the Parquet file is 20 megabytes. And for the performance, for the first use case, yes, 2,050 milliseconds for Postgres
21:43
and, for DuckDB, 70 milliseconds. And for the breakdown series, a bit longer. But we see that DuckDB is much faster. So, it was just storytelling about this work, which is in progress.
22:01
The idea is really to make GeoNetwork provide both the data and the metadata, to make the move and the turn to the open data world. Because open data catalogs are more data platforms. And we think that GeoNetwork should also be a data platform.
22:23
So, we are doing this under the umbrella of GeoNetwork. But as you can see, all this analysis can be separated, because it's just about how to design a warehouse, how to design an API. And it could be independent from any solution.
22:47
The conclusion is that, yes, we're still working on it. We hope that we can have a release next year. Our goal is clearly to be a reference in the free and open source ecosystem
23:04
for both open data catalogs and geodata catalogs. And for organizations to be able to have just one catalog instead of managing two things and having to do harvesting
23:22
and synchronizing back and forth all the time. So, yes, we really think that there is potential. And hopefully for the Metropole de Lille, it's going to be in production next year. Thank you.