In the OGC world, you have a catalog to search for metadata and datasets, and OGC API Features to fetch the data, paginate, filter and so on. Use cases have evolved since then, and data consumers now expect more complete capabilities from their data catalogs. Nowadays we want to analyze, understand and reuse our datasets, and providing such tools is a great way to encourage data owners to share and open their warehouses.

A data API could then offer:

- Full-text search on data points
- Data fetching, paging, sorting and filtering
- Data analytics, aggregation, computation
- Data joining

And those operations should perform in an optimized and scalable manner.

It's what GeoNetwork has offered for decades now, and GeoNetwork is making the move to open data to address all those use cases.

You might have heard about columnar formats, and columnar vector formats such as Arrow, Parquet…

After an introduction to the context and the expectations of a well-shaped data API, we'll present different approaches and types of flow architectures:

- Warehouse formats
  - Static files (Parquet)
  - Index
  - Databases (PostGIS, Citus)
- API models and implementation
  - OGC API Features limitations
  - DuckDB
  - Pure SQL

We'll then compare the different stacks in terms of efficiency across various use cases.