
XDGGS: A community-developed Xarray package to support planetary DGGS data cube computations


Formal Metadata

Title
XDGGS: A community-developed Xarray package to support planetary DGGS data cube computations
Title of Series
Number of Parts
156
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
## 1. Introduction

Traditional maps use projections to represent geospatial data in a 2-dimensional plane. This is both very convenient and computationally efficient. However, it also introduces distortions in area and angles, especially for global data sets (de Sousa et al., 2019). Several global grid system approaches such as Equi7Grid or UTM aim to reduce these distortions by dividing the surface of the Earth into many zones and using an optimized projection for each zone. However, this introduces analysis discontinuities at the zone boundaries and makes it difficult to combine data sets of varying overlapping extents (Bauer-Marschallinger et al., 2014). Discrete Global Grid Systems (DGGS) provide a new approach by introducing a hierarchy of global grids that tessellate the Earth's surface into equal-area grid cells at different spatial resolutions, together with a unique indexing system (Sahr et al., 2004). DGGS are now defined in the joint ISO and OGC DGGS Abstract Specification Topic 21 (ISO 19170-1:2021). DGGS serve as spatial reference systems that facilitate data cube construction, enabling integration and aggregation of multi-resolution data sources. Various tessellation schemes such as hexagons and triangles cater to different needs - equal area, optimal neighborhoods, congruent parent-child relationships, ease of use, or vector field representation in modeling flows. Purss et al. (2019) explained the idea of combining DGGS and data cubes and underlined the compatibility of the two concepts. Thus, DGGS are a promising way to harmonize, store, and analyse spatial data on a planetary scale.

DGGS are commonly used with tabular data, where the cell id is a column. Many datasets have other dimensions, such as time, vertical level, or ensemble member. For these, it was envisioned to use Xarray (Hoyer and Hamman, 2017), one of the core packages in the Pangeo ecosystem, as a container for DGGS data. At the joint OSGeo and Pangeo code sprint at the ESA BiDS'23 conference (6-9 November 2023, Vienna), members of both communities came together and envisioned implementing support for DGGS in the popular Xarray Python package, which is at the core of many geospatial big data processing workflows. The result of the code sprint is a prototype Xarray extension, named xdggs (https://github.com/xarray-contrib/xdggs), which we describe in this article.

## 2. Design and methodology

There are several open-source libraries that make it possible to work with DGGS - Uber H3, HEALPix, rHEALPix, DGGRID, Google S2, OpenEAGGR - and many if not most have Python bindings (Kmoch et al., 2022). However, they often come with their own, not always easy-to-use APIs, different assumptions, and different functionalities. This makes it difficult for users to explore the wider possibilities that DGGS can offer. The aim of xdggs is to provide a unified, high-level, and user-friendly API that simplifies working with various DGGS types and their respective backend libraries, seamlessly integrating with Xarray and the Pangeo open-source geospatial computing ecosystem. Executable notebooks demonstrating the use of the xdggs package are also being developed to showcase its capabilities. The xdggs community contributors set out with a set of guidelines and common DGGS features that xdggs should provide or facilitate, to make DGGS semantics and operations usable via the user-friendly Xarray API for working with labelled arrays.

## 3. Results

This development represents a significant step forward. With xdggs, DGGS become more accessible and actionable for data users. As with traditional cartographic projections, a user does not need to be an expert on the peculiarities of various grids and libraries to work with DGGS, and can continue working in the well-known Xarray workflow. One of the aims of xdggs is to make DGGS data access and conversion user-friendly, while dealing with the coordinates, tessellations, and projections under the hood. DGGS-indexed data can be stored in an appropriate format such as Zarr or (Geo)Parquet, with metadata describing which DGGS (and, where applicable, which specific configuration) is needed to address the grid cell indices correctly. An interactive tutorial on Pangeo-Forge is also being developed as an open-access resource to demonstrate how to effectively utilize these storage formats, thereby facilitating knowledge transfer on data storage best practices within the geospatial open-source community. Nevertheless, continuous efforts are necessary to broaden the accessibility of DGGS for scientific and operational applications, especially in handling gridded data such as global climate and ocean modeling, satellite imagery, raster data, and maps. This would require, for example, an agreement, ideally with entities such as the OGC, for a DGGS reference systems registry (similar to the EPSG/CRS/PROJ database).
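To make the intended workflow concrete, here is a minimal sketch of how such a DGGS-aware data cube might be opened and used. It assumes a hypothetical Zarr store and variable name, and the `xdggs.decode` entry point described in the project README; the exact API is still evolving.

```python
import xarray as xr
import xdggs  # https://github.com/xarray-contrib/xdggs

# A DGGS-indexed data cube: one spatial dimension of cell ids, plus e.g. time.
# The store path and variable name below are placeholders.
ds = xr.open_zarr("air_temperature_h3.zarr")

# Make the dataset DGGS-aware: xdggs reads the grid metadata stored on the
# cell id coordinate (grid name, refinement level, ...) and attaches an index.
ds = xdggs.decode(ds)

# From here on, regular Xarray operations work on the labelled cell dimension,
# for example a monthly mean over time for every cell.
monthly_mean = ds["air_temperature"].groupby("time.month").mean()
```

The point of the design is that, after decoding, everything else is plain Xarray.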
Keywords
Transcript: English (auto-generated)
Hello everybody. Thank you very much for the introduction, Luis. I would like to present to you today our work on xdggs, a community-developed Xarray package to support planetary DGGS data cube computations. It's a really long title, with even more authors. I'll go a little bit into the background first, just to catch everyone up.
So, Xarray is a really widely used Python library for working with arrays. In the Earth observation community it is used a lot in the Pangeo and Jupyter ecosystems for data cube calculations, and in the climate space as well.
DGGS means Discrete Global Grid Systems, and it's a new form of spatial reference system. At the BiDS conference, Big Data from Space, last year in November in Vienna,
shout out to Stefanie Lumnitz, three big communities came together: the OGC, the Open Geospatial Consortium, where, in full disclosure, I'm also co-chair of the Discrete Global Grid Systems working group;
then there's also Peter Strobl from the JRC; and Pangeo developers, especially Anne Fouilloux, Tina Odaka and Ryan, who are here. This all happened at the joint Pangeo-OSGeo code sprint; I was sitting with Tom Kralidis trying to hack something into pygeoapi as well. So, what is Pangeo?
So, because we are at FOSS4G, which is an OSGeo conference: Pangeo is an international community that is a bit more geoscience related, building open source, inclusive, and scalable software for planetary-scale and large environmental data computation.
It is strongly Python, but there's also Julia and R, so it's more about computation and programming. And there's a huge ecosystem; I'm quite sure most of you will have been in touch with some of it, if you have worked with Jupyter, scaling out with Dask, Xarray,
and the whole Matplotlib and Python visualization stack. The nice thing with this type of ecosystem is that you can develop on your own computer and then scale out on HPC or in the cloud using the same stack, which is really nice. So, a quick background on Xarray. Xarray is a bit of a foundation in many scientific workflows.
It's an open source project that has been around for a while, and it's for n-dimensional labeled arrays. If you have ever loaded a raster as a NumPy array into memory, then you sort of know where we are at, and this takes it to a whole other level.
You can chunk it and work out of core on larger and larger data. And it provides labels, so you don't have to know the index number and the axis order of the array; this really helps you to program with it. And it's integrated with a lot of other packages.
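As a small, generic illustration of what labelled arrays give you (the dimension names and values here are made up, not tied to any particular dataset):

```python
import numpy as np
import pandas as pd
import xarray as xr

# A labelled 3-D array: instead of remembering that axis 0 is time and
# axes 1/2 are latitude/longitude, you address the data by name and label.
temperature = xr.DataArray(
    np.random.default_rng(0).normal(18.0, 3.0, size=(4, 3, 3)),
    dims=("time", "lat", "lon"),
    coords={
        "time": pd.date_range("2024-07-01", periods=4),
        "lat": [58.0, 58.5, 59.0],
        "lon": [24.0, 24.5, 25.0],
    },
    name="air_temperature",
)

# Label-based selection instead of positional indexing:
print(temperature.sel(time="2024-07-02", lat=58.5, lon=24.5).values)

# Out-of-core, chunked computation works the same way (requires dask):
# temperature = temperature.chunk({"time": 1})
```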
Okay, so that is sort of the background: the status quo, widely used. Now to discrete global grid systems. As I said, a DGGS is a new type of spatial reference. You can see it as a bit of a hybrid between vector and raster, with the idea that the cells or pixels that cover the whole globe
have approximately the same area. The Open Geospatial Consortium has already put that into a so-called abstract specification, which is also becoming an ISO standard. They define a discrete global grid system as a spatial reference system that uses a hierarchical tessellation
of cells to partition and address the globe. So, what does that mean? Hierarchical means you have, as you can see on the right side, cells at different resolutions, very similar to zoom levels with web tiles, for example. Partition means you have subsets that cover,
that refer to, a certain area on the globe. And address means each cell, each pixel, has a unique ID. This property gives you some really nice data management possibilities, because you can associate lots of data not only with a certain point
but with a certain area on Earth that is uniquely identifiable and indexable. This has also been recognized by the United Nations experts on global geospatial information management, UN-GGIM: we indeed need to combine more and more tabular and spatial data.
But the challenge we often run into, especially in terms of spatial units, is that we have polygons of different shapes and sizes, administrative boundaries that of course do not have the same area, so when dealing with area statistics you always have a couple of trip wires; you have to normalize and so on. Then we have rasters, and even if, like we do in data cube computations,
we have rasters of the same resolution, let's say 10 metres, the rasters also have to be aligned and in the same coordinate reference system in order to do a proper drill-down. Choosing a DGGS, that is sort of taken care of for you, because, as I said,
the cells are uniquely identifiable and always in the same place. If you then decide to integrate data with one of these grids, you can also do summary aggregations to the coarser resolutions. So, this provides a nice framework, especially for data integration.
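To make the "hierarchical tessellation with unique cell addresses" idea concrete, here is a tiny sketch using the H3 library, one of several DGGS backends. It assumes the h3 v4 Python API, and the coordinates are just an arbitrary example point.

```python
import h3  # Uber H3 Python bindings, v4 API

# A point on Earth (lat, lon) mapped to a cell id at refinement level 5.
cell = h3.latlng_to_cell(58.38, 26.72, 5)
print(cell)  # a unique cell address (hexadecimal string)

# The hierarchy: the coarser parent cell and the finer child cells.
parent = h3.cell_to_parent(cell, 4)
children = h3.cell_to_children(cell, 6)
print(parent, len(children))  # one parent at level 4, seven children at level 6

# Each id can always be turned back into geometry:
print(h3.cell_to_latlng(cell))    # cell centroid (lat, lon)
print(h3.cell_to_boundary(cell))  # cell boundary vertices
```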
So, maybe to visualize the actual gridding problem a little bit: one big use case is increasingly with the statistical agencies in Europe. We have Eurostat for all of Europe.
They, of course, have a statistical grid that is based on the European coordinate reference system, ETRS89-LAEA (EPSG:3035). Then if we want to use satellite data, we have Sentinel-2, for example, in different UTM zones, and as we go further north
those zones even overlap, so there's oversampling. It has to be resampled into your data cube, because you usually have to decide which projection you use, a continental or a local projection. So, you usually have to fit everything into that one,
and that projection can usually not cover too large an area, otherwise you run into areal distortions. On the right side, for example, in a hexagon-based DGGS, the cells would always look the same and the vertices would always be in the same location, so you wouldn't run into such issues anymore.
So, in terms of hierarchy, or multi-zoom, which sounds a bit fancy here: as you aggregate data, you can use different data variables at different zoom levels. Then you zoom in, and this would be Estonia,
and as you keep zooming in you still have a structured grid, which is good. And as a helper, you have cell IDs that refer to a place on Earth, and you can associate arbitrary information with them, structuring your table however you want.
Another thing is combining data from different sources. Let's assume, like ESA is currently studying, that Sentinel data were available in a DGGS: you wouldn't download a whole UTM zone of 3 to 7 gigabytes,
you would just select the data for these and those zones, for these and those cells, and then you maybe have population data from the statistical office, and because of the IDs you can do a simple table join. Dealing with discrete global grid systems, the things you typically have to do are, first, producing data sets:
right now we're in the situation that we obviously don't have data collection at the DGGS level, we only have vector data and raster data, so we have to do re-gridding. Funnily enough, the Pangeo community, which is very active in the climate and ocean space,
does re-gridding all the time, because all their climate and ocean grids are actually at different resolutions and on different grids, so they know about that. For many statistical things, species, population, we do binning, like we already do nowadays when we bin point data into spatial bins like squares or hexagons.
Then the things you typically want to do are aggregations over larger areas; you want cell boundaries and neighbors, and a nice thing with hexagon topologies is that you don't run into the four-versus-eight-neighborhood problem anymore,
because technically you always have direct neighbors across all edges. Other things you can typically do are of course geometric operations, like cell overlap, intersection, and so on. And you eventually have to store all of that. With xdggs now being an extension to Xarray,
you basically still work in Xarray, so you have your coordinates and your dimensions, but the main dimension that we came up with would be the cell ID, which is basically a 1D array, and all your data is associated with the cell ID.
Then you just have to store it in a meaningful container that supports array data and meaningful metadata. The NetCDF model has been an inspiration for this type of data. We would of course take it further and store it in chunked arrays in Zarr; I'll come to that a little bit later.
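A minimal sketch of that data model: all data variables hang off a single one-dimensional cell id coordinate, and the grid definition travels as metadata. The attribute names (`grid_name`, `level`), the sample points, and the store path are illustrative assumptions, and H3 is used here only as an example backend.

```python
import numpy as np
import xarray as xr
import h3  # Uber H3 bindings, v4 API

# A handful of example cells (H3 cells at refinement level 5 around sample points).
points = [(58.38, 26.72), (59.44, 24.75), (58.59, 25.01)]
cell_ids = np.array([h3.latlng_to_cell(lat, lon, 5) for lat, lon in points])

time = np.array(["2024-07-01", "2024-07-02"], dtype="datetime64[ns]")
data = np.random.default_rng(0).normal(20, 3, size=(len(time), len(cell_ids)))

ds = xr.Dataset(
    {"air_temperature": (("time", "cells"), data)},
    coords={
        "time": time,
        # The grid definition travels as metadata on the cell id coordinate;
        # the attribute names here are an assumption, not a fixed standard.
        "cell_ids": ("cells", cell_ids, {"grid_name": "h3", "level": 5}),
    },
)

# Chunked, compressed, cloud-friendly storage of the cube:
ds.to_zarr("dggs_cube.zarr", mode="w")
```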
But here, anyone who has ever opened a data set with Xarray knows what this means: you have the coordinates, and then you have the data variables, so here it's the air temperature that is indexed against the cell ID only. And the nice thing is that from the cell ID you can always go back to the location,
so we don't need lat-longs or anything else here. One thing we do have to do right now, when we start, is convert data into a DGGS. What we currently do for rasters, for example, is take the centroids and then derive the cell IDs.
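As an illustration, deriving cell ids from the centroids of a small lat/lon raster tile might look like the following sketch (h3 v4 bindings, made-up extent); a real conversion needs more care about matching resolutions and aggregating multiple pixels per cell, which is exactly the topic mentioned above.

```python
import numpy as np
import h3  # v4 API

# Centroids of a tiny 0.1-degree raster tile (made-up extent).
lats = np.arange(58.0, 58.5, 0.1)
lons = np.arange(26.0, 26.5, 0.1)
lon2d, lat2d = np.meshgrid(lons, lats)

# Derive one cell id per pixel centroid at a chosen refinement level.
level = 7
cell_ids = np.array([
    h3.latlng_to_cell(lat, lon, level)
    for lat, lon in zip(lat2d.ravel(), lon2d.ravel())
])

# Several pixels may fall into the same cell, so values are typically
# aggregated per cell id (e.g. a groupby/mean) rather than copied 1:1.
print(len(cell_ids), len(set(cell_ids)))

# The inverse direction is always available: cell id back to geometry.
print(h3.cell_to_latlng(cell_ids[0]))  # centroid of the first cell
```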
There's a whole other topic, which I probably have to write about, on the different things to consider when converting different types of data, especially at different resolutions, into a DGGS. One nice thing here is, as you can see, although the data is DGGS-indexed
and you only have cell IDs, you can still do a query on coordinates, because I know Tartu is at around 27 by 57.6 or something, lat-long, and that would be converted into the corresponding DGGS cells,
the zone identifiers, and this way you can still do a spatial subset. So finding a location via cell ID is still at the base of it. Inversely, if you need to go back to lat-long space,
you can always get the cell boundaries, which are polygons, or the cell centroids, as lat-long, because the DGGS libraries intrinsically know that conversion. It's basically like projections, where you know that at a given index,
with some geo origin, you can convert back and forth; the math is, so to say, stable. So, in order to save that meaningfully, we have been experimenting with Zarr.
I mean, Zarr itself doesn't need experimenting anymore, it's a fairly trusted data format for storing arrays, and it chunks the data, so it's cloud optimized: you can put it in cloud buckets, for example, and similar to cloud optimized GeoTIFFs
you don't need to read the whole file, you only go, based on the metadata, to the chunks that you need. It also supports compression of the chunks, so space-wise it's actually pretty okay.
And similarly, being inspired by, sort of inheriting from, NetCDF and the CF conventions, you can have a lot of metadata associated with it. This is really important, because right now the state of defining the configuration of a DGGS is still a little bit, I wouldn't say fragile, but every library and every system needs a couple of specific parameters.
It would be nice to have a common approach at some point, or something in this direction, like a nicer catalog of these. I mean, there are not so many, maybe a handful, but some of them have a couple of configuration options:
where the origin is, a rotation, or whether the indexing follows a space-filling curve this way or that way, nested or not. That needs to be stored, of course, as metadata, as a sort of coordinate reference information within the container,
and Zarr provides the means for that. Right now my colleague and I have been working with DGGRID, which is currently still a command line tool; we are also working with Kevin Sahr on making it more of a library, so then you would have a nice C++ library for that.
H3 might be something you have already come across, and others are HEALPix and rHEALPix, which actually originate from astronomy, from looking outward and tessellating the sphere as you look out, and then sort of making it useful for looking at the Earth.
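For completeness, HEALPix indexing itself is a one-liner with the healpy package. This is only a sketch: healpy uses an `nside` resolution parameter rather than a refinement level, and the nested ordering shown here is one of two supported orderings.

```python
import healpy as hp

nside = 2 ** 6           # HEALPix resolution parameter (12 * nside**2 cells globally)
lon, lat = 26.72, 58.38  # degrees

# lonlat=True lets us pass geographic longitude/latitude directly;
# nest=True selects the hierarchical ("nested") ordering of cell ids.
cell = hp.ang2pix(nside, lon, lat, nest=True, lonlat=True)
print(cell)

# And back: the centre of that cell, again in degrees.
print(hp.pix2ang(nside, cell, nest=True, lonlat=True))
```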
One interesting bit that we also did in the paper: I had a paper a few years back where I assessed the areal distortions of different DGGS, and at that time I missed HEALPix,
so we added it in this paper as well. As you can see, here is HEALPix, and it is really good too, similar to the hexagonal ones,
so that's pretty good. That is also documented in the article. And why is it interesting? Because although HEALPix is mostly used in astronomy, some digital twin efforts, especially in Europe,
with Destination Earth, the Green Deal data space, and the climate data spaces, and colleagues here, Tina from Ifremer, took HEALPix and are doing some amazing things with it.
That's why I want to show this here as well, and that's why we found it really important to mention HEALPix as an additional DGGS. And then a quick update on DGGRID: DGGRID recently had its 10-year anniversary,
and version 8.1 is now almost released, with some additional improvements on ISEA7H, which is a similar configuration to H3, but H3 is not quite equal-area, so that's why we put this extra effort into working with DGGRID.
There are some nice addressing schemes for ISEA3H and ISEA7H that give the indices meaningful parent-child relationships, so you can find the children that are related to one zone at the next resolution, and vice versa you can find the parents,
so you can do a group-by on parents and so on (a small sketch of that follows below). And we're working, as I said, towards Python bindings; having Python bindings in this case actually means having C-compatible bindings, which we hope will open up a whole new usability in different programming languages.
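As a sketch of what such a group-by-parents aggregation can look like once the data sits in Xarray (H3 is used here purely for illustration; with DGGRID and the ISEA grids the parent relation would come from the DGGRID bindings instead):

```python
import numpy as np
import xarray as xr
import h3  # v4 API

# A toy 1-D DGGS-indexed variable at H3 level 6 (made-up points and values).
points = [(58.38, 26.72), (58.39, 26.73), (59.44, 24.75), (59.45, 24.76)]
cell_ids = [h3.latlng_to_cell(lat, lon, 6) for lat, lon in points]
da = xr.DataArray(
    np.array([1.0, 2.0, 3.0, 4.0]),
    dims="cells",
    coords={"cell_ids": ("cells", cell_ids)},
    name="some_value",
)

# Map each cell to its parent at the coarser level 5 ...
parents = xr.DataArray(
    [h3.cell_to_parent(c, 5) for c in cell_ids], dims="cells", name="parent_ids"
)

# ... and aggregate: one value per coarser cell.
coarse = da.groupby(parents).mean()
print(coarse)
```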
Some examples and inspiration for how this has been used, not only by us: on the lower right is a suitability analysis for fuel station locations in Northern Europe, a classic suitability analysis like you would do with rasters,
where you have all these variables and then calculate the weighted suitability on top, which is easy because it is so simple to tabulate in a store like a ClickHouse database. Other companies and startups also use hexagonal grids to integrate different types of satellite imagery, like Planet and Sentinel, for their modeling.
You can find this particular paper here, and you can pip install xdggs; I think that currently only comes with H3 by default, so we are working on the DGGRID implementation,
and Ifremer is working on the HEALPix support, so some things are obviously still moving. And with that, I have used my 20 minutes.
We have been given some questions. I was wondering if you have compared the efficiency of compression algorithms
when you store a DGGS-indexed data set rather than a classical raster. Does it affect classical compression methods, like deflate or whatever? And most certainly there are some options you cannot use, like algorithms that are designed for 2D data,
like JPEG compression or JPEG 2000, stuff like that. So what are your thoughts about compression and DGGS-friendly storage? Yeah, that's a very good question, thank you. So, the format option versus the data indexing type, so to say.
So, right now we mostly work with two main formats. One is Zarr, and Zarr compression is fairly good; you use the compressors that come with the Zarr package,
like Blosc and zlib or Zstd. At a similar resolution, so that you have a similar number of pixels, so to say, and a similar number of values, that behaves fairly similarly.
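For reference, choosing a compressor is a per-variable encoding option when writing the cube; the following sketch assumes xarray with zarr-python 2.x and numcodecs, and uses placeholder data and store names.

```python
import numpy as np
import xarray as xr
import numcodecs

# A stand-in DGGS cube: 2 time steps x 4 cells (placeholder values and ids).
ds = xr.Dataset(
    {"air_temperature": (("time", "cells"),
                         np.random.default_rng(0).normal(20, 3, (2, 4)))},
    coords={"cell_ids": ("cells", np.arange(4, dtype="uint64"))},
)

# Per-variable Zarr encoding: Blosc with the Zstd codec, plus explicit chunking.
encoding = {
    "air_temperature": {
        "compressor": numcodecs.Blosc(cname="zstd", clevel=5),
        "chunks": (1, 4),  # (time, cells) chunk sizes, tune to the access pattern
    }
}
ds.to_zarr("dggs_cube_zstd.zarr", mode="w", encoding=encoding)
```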
The other one would be Parquet. Parquet also has pretty good compression, but it is purely tabular; in Zarr you can have additional dimensions like time, which you can't really model in Parquet directly. And last but not least, purely operationally, there is ESA and the Testbed 16 with GeoServer, and as an operational data store
we use ClickHouse, which is an OLAP database and also has really good compression. So, loading the data in, purely from a raster point of view, it stays at a similar compression. But the concept is slightly different:
it's not purely a binary block that is compressed like a TIFF, for example, or something like that. A more interesting question that I got the other day is computational efficiency. Certain algorithms that work on rasters,
based on 2D access, would obviously not work for hexagons. They could work for quadrilateral cells, and there are also algorithms for hexagonal grids, but the data access, I think, is not as mature yet.
With xdggs, being able to use the general scientific ecosystem of Xarray, and the way Xarray shuffles data around, is a good way right now to explore that a bit more in an operational space, so to say. So, we are very lucky that since last year
we have actually moved on from a more scientific, theoretical point of view. I mean, Luis has followed the progress; we're now going to try to really do things with it. Let me just add to this that if you're still storing this data in the raster concept,
usually what happens is that you transform a zone into a one-dimensional array, and when you do that transformation there are some methods from traditional raster processing that you can still use. And Jérôme St-Louis from GNOSIS is working on this.
Any other questions? Yes, please, let's wait for the microphone. Thank you for your presentation. I'd like to ask about HEALPix: is it supported right now,
or is it still going to be developed? So, you're asking if this can be used with HEALPix? Yeah, yeah. Well, Ifremer is using it, but I'm not quite sure; Justus, I'm not sure if Justus has done a couple of pull requests. He and Benoît are the main maintainers
who decide which code goes in, so I'm not sure at which stage the operational HEALPix support is. We have a little bit of a situation right now because of some heavy dependencies: each of these DGGS libraries, except maybe H3, comes with its own heavy dependencies. HEALPix, or cdshealpix, for example, is a fairly heavy package,
and DGGRID comes with its own challenges. So, the way the Python packages are currently set up with their dependencies, we don't want to pull in all of those things, so we have to modularize this a little bit. But in general, they are working with it and they do that. Okay, great. Thank you.