We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Geospatial and Apache Arrow: accelerating geospatial data exchange and compute

Formal Metadata

Title
Geospatial and Apache Arrow: accelerating geospatial data exchange and compute
Title of Series
Number of Parts
351
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year2022

Content Metadata

Subject Area
Genre
Abstract
The Apache Arrow (arrow.apache.org/) project specifies a standardized language-independent columnar memory format. It enables shared computational libraries, zero-copy shared memory, streaming messaging and interprocess communication without serialization overhead, etc. Nowadays, Apache Arrow is supported by many programming languages. Geospatial data often comes in tabular format, with one (or multiple) column with feature geometries and additional columns with feature attributes. This is a perfect match for Apache Arrow. Defining a standard and efficient way to store geospatial data in the Arrow memory layout (github.com/geopandas/geo-arrow-spec/) can help interoperability between different tools and enables us to tap into the full Apache Arrow ecosystem: - Efficient, columnar data formats. Apache Arrow contains an implementation of the Apache Parquet file format, and thus gives us access to GeoParquet (github.com/opengeospatial/geoparquet) and functionalities to interact with this format in partitioned and/or cloud datasets. - The Apache Arrow project includes several mechanisms for fast data exchange (the IPC message format and Arrow Flight for transferring data between processes and machines; the C Data Interface for zero-copy sharing of data between independent runtimes running in the same process). Those mechanisms can make it easier to efficiently share data between GIS tools such as GDAL and QGIS and bindings in Python, R, Rust, with web-based applications, etc. - Several projects in the Apache Arrow community are working on high-performance query engines for computing on in-memory and bigger-than-memory data. Being able to store geospatial data in Arrow will make it possible to extend those engines with spatial queries.
Keywords