Development of a new framework for Distributed Processing of Big Geospatial Data

Open Source Geospatial Foundation (OSGeo)

Olasz, Angéla (Department of Geoinformation, Institute of Geodesy, Cartography and Remote Sensing (FÖMI)); Kristof, Daniel (FÖMI - Institute of Geodesy, Cartography and Remote Sensing)

Formal Metadata

Title

Development of a new framework for Distributed Processing of Big Geospatial Data

Title of Series

FOSS4G Bonn 2016

Part Number

Number of Parts

193

Author

Olasz, Angéla (Department of Geoinformation, Institute of Geodesy, Cartography and Remote Sensing (FÖMI))

Kristof, Daniel (FÖMI - Institute of Geodesy, Cartography and Remote Sensing)

License

CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/20404 (DOI)

Publisher

FOSS4G

Open Source Geospatial Foundation (OSGeo)

Release Date

2016

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

The geospatial world still lacks well-established distributed processing solutions tailored to the volume and heterogeneity of geodata, especially when fast data processing is a must. Moreover, most current distributed computing frameworks have important limitations regarding both data distribution and data partitioning methods. This paper therefore presents a prototype for tiling, stitching and processing of big geospatial data. The system is based on the IQLib concept developed within the IQmulus EU FP7 research and development project. The data distribution framework places no restrictions on the programming language environment and can execute scripts (and workflows) written in different development frameworks (e.g. Python, R or C#). It can process raster, vector and point cloud data. Our intention is to provide a solution that supports a wide range of geospatial processing tasks in a distributed environment with no restrictions on data storage concepts. Our research also covers methods for controlling data partitioning, distributed processing and data assimilation. Partitioning (also referred to as “tiling”) is a delicate yet crucial step that affects the whole processing chain. After algorithms have processed these “chunks” or “tiles” of data, the partial results are collected for data assimilation, or “stitching”. The paper presents the prototype through a case study on country-wide processing of raster imagery. The assessment compares the results (computing time, accuracy, etc.) with those of competing solutions. Algorithmic and implementation details will be investigated further in the near future.
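
The tile, process and stitch sequence described in the abstract can be illustrated with a minimal Python sketch. It is not the IQLib implementation: the function names (`tile_raster`, `process_tile`, `stitch`, `run`), the use of a nested list as a raster, the threshold operation and the local thread pool standing in for distributed workers are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def tile_raster(raster, tile_rows, tile_cols):
    """Partition a 2D list into tiles, keeping each tile's row/column offset."""
    n_rows = len(raster)
    n_cols = len(raster[0]) if raster else 0
    tiles = []
    for r0 in range(0, n_rows, tile_rows):
        for c0 in range(0, n_cols, tile_cols):
            tile = [row[c0:c0 + tile_cols] for row in raster[r0:r0 + tile_rows]]
            tiles.append((r0, c0, tile))
    return tiles


def process_tile(item):
    """Hypothetical per-pixel operation: threshold each value at 128."""
    r0, c0, tile = item
    result = [[1 if value >= 128 else 0 for value in row] for row in tile]
    return r0, c0, result


def stitch(processed, n_rows, n_cols):
    """Write each processed tile back at its offset to rebuild the full raster."""
    out = [[None] * n_cols for _ in range(n_rows)]
    for r0, c0, tile in processed:
        for dr, row in enumerate(tile):
            out[r0 + dr][c0:c0 + len(row)] = row
    return out


def run(raster, tile_rows=2, tile_cols=2, workers=4):
    """Tile, process the tiles concurrently, then stitch the partial results."""
    n_rows = len(raster)
    n_cols = len(raster[0]) if raster else 0
    tiles = tile_raster(raster, tile_rows, tile_cols)
    # A thread pool is only a local stand-in for remote processing nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        processed = list(pool.map(process_tile, tiles))
    return stitch(processed, n_rows, n_cols)
```

Because the example operation is per-pixel, tiles need no overlap. A neighbourhood operation such as a filter would require each tile to carry a margin of surrounding pixels, which is one reason the partitioning step influences the whole processing chain.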