Development of a new framework for Distributed Processing of Big Geospatial Data
Formal Metadata
Title 
Development of a new framework for Distributed Processing of Big Geospatial Data

Title of Series  
Part Number 
88

Number of Parts 
193

Author 

License 
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 
Identifiers 

Publisher 

Release Date 
2016

Language 
English

Content Metadata
Subject Area  
Abstract 
The Geospatial world is still facing the lack of well-established distributed processing solutions tailored to the amount and heterogeneity of geodata, especially when fast data processing is a must. However, most current distributed computing frameworks have important limitations regarding both data distribution and data partitioning methods. Hence, this paper presents a prototype for tiling, stitching and processing of big geospatial data. The system is based on the IQLib concept developed in the frame of the IQmulus EU FP7 research and development project. The data distribution framework has no limitations on programming language environment and can execute scripts (and workflows) written in different development frameworks (e.g. Python, R or C#). It is capable of processing raster, vector and point cloud data. Our intention is to provide a solution to perform a wide range of geospatial processing capabilities in a distributed environment with no restrictions on data storage concepts. Our research covers methods controlling data partitioning, distributed processing and data assimilation as well. Partitioning (also referred to as “Tiling”) is a very delicate yet crucial step having impact on the whole processing. After algorithms have processed these “chunks” or “tiles” of data, partial results are collected to carry out data assimilation or “Stitching”. The paper presents the above-mentioned prototype through a case study dealing with countrywide processing of raster imagery. Assessment is carried out by comparing the results (computing time, accuracy, etc.) to concurrent solutions. Further investigations on algorithmic and implementation details are in focus for the near future.
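
The tiling, processing and stitching steps named in the abstract can be illustrated with a minimal sketch. It is not IQLib code: the function names `tile_raster`, `process_tile` and `stitch_tiles`, the nested-list raster and the threshold operation are all assumptions chosen for illustration, and the per-pixel algorithm is deliberately one whose result does not depend on neighbouring tiles.

```python
from typing import Dict, List, Tuple

Raster = List[List[int]]
TileKey = Tuple[int, int]  # (row offset, column offset) of the tile in the full raster


def tile_raster(raster: Raster, tile_size: int) -> Dict[TileKey, Raster]:
    """Partition ("tile") a non-empty raster into chunks keyed by their offset."""
    tiles: Dict[TileKey, Raster] = {}
    for r in range(0, len(raster), tile_size):
        for c in range(0, len(raster[0]), tile_size):
            tiles[(r, c)] = [row[c:c + tile_size] for row in raster[r:r + tile_size]]
    return tiles


def process_tile(tile: Raster) -> Raster:
    """Hypothetical per-pixel algorithm: threshold every cell at 128."""
    return [[1 if value >= 128 else 0 for value in row] for row in tile]


def stitch_tiles(tiles: Dict[TileKey, Raster], height: int, width: int) -> Raster:
    """Assemble ("stitch") processed tiles back into one raster."""
    out: Raster = [[0] * width for _ in range(height)]
    for (r, c), tile in tiles.items():
        for dr, row in enumerate(tile):
            out[r + dr][c:c + len(row)] = row
    return out


# A 4 x 5 toy raster; edge tiles are narrower than tile_size.
raster = [[(r * 40 + c * 30) % 256 for c in range(5)] for r in range(4)]
tiles = tile_raster(raster, tile_size=2)
processed = {key: process_tile(tile) for key, tile in tiles.items()}
result = stitch_tiles(processed, height=4, width=5)
```

In a distributed setting each entry of `tiles` could be sent to a different node. For a purely per-pixel operation like this one, the stitched result equals the result of processing the whole raster at once; algorithms that look at neighbouring pixels would need overlapping tiles or extra handling at the seams.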

00:08
[The session chair introduces the next talk; the name is unclear in the recording.]
00:13
OK. So my name is [name unclear], and [names unclear] unfortunately could not come to
00:27
answer your questions, but you can contact us later on. Our research is about the development of a new framework for distributed processing of big geospatial data. It is joint research by two institutions: the Institute of Geodesy, Cartography and Remote Sensing (FÖMI for short, as you can see here) and a university located in Budapest, Hungary. The content of this talk is presented in
01:04
this slide. I am going to give you a short introduction to our research topic, and then introduce an ongoing project called IQmulus, which is related to this work, although I am not going to give a detailed introduction to that project. I will then try to define what big geospatial data is, because I think it differs both from big data that is not geospatial and from geospatial data that is not big. Then I would like to show you a comparison of existing solutions that we have tried and compared, looking at how they handle distributed processing, and I will also present the user requirements that we selected to compare those solutions. Then I am going to present IQLib, which is the main development in our research. It has a modular structure, so I am going to present the modules and their current development status. I will conclude in the last slides with some thoughts about future work. So, introduction: our goal is to find a solution for processing
02:58
big geospatial data in a distributed ecosystem without any limitations on programming language, or on data partitioning and data distribution among the nodes, in order to run existing GIS processing scripts. As a first step we focus on the raster data representation, for example decomposing those datasets and then distributing and processing them. Before building this prototype system we analyzed data decomposition methods, that is, how a GIS dataset can be decomposed and then processed on different nodes, and we defined the common GIS user requirements on processing environments for geospatial data. So we have some user requirements that we think are important for a framework or toolkit supporting distributed processing, and we also worked on identifying big geospatial data. Now some thoughts about the IQmulus project:
04:20
this research is related to IQmulus, which is about a high-volume fusion and analysis platform for
04:29
geospatial point clouds, coverages and volumetric datasets. The main goal of the project is this platform. The project is going to finish this November, so as a result we are going to have this analysis platform. We are trying to define how this platform can be made available [unclear]. The consortium is formed by 11 European partner institutions, and we are from Hungary. If you want more information on the IQmulus project, please visit the project website. So, defining big geospatial data: it is not an easy
05:33
task, and there is no well-known definition. Data can start to count as big when it exceeds the capability or capacity of the hardware and computing background of your available system. There are diverse definitions in the literature, and quite a large number of them, so it is not so easy to define the margin between big geospatial data and geospatial data. Some of them also admit that it is use-case specific, so it has to be defined by the user. We have tried to compare big data that is not geospatial with geospatial big data, looking at the three kinds of geodata representation: the vector format, the raster format and the point cloud representation. We also compared those to non-geospatial, text-based data formats. In this paper, which
07:07
may be published [unclear], I don't know, we have recorded [number unclear] aspects for these three main domains: the definition and representation, and also the storage and processing background and the related requirements. This table is continued, and we have also included the existing solutions for those formats, for each of these three, with some definitions and some requirements that it
07:56
would be useful to have from an existing solution. We have collected the most widespread frameworks supporting distributed computing on geodata, and we selected the following aspects, which are also included in the table, and made a comparison between them. We looked at the input and output data types, which are of course important: what kinds of data they support. Whether already existing GIS processing executables or scripts are supported or not is one of the main points. Then, what kind of data management they support, and the supervision of the data distribution, especially for raster data types: we would like to have full control over which data chunks go to which node, and over getting back the processed data. Other aspects are scalability potential, supported platforms and so on. So we have collected all of this information
09:21
and tried to compare the existing solutions. This table has already been published in a paper, and later on I am going to present where you can find it. So after all
09:40
and also from our experience in the [project name unclear] project,
09:47
it has to be admitted that in most cases full control over the data partitioning and data distribution mechanism is not supported, and it is also not really possible to run already existing executables or scripts in such a platform or ecosystem. So we decided to develop our own distributed processing framework. This was initiated by three project partners: [two names unclear, one of them from France] and FÖMI from Hungary. Its name is IQLib. IQLib is going to be a framework supporting data decomposition as a core functionality, that is tiling; then data distribution and distributed data processing as the second main functionality; and finally functionality to stitch the results. In this way it can overcome the scalability limitations of the processing. The high-level concept is
11:37
shown on this slide. This is already a bit old
11:41
and could be updated, because there is a new module that I am going to introduce in the next slides. The main thing is the tiling and stitching, and the data distribution, by which we would like to be able to run already existing processing, remote sensing or GIS scripts on those datasets. As I mentioned, as a researcher from a national mapping agency, this would be very useful, because we already have operational GIS processing in different systems, always in different languages. So it would be very useful to have a framework in which this can somehow be reused for processing in a distributed environment. So all of this [unclear] for IQmulus. So there is the
12:54
data catalog module, which is responsible for storing the metadata: not only data about the data but also data about the processing. We would like to have information also on the data chunks and on the results, and the tiles will be registered in the data catalog module. Then there is the tiling and stitching module, which is responsible for [unclear] the metadata of the tiles, and the tile data is also going to be in the data catalog module mentioned before. Then there is this new module that is responsible for the data distribution, and there is also the processing
13:51
module, which is responsible for running those scripts on the already distributed tiles. As for the status of those modules: the data catalog module is [unclear] waiting for final approval from
14:10
our partners, and then it is going to be available as open source. In the first figure you can see the final structure of the data catalog module and the [unclear] it uses. The tiling and stitching module is already defined; in the second figure you can see the high-level concept of how it is going to work, but this is still in the planning phase. Then the data distribution module: this is
14:54
a new one, and it currently supports [unclear] the data partitioning and the data distribution.
15:05
It could be extended by third-party developers, so if you would like to contribute some ideas, don't hesitate to do so. The distributed processing module is also under development; in the figure you can see what the architecture of this module looks like. All of this information can be found in
15:40
the IQmulus specification, and this is going to
15:47
be presented on a dedicated [unclear]
15:52
[unclear] as soon as possible, but the specification and the [unclear] are already there. OK, so the related papers: we have had some presentations
16:07
on our first experience with this topic, but we need to
16:13
[unclear] the work. As for future work, we would like to finish
16:20
those implementations, followed by testing of the modules, especially for the
16:25
following aspects: [unclear], then experiment execution on big geospatial data, and then benchmarking, mainly of the processing time. Thank you for your attention, and I would also like to thank [names unclear]
16:48
[unclear]
16:50
to be here. Thank you.
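
One idea from the talk, running an already existing script unchanged on each tile and benchmarking the processing time, can be sketched as follows. This is an illustration under stated assumptions, not IQLib's interface: the stand-in script, the helper `run_on_tile` and the convention of passing a tile as JSON on standard input are all hypothetical. The external command could be any interpreter or executable; the Python interpreter is used here only so that the sketch is self-contained.

```python
import json
import subprocess
import sys
import time
from typing import Dict, List, Tuple

Tile = List[List[int]]

# Stand-in for an existing, unmodified processing script (hypothetical).
# It reads one tile as JSON on stdin and writes the processed tile to stdout.
EXISTING_SCRIPT = (
    "import json, sys\n"
    "tile = json.load(sys.stdin)\n"
    "json.dump([[v * 2 for v in row] for row in tile], sys.stdout)\n"
)


def run_on_tile(command: List[str], tile: Tile) -> Tile:
    """Run an external command on one tile and return its parsed output."""
    done = subprocess.run(
        command,
        input=json.dumps(tile),
        capture_output=True,
        text=True,
        check=True,  # raise if the external script fails
    )
    return json.loads(done.stdout)


command = [sys.executable, "-c", EXISTING_SCRIPT]
tiles: Dict[Tuple[int, int], Tile] = {(0, 0): [[1, 2], [3, 4]], (0, 2): [[5], [6]]}

# Benchmark the processing time over all tiles (sequential here; a real
# deployment would dispatch each call to a different node).
start = time.perf_counter()
results = {key: run_on_tile(command, tile) for key, tile in tiles.items()}
elapsed = time.perf_counter() - start
```

Because the framework side only starts a process and exchanges data with it, the script itself can be written in any language, which matches the stated goal of having no restrictions on the programming language.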
17:00
[Chair] Questions? [Audience member] Thank you very much for the presentation, it was very interesting. There is one thing that was not completely clear to me. The tool itself runs over distributed datasets, but is the algorithm distributed as well? I mean, if you have a sequential algorithm, will the algorithm still be sequential? [Speaker] [Largely unclear in the recording.] We would like to support those cases where it is possible, depending on how the algorithm is written. [Audience member] When you distribute the computation, is load balancing also included? [Speaker] I don't know, so maybe you can contact our [name unclear], who develops that part. [Audience member] Can I ask a question? Am I right that the tiling is done [unclear], or only if the user wants it? [Speaker] Yes, and if it is not needed, because we have enough power to process the data as it is, then tiling is not needed. [Audience member] Can vector data be tiled as well? [Speaker] I think it is possible, but I don't know, because we are focusing more on massive datasets for now; there are project partners looking more at [unclear]. [Audience member] Thank you. What kinds of algorithms do you think can be run in this framework? [Speaker] I think we can have, as I have shown, something like Java and
19:39
MATLAB. [Audience member] Yes, I hope so, but [unclear] I was
19:48
more thinking about the process itself. I mean, if the
19:52
data is tiled, then the algorithm has to be local somehow; it cannot be a global algorithm, for example one that depends on [unclear]. [Speaker] That is something I will communicate to [unclear]. [Another audience member] Maybe I can answer this question. At [organization name unclear] we have the same problem, and we solve it similarly, with a MapReduce model. One application that we use it for is finding missing streets in OpenStreetMap, which is a local problem that is confined to one tile. We use telemetry data to figure out that a lot of people drive on a street but it does not exist on the map. [Chair] Any more questions? OK, thank you.
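
The MapReduce approach mentioned in the last answer can be illustrated with a heavily simplified sketch. It is not the audience member's actual pipeline: the grid cells, the function names `map_points`, `reduce_counts` and `missing_street_cells`, and the `min_hits` threshold are all assumptions. The point it shows is locality: each grid cell can be evaluated without looking at any other cell, so the work can be split by tile.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Set, Tuple

Cell = Tuple[int, int]
Point = Tuple[float, float]


def map_points(points: Iterable[Point], cell_size: float) -> Iterator[Tuple[Cell, int]]:
    """Map step: emit (grid cell, 1) for every telemetry point."""
    for x, y in points:
        yield (int(x // cell_size), int(y // cell_size)), 1


def reduce_counts(pairs: Iterable[Tuple[Cell, int]]) -> Counter:
    """Reduce step: sum the emitted counts per grid cell."""
    totals: Counter = Counter()
    for cell, n in pairs:
        totals[cell] += n
    return totals


def missing_street_cells(
    points: Iterable[Point], mapped_cells: Set[Cell], cell_size: float, min_hits: int
) -> List[Cell]:
    """Cells with many telemetry hits but no street on the map."""
    totals = reduce_counts(map_points(points, cell_size))
    return sorted(c for c, n in totals.items() if n >= min_hits and c not in mapped_cells)


# Toy telemetry: three hits in cell (0, 0), three in (1, 0), one in (2, 2).
points = [(0.5, 0.5), (0.6, 0.4), (0.7, 0.2), (1.5, 0.5), (1.6, 0.5), (1.7, 0.5), (2.5, 2.5)]
mapped_cells = {(1, 0)}  # cells that already contain a mapped street (hypothetical)
candidates = missing_street_cells(points, mapped_cells, cell_size=1.0, min_hits=3)
```

With this toy input only cell `(0, 0)` is reported: it has enough traffic and no mapped street, while `(1, 0)` is already mapped and `(2, 2)` has too few hits.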