Development of a new framework for Distributed Processing of Big Geospatial Data

Video in TIB AV-Portal: Development of a new framework for Distributed Processing of Big Geospatial Data

Formal Metadata

CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

The geospatial world still lacks well-established distributed processing solutions tailored to the volume and heterogeneity of geodata, especially when fast data processing is a must. Moreover, most current distributed computing frameworks have important limitations regarding both data distribution and data partitioning methods. Hence, this paper presents a prototype for tiling, stitching and processing of big geospatial data. The system is based on the IQLib concept developed within the IQmulus EU FP7 research and development project. The data distribution framework places no restrictions on the programming environment and can execute scripts (and workflows) written in different development frameworks (e.g. Python, R or C#). It is capable of processing raster, vector and point cloud data. Our intention is to provide a solution that performs a wide range of geospatial processing tasks in a distributed environment with no restrictions on data storage concepts. Our research also covers methods for controlling data partitioning, distributed processing and data assimilation. Partitioning (also referred to as “Tiling”) is a delicate yet crucial step that affects the whole processing chain. After algorithms have processed these “chunks” or “tiles” of data, the partial results are collected to carry out data assimilation, or “Stitching”. The paper presents the above-mentioned prototype through a case study dealing with country-wide processing of raster imagery. Assessment is carried out by comparing the results (computing time, accuracy, etc.) to competing solutions. Further investigation of algorithmic and implementation details is planned for the near future.
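The tiling, processing and stitching cycle described in the abstract can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not IQLib code: the raster is a nested list, tiles do not overlap, each tile remembers its row and column offset so it can be put back in place, and process_tile is a hypothetical per-pixel stand-in for a real algorithm.

```python
from typing import List, Tuple

Grid = List[List[float]]
Tile = Tuple[int, int, Grid]  # (row offset, column offset, pixel block)


def tile_raster(raster: Grid, tile_rows: int, tile_cols: int) -> List[Tile]:
    """Split a raster into non-overlapping tiles; edge tiles may be smaller."""
    n_rows, n_cols = len(raster), len(raster[0])
    tiles = []
    for r0 in range(0, n_rows, tile_rows):
        for c0 in range(0, n_cols, tile_cols):
            block = [row[c0:c0 + tile_cols] for row in raster[r0:r0 + tile_rows]]
            tiles.append((r0, c0, block))
    return tiles


def stitch_tiles(tiles: List[Tile], n_rows: int, n_cols: int) -> Grid:
    """Reassemble (processed) tiles into one raster using their stored offsets."""
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r0, c0, block in tiles:
        for i, row in enumerate(block):
            out[r0 + i][c0:c0 + len(row)] = row
    return out


def process_tile(tile: Tile) -> Tile:
    """Hypothetical per-pixel algorithm: here it simply doubles each value."""
    r0, c0, block = tile
    return (r0, c0, [[2 * v for v in row] for row in block])


if __name__ == "__main__":
    raster = [[float(r * 5 + c) for c in range(5)] for r in range(4)]
    tiles = tile_raster(raster, 3, 2)        # 2 tile rows x 3 tile columns
    processed = [process_tile(t) for t in tiles]
    result = stitch_tiles(processed, 4, 5)
    print(len(tiles), result == [[2 * v for v in row] for row in raster])  # 6 True
```

Because a per-pixel operation needs no neighbouring pixels, any tiling gives the same stitched result; operations that look at neighbours do not have this property.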
The next talk is by Angéla.
So, my name is Angéla. Unfortunately my co-author could not come to answer your questions, but you can contact us later on. Our research is about the development of a new framework for distributed processing of geospatial data, and it is joint research of two institutions: the Institute of Geodesy, Cartography and Remote Sensing, FÖMI for short, as you can see here, and a university located in Budapest, Hungary. The content of this talk is shown on this slide. First, I am going to give you a short introduction to our research topic. Then I am going to introduce a project called IQmulus, which is related to our work, although I will not give a detailed introduction to it. Next, I will try to define what geospatial big data is, because it differs both from big data that is not geospatial and from geospatial data that is not big. Then I would like to show you a comparison of existing solutions that we have tried, and how they handle distributed processing, and I am also going to present the user requirements we selected to compare those solutions. After that, I am going to present our framework, which is the main development of our research. It has a modular structure, so I am going to present the modules and their development status, and in the last slides I will conclude with some thoughts about future work.
Introduction: our goal is to find a solution for processing big geospatial data in a distributed ecosystem without any limitations on the programming language, the data partitioning or the data distribution among the nodes, in order to run existing GIS processing scripts. As a first step, we focus on the raster data representation: decomposing the datasets, distributing them and processing them. Before building this prototype system, we analyzed data decomposition, that is, how a dataset can be decomposed and then processed on different nodes. We also defined common GIS user requirements on processing environments for geospatial data, the ones we think are important for a framework or toolkit supporting distributed processing, and we worked on identifying geospatial big data.
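Why decomposition needs careful analysis can be shown with a small sketch. Assumptions: a single raster row stands in for an image, a moving average stands in for a focal (neighbourhood) filter, and the function names are hypothetical, not part of IQLib. Processing tiles independently gives wrong values at tile edges, unless each tile is read with an overlap ("halo") at least as wide as the filter radius and the extra cells are cropped before stitching.

```python
from typing import List


def moving_average(values: List[float], radius: int) -> List[float]:
    """Mean over a window of +/- radius cells, truncated at the array ends."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def tiled_moving_average(values: List[float], tile: int, radius: int,
                         halo: int) -> List[float]:
    """Process tile by tile, reading `halo` extra cells on each side of a tile
    and cropping them away before the partial results are concatenated."""
    out: List[float] = []
    for start in range(0, len(values), tile):
        stop = min(start + tile, len(values))
        lo, hi = max(0, start - halo), min(len(values), stop + halo)
        local = moving_average(values[lo:hi], radius)
        out.extend(local[start - lo: stop - lo])
    return out


if __name__ == "__main__":
    row = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
    reference = moving_average(row, radius=1)
    no_halo = tiled_moving_average(row, tile=4, radius=1, halo=0)
    with_halo = tiled_moving_average(row, tile=4, radius=1, halo=1)
    print(no_halo == reference, with_halo == reference)  # False True
```

The halo trades extra data transfer for correctness; operations whose result depends on the whole dataset cannot be fixed by a halo alone.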
This research is related to IQmulus, a project on a high-volume fusion and analysis platform for geospatial point clouds, coverages and volumetric datasets. The main goal of the project is this platform; the project is going to be finished this November, and as a result we are going to have this analysis platform. The consortium is formed by 11 European partner institutions, including partners from Hungary, and if you would like more information on the IQmulus project, please visit the project website. Defining geospatial big data is not an easy task.
There is no well-known definition, and what counts as big depends on the storage and computing capacity of the system available to you. There are diverse definitions in the literature, so it is not easy to define the margin between geospatial big data and geospatial data. Some authors also admit that it is use-case specific, so it has to be defined by the user. We tried to compare geospatial data across three kinds of data representation, the vector format, the raster format and the field representation, and we also compared these to non-geospatial, text-based data formats. In this paper, which is going to be published soon, we have collected the differences in several aspects for three main domains: definition, data representation, and the storage and processing background. Then, for each of these three domains, we defined some requirements that would be useful to have from an existing solution.
So we collected the most widespread frameworks supporting distributed computing and geospatial data, selected the following aspects, which are included in the table, and made a comparison between them. First, the input and output data types, of course, because it matters which kinds of data are supported. Second, whether already existing GIS processing executables or scripts are supported or not; this is one of the main points. Then, what kind of data management they support, and whether the user can supervise the data distribution, especially for raster data types: we would like to have full control over which data chunks go to which node, and over getting the processed data back. Other aspects include scalability potential, supported platforms and so on. We collected all of this information and compared the existing solutions. This table has already been published in the paper, and in the following slides I am going to present what we found.
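The requirement of full control over which chunk goes to which node, and over getting the processed data back, can be illustrated with an explicit, user-written assignment table. In this sketch the function names, tile ids, node names and result locations are hypothetical: the table is inverted into per-node shipping lists, and results that come back in arbitrary order are gathered in the table's order, failing loudly if any tile is missing.

```python
from collections import defaultdict
from typing import Dict, List


def plan_by_node(assignment: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert an explicit tile -> node table into per-node shipping lists."""
    plan: Dict[str, List[str]] = defaultdict(list)
    for tile_id, node in assignment.items():
        plan[node].append(tile_id)
    return dict(plan)


def gather(assignment: Dict[str, str], returned: Dict[str, str]) -> List[str]:
    """Collect results in the table's tile order.

    `returned` maps tile id -> result location as nodes report back, in any
    order. Missing tiles raise an error instead of being silently skipped.
    """
    missing = [t for t in assignment if t not in returned]
    if missing:
        raise RuntimeError(f"no result yet for tiles: {missing}")
    return [returned[t] for t in assignment]


if __name__ == "__main__":
    # The user, not the framework, decides where each chunk goes.
    table = {"tile_0_0": "node-a", "tile_0_1": "node-a",
             "tile_1_0": "node-b", "tile_1_1": "node-b"}
    print(plan_by_node(table))
    # Results arrive out of order from the two nodes.
    returned = {"tile_1_1": "node-b:/out/1_1", "tile_0_0": "node-a:/out/0_0",
                "tile_1_0": "node-b:/out/1_0", "tile_0_1": "node-a:/out/0_1"}
    print(gather(table, returned))
```

Keeping the table explicit makes the placement auditable and reproducible, which frameworks that hide partitioning and distribution behind a scheduler do not offer.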
So, after all this, and also from our project experience, we found that in most cases full control over the data partitioning and data distribution mechanism is not supported, and that it is not really possible to run already existing executables or scripts on such a platform or ecosystem. So we decided to develop our own distributed processing framework. It was initiated by three project partners from France and Hungary, and its name is IQLib. IQLib is going to be a framework whose core functionality is data decomposition, that is, tiling; its second domain of functionality is data distribution and distributed data processing; and the third is assimilation, providing the functionality to stitch the results. In this way it can overcome the scalability limitations of processing.
The high-level concept is shown on this slide. It is already a bit outdated, because there is a new module that I am going to introduce in the next slides. The main parts are tiling and stitching and data distribution, and on top of them we would like already existing remote sensing or GIS scripts to be run on those datasets. As a researcher from the national mapping agency, I can say this would be very useful, because we always have operational GIS processing in different systems and in different languages, so it would be very useful to have a framework that allows these to be reused for distributed processing. All of this is being developed within IQmulus.
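The idea of running existing scripts unchanged can be sketched with Python's standard subprocess module: the framework only needs to know how to invoke a script on one tile. The calling convention (input and output paths appended as the last two arguments) and the function name are assumptions for this sketch, and a Python one-liner stands in for an existing script written in another language.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


def run_script_on_tile(command: List[str], tile_path: Path, out_path: Path) -> None:
    """Run an existing, unmodified script on one tile.

    `command` is the script invocation, e.g. ["Rscript", "my_script.R"]
    (hypothetical); the tile's input and output paths are appended as the
    last two arguments. check=True raises CalledProcessError on failure.
    """
    subprocess.run([*command, str(tile_path), str(out_path)], check=True)


if __name__ == "__main__":
    # Stand-in for an existing script: a Python one-liner that upper-cases a file.
    legacy = [sys.executable, "-c",
              "import sys, pathlib; P = pathlib.Path; "
              "P(sys.argv[2]).write_text(P(sys.argv[1]).read_text().upper())"]
    with tempfile.TemporaryDirectory() as tmp:
        tile, out = Path(tmp) / "tile_0_0.txt", Path(tmp) / "tile_0_0.out"
        tile.write_text("pixels")
        run_script_on_tile(legacy, tile, out)
        print(out.read_text())  # PIXELS
```

Because the contract is only "a command line plus file paths", the same wrapper works for R, C#, Matlab or any other runtime installed on the node.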
The modules are as follows. The Data Catalogue module is responsible for storing metadata: not only data about the data, but also data about the processing. So the data chunks, the results and where they are located will be stored in the Data Catalogue module. The Tiling and Stitching module is responsible for tiling and stitching the data, and the metadata of the tiles is also going to be stored in the Data Catalogue mentioned before. The new module, Data Distribution, is responsible for distributing the data, and finally the Processing module is responsible for running the scripts on the already distributed data. As for the status of these modules, the Data Catalogue module is waiting for final approval.
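A minimal sketch of the kind of record the Data Catalogue module is described as holding, that is, metadata about each tile and about its processing (the node it went to, its status, where its result lives). The schema, field names, status values and the ftp:// result locations are assumptions for illustration, not the IQLib data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class TileRecord:
    """Hypothetical catalogue entry: data about a tile and about its processing."""
    tile_id: str
    source: str                               # dataset the tile was cut from
    bbox: Tuple[float, float, float, float]   # min_x, min_y, max_x, max_y
    node: Optional[str] = None                # node the tile was shipped to
    status: str = "tiled"                     # tiled -> distributed -> processed
    result_uri: Optional[str] = None          # where the processed tile lives


@dataclass
class Catalogue:
    records: Dict[str, TileRecord] = field(default_factory=dict)

    def register(self, rec: TileRecord) -> None:
        self.records[rec.tile_id] = rec

    def mark_distributed(self, tile_id: str, node: str) -> None:
        rec = self.records[tile_id]
        rec.node, rec.status = node, "distributed"

    def mark_processed(self, tile_id: str, result_uri: str) -> None:
        rec = self.records[tile_id]
        rec.result_uri, rec.status = result_uri, "processed"

    def ready_to_stitch(self, source: str) -> bool:
        """Stitching can start only once every tile of `source` is processed."""
        tiles = [r for r in self.records.values() if r.source == source]
        return bool(tiles) and all(r.status == "processed" for r in tiles)


if __name__ == "__main__":
    cat = Catalogue()
    cat.register(TileRecord("t0", "ortho", (0.0, 0.0, 10.0, 10.0)))
    cat.register(TileRecord("t1", "ortho", (10.0, 0.0, 20.0, 10.0)))
    cat.mark_distributed("t0", "node-a")
    cat.mark_processed("t0", "ftp://node-a/t0.tif")
    print(cat.ready_to_stitch("ortho"))  # False: t1 is still only tiled
```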
After that, it is going to be available as open source. In the first figure you can see the final structure of the Data Catalogue module and its use cases. The Tiling and Stitching module is already defined; in the second figure you can see the high-level concept of how it is going to work, but this is still in the planning phase. The Data Distribution module is a new one: it currently supports only the FTP protocol, and its data partitioning and data distribution methods can be extended by third-party developers, so if you would like to contribute some ideas, do not hesitate to do so. The distributed Processing module is also under development, and you can see its architecture in this figure.
All of this information can be found in the IQmulus specification, which is going to be presented as soon as possible, but the specification and the dataset are already available. There are also related papers, and we have given some presentations on our first experiences with this topic.
There is still work we want to do. As future work, we would like to finish the implementation of these modules and test them in the following aspects: execution with existing software, experimental execution on big geospatial data, and benchmarking, mainly of processing time. Thank you for your attention. I would also like to thank [inaudible] for making it possible for me to be here. I am happy to take your questions.
Thank you very much for the presentation; it is very interesting. There is one thing that was not completely clear to me: you run the processing over distributed datasets, but is the algorithm itself distributed as well? I mean, if you have a sequential algorithm, will it still run sequentially, or is it possible to parallelize it? Do you have an idea in mind for that?

We would like to support those, because we know this is not always possible with the current configuration. When you distribute the computation, load balancing is also involved, but I don't know exactly how that is handled, so maybe you can contact my colleague, who is developing it.

Can I ask a question? Am I right that tiling is done by the system, or only if the user wants it?

Yes, if the user wants it; if it is not needed because there is enough processing power, it does not have to be done.

And what about vector data, can it be tiled as well?

I think it is possible, but I don't know, because we are focusing more on massive datasets; other project partners are looking at that more.

Thank you. What kind of algorithms do you think can be run in this framework?

As I have shown, I think we can run scripts in different languages, like Java and Matlab; at least, I hope so.

I was also thinking about the process itself: if the processing of each tile has to be local, how does it become a global algorithm, for example when the result depends on the neighbourhood? This is related to the first question.

Maybe I can answer this question. Where I work, we have the same problem, and we solved it similarly with a MapReduce model. One application that we use it for is finding missing streets in OpenStreetMap, which is a local problem that is defined on one tile: we use telemetry data to figure out that a lot of people drive on a street that does not exist on the map.

Any more questions? OK, thank you.
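The MapReduce approach described in the last answer can be sketched for a tile-local problem. This is a toy version under strong simplifying assumptions, not the speaker's actual pipeline: positions are snapped to grid cells, a road is just a set of cells, the map step runs per tile without needing any other tile, and the reduce step merges the per-tile counts and keeps cells driven on at least min_hits times. The function names and input values are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Cell = Tuple[int, int]  # grid cell standing in for a real geometry


def map_tile(points: List[Cell], roads: Set[Cell]) -> Counter:
    """Map step, one tile: count telemetry points on cells with no mapped road."""
    return Counter(p for p in points if p not in roads)


def reduce_counts(partials: Iterable[Counter], min_hits: int) -> Set[Cell]:
    """Reduce step: merge per-tile counts, keep cells seen at least min_hits times."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing
    return {cell for cell, hits in total.items() if hits >= min_hits}


if __name__ == "__main__":
    # tile id -> (telemetry points, mapped road cells); toy input
    tiles = {
        "t0": ([(0, 0), (0, 1), (0, 1), (0, 1)], {(0, 0)}),
        "t1": ([(5, 5), (5, 6), (5, 6)], {(5, 5)}),
    }
    partials = [map_tile(points, roads) for points, roads in tiles.values()]
    print(sorted(reduce_counts(partials, min_hits=3)))  # [(0, 1)]
```

This works only because the question "is there a road under these points?" can be answered inside one tile; problems that need information from distant tiles require extra exchange or a further reduce stage.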