Development of a new framework for Distributed Processing of Big Geospatial Data

Formal Metadata

CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Content Metadata

The Geospatial world is still facing the lack of well-established distributed processing solutions tailored to the amount and heterogeneity of geodata, especially when fast data processing is a must. However, most current distributed computing frameworks have important limitations regarding both data distribution and data partitioning methods. Hence, this paper presents a prototype for tiling, stitching and processing of big geospatial data. The system is based on the IQLib concept developed in the frame of the IQmulus EU FP7 research and development project. The data distribution framework has no limitations on programming language environment and can execute scripts (and workflows) written in different development frameworks (e.g. Python, R or C#). It is capable of processing raster, vector and point cloud data. Our intention is to provide a solution to perform a wide range of geospatial processing capabilities in a distributed environment with no restrictions on data storage concepts. Our research covers methods controlling data partitioning, distributed processing and data assimilation as well. Partitioning (also referred to as “Tiling”) is a very delicate yet crucial step having impact on the whole processing. After algorithms have processed these “chunks” or “tiles” of data, partial results are collected to carry out data assimilation or “Stitching”. The paper presents the above-mentioned prototype through a case study dealing with country-wide processing of raster imagery. Assessment is carried out by comparing the results (computing time, accuracy, etc.) to concurrent solutions. Further investigations on algorithmic and implementation details are in focus for the near future.
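The three steps named in the abstract (tiling, per-tile processing, stitching) can be illustrated with a minimal sketch. This is not IQLib code: the function names, the thresholding step that stands in for an existing processing script, and the thread pool that stands in for processing nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def tile_raster(raster, tile_h, tile_w):
    """Tiling: split a 2D list into tiles keyed by their (row, col) offset."""
    tiles = {}
    for r in range(0, len(raster), tile_h):
        for c in range(0, len(raster[0]), tile_w):
            tiles[(r, c)] = [row[c:c + tile_w] for row in raster[r:r + tile_h]]
    return tiles


def process_tile(tile):
    """Hypothetical stand-in for an existing script: threshold cells at 100."""
    return [[1 if value >= 100 else 0 for value in row] for row in tile]


def stitch(tiles, height, width):
    """Stitching: write every processed tile back at its original offset."""
    result = [[None] * width for _ in range(height)]
    for (r, c), tile in tiles.items():
        for i, row in enumerate(tile):
            result[r + i][c:c + len(row)] = row
    return result


def run_pipeline(raster, tile_h=2, tile_w=2, workers=4):
    """Tile, process tiles in parallel (threads stand in for nodes), stitch."""
    tiles = tile_raster(raster, tile_h, tile_w)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps input order, so results line up with the tile keys
        processed = dict(zip(tiles, pool.map(process_tile, tiles.values())))
    return stitch(processed, len(raster), len(raster[0]))


# Small made-up raster whose size is not a multiple of the tile size
raster = [
    [10, 120, 130, 40, 90],
    [200, 50, 60, 170, 110],
    [80, 99, 100, 101, 20],
]
stitched = run_pipeline(raster)
```

A per-pixel operation is the easy case, because tiles can be processed independently. Operations that look at neighbouring pixels would need overlapping tiles or extra work during stitching, which this sketch leaves out.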
The next presentation is by Angéla.
OK, so my name is Angéla Olasz. Unfortunately my co-authors couldn't come to
answer your questions, but you can contact us later on. Our research is about the development of a new framework for distributed processing of big geospatial data. It is joint research by two institutions: the Institute of Geodesy, Cartography and Remote Sensing, FÖMI for short, as you can see here, and a university located in Budapest, Hungary. The content of this talk is presented on
this slide. I'm going to give you a short introduction to our research topic, and then I'm going to introduce a project called IQmulus, which is related to our work, but I'm not going to give you a detailed introduction to that project. Then I'm going to continue by trying to define what big geospatial data is: how it differs from big data that is not geospatial, and from geospatial data that is not big. Then I would like to show you a comparison of existing solutions that we have tried and compared, to see how these frameworks do distributed processing. I'm also going to present the user requirements that we selected to compare those solutions. Then I'm going to present IQLib, which is the main development in our research. It has a modular structure, so I'm going to present the modules and their current development status. Finally I'm going to conclude on the last slides with some thoughts about future work. So, the introduction: our goal is to find a solution for processing
big geospatial data in a distributed ecosystem without any limitations on programming language, or on data partitioning and data distribution among the nodes, in order to run existing GIS processing scripts. As the first step we focus on the raster data representation: for example, decomposing those datasets, then distributing and processing them. Before building this prototype system we analysed data decomposition methods, that is, how a dataset can be decomposed and then processed on different nodes, and we defined the common GIS user requirements on processing environments for big geospatial data. So we have some user requirements that we think are important for a framework or toolkit that supports distributed processing, and we also tried to identify what big geospatial data is. Now, some thoughts about the IQmulus project:
this research is related to IQmulus, which is about a high-volume fusion and analysis platform for
geospatial point clouds, coverages and volumetric datasets. So the main goal of the project is a platform. The project is going to be finished this November, and as a result we are going to have this analysis platform. The consortium is formed by 11 European partner institutions, and we are the partner from Hungary. If you want more information on the IQmulus project, please visit the project website. So, defining big geospatial data: it is not an easy
task, and there is no well-known definition. Big data can start where you exceed the capability or capacity of the computing background of your available system. There are diverse definitions in the literature, and you can find quite a number of them, so it is not easy to define where the margin is between geospatial data and big geospatial data. Some of them also admit that it is use-case specific, so it has to be defined by the user. We have tried to compare big data which is not geospatial with big geospatial data, for the three kinds of data representation: the raster format, the vector format and the point cloud representation. We also compared those to non-geospatial, text-based data formats. In this paper, which
may be published soon, I don't know exactly when, we have collected a number of aspects for these three main data types: the definition and the representation, and also the storage and processing background and the related requirements. The list continues, and we also included the existing solutions for those formats, for each of the three. We also collected some definitions, and then some requirements that it
would be useful to have from an existing solution. So we have collected the most widespread frameworks supporting distributed computing on geospatial data. For example, we selected the following aspects, which are included in a table, and made a comparison between them. The input and output data types are of course important, that is, what kind of data the frameworks support. Whether already existing GIS processing executables or scripts are supported or not is one of the main points. Then there is the kind of data management they support, and the supervision of the data distribution, especially for raster data types: we would like to have full control over which data chunks go to which node, and over getting back the processed data. Other aspects are scalability potential, supported platforms and so on. So we have collected all of this information
and tried to compare the existing solutions. This table has already been published in a paper, and on a later slide I'm going to show where you can find it. So, after all
this, and also from our experience in the IQmulus project,
it has to be admitted that in most cases full control over the data partitioning and data distribution mechanism is not supported. Also, it is not really possible to run already existing executables or scripts on such a platform or ecosystem. So we decided to develop our own distributed processing framework. It has been initiated by three project partners, including IGN from France and FÖMI from Hungary, and its name is IQLib. IQLib is going to be a framework that supports data decomposition as its first core functionality, this is the tiling; then data distribution and distributed data processing as the second main functionality; and then stitching, that is, providing the functionality to stitch the partial results. In this way we can overcome the scalability limitations of the processing. The high-level concept is
shown on this slide. It is already a bit out of date and
could be updated, because there is a new module that I'm going to introduce on the next slides. The main thing is that there is the tiling and stitching, then the data distribution, and then the already existing remote sensing or GIS processing scripts can be run on those datasets. As I mentioned, as a researcher from a national mapping agency I think this would be very useful, because we already have operational GIS processing in different systems and in different languages. So it would be very useful to have a framework in which these can somehow be reused for processing in a distributed environment. So, about those modules: there is the
data catalogue module, which is responsible for storing the metadata: not only data about the data but also data about the processing. We would like to have information on the data chunks and also on the results, and all of this will be stored in the data catalogue module. Then there is the tiling and stitching module, which is responsible for tiling and stitching; the metadata of the tiles and of the stitched data is also going to be stored in the data catalogue module mentioned before. Then there is a new module, which is responsible for the data distribution, and there is also the processing
module, which is responsible for running the scripts on the already distributed data chunks. So, the status of those modules: the data catalogue module is almost ready and is waiting for the final approval from
our partners, and then it is going to be made available as open source. In the first figure you can see the architecture of the data catalogue module and the interfaces it uses. The tiling and stitching module is already defined; in the second figure you can see the architecture, the high-level concept of how it is going to work, but this is still in the planning phase. Then the data distribution module: this is
a new one, and it currently supports only file transfer as a protocol for distributing the data after partitioning. The data distribution
module can be extended by third-party developers, so if you would like to add some ideas, don't hesitate to do it. The distributed processing module is also under development, and in the figure you can see what the architecture of this module looks like. All of this information can be found in
the specification. This is an IQmulus specification, but it is going to
be presented on a dedicated IQLib
website as soon as possible. The specification itself is already there. OK, so, the related papers: we had some presentations
about our first experiences with this topic, but we still
have a lot of work that we want to do. As for future work, we would like to finish
the implementation of all of these modules and to test IQLib for the
following aspects: running existing algorithms, experimental execution on big geospatial data, and benchmarking, mainly on processing time. So thank you for your attention, and I would also like to thank everyone who
made it possible for us
to be here. Thank you.
Questions?
Thank you very much for the presentation, it was very interesting. There is one thing that was not completely clear to me. The algorithm itself runs over distributed datasets, but is the algorithm distributed as well? I mean, if you have a sequential algorithm, the algorithm will still be sequential. Would you parallelise it if that is possible?
If it is possible with the algorithm as it is written, we would like to support that, but we know this is not possible for every configuration.
When you distribute the computation, is load balancing also included in this?
I don't know exactly how. Maybe you can contact our colleague who develops that.
Can I ask a question? Am I right that the tiling is done according to a system, or only if the user wants it?
Yes, and if it is not needed, because we have enough power to process the data without it, then it is not needed.
Can vector data be tiled as well?
I think it's possible, but I don't know, because we are focusing more on raster datasets for now. There are project partners looking more at the others.
Thank you. What kind of algorithms do you think can be run in this framework?
I think we can have, like I have shown, scripts in different languages. Like Java and
Matlab? Yes, I hope so. But I would like to ask something else; I was
thinking about the process itself. I mean, if the
data is tiled, then the algorithm has to be local somehow; it cannot be a global algorithm, for example one that depends on influences from outside the tile.
Maybe I can answer this question. We have the same problem, and we solved it similarly, with a MapReduce model. One application that we use it for is finding missing streets in OpenStreetMap, which is a local problem that is confined to one tile. We use telemetry data to figure out that a lot of people drive on a street but it doesn't exist on the map.
Any more questions? OK, thank you.
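The tile-local MapReduce idea described in the last answer can be sketched roughly as follows. This is an illustrative assumption, not the speaker's implementation: the square tile grid, the point format, the threshold and all function names are hypothetical, and the sample data is made up.

```python
from collections import Counter


def tile_key(x, y, tile_size):
    """Map step: assign a point to the square tile that contains it."""
    return (int(x // tile_size), int(y // tile_size))


def count_points_per_tile(points, tile_size):
    """Reduce step: count the telemetry points that fall into each tile."""
    return Counter(tile_key(x, y, tile_size) for x, y in points)


def candidate_tiles(points, tiles_with_streets, tile_size, min_points):
    """Tiles with many telemetry points but no mapped street are candidates."""
    counts = count_points_per_tile(points, tile_size)
    return sorted(
        key for key, n in counts.items()
        if n >= min_points and key not in tiles_with_streets
    )


# Made-up telemetry points (x, y) and a made-up set of tiles that already
# contain a mapped street
points = [(1, 1), (2, 3), (4, 4), (12, 3), (15, 8), (18, 1), (25, 25)]
mapped = {(0, 0)}
missing = candidate_tiles(points, mapped, tile_size=10, min_points=3)
```

Because each point is assigned to exactly one tile, every tile can be counted on a different node and the per-tile counts simply merged afterwards, which is what makes this kind of problem a good fit for tiling.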

