
Development of a new framework for Distributed Processing of Big Geospatial Data

Automated media analysis

Recognized entities
The next presentation is by Angéla.
OK. My name is Angéla Olasz, and my co-authors [names unclear] unfortunately could not come to
answer your questions, but you can contact us later on. Our research is about the development of a new framework for distributed processing of big geospatial data. It is joint research between two institutions: the Institute of Geodesy, Cartography and Remote Sensing, FÖMI for short, as you can see here, and a university [name unclear] located in Budapest, Hungary. The content of this talk is presented on
this slide. I am going to give you a short introduction to our research topic, and then I will introduce a project called IQmulus, which is related to our work, although I am not going to give a detailed introduction to that project. Then I will try to define what big geospatial data is: how it differs from big data that is not geospatial, and from geospatial data that is not big. After that I would like to show you a comparison of existing solutions that we have tried out, to see how these frameworks handle distributed processing, and I will also present the user requirements we selected to compare those solutions. Then I will present IQLib, which is the main development in our research. It has a modular structure, so I will present the modules and their development status. In the last slides I will conclude with some thoughts about future work. So, the introduction. Our goal is to find a solution for processing
big geospatial data in a distributed ecosystem without any limitations on programming language, data partitioning, or data distribution among the nodes, so that existing GIS processing scripts can be run. As a first step we focus on the raster data representation: decomposing those datasets, then distributing and processing them. Before building this prototype system we analyzed data decomposition methods, that is, how a dataset can be decomposed and then processed on different nodes, and we defined common GIS user requirements for processing environments for geospatial data. These are the user requirements we think are important for a framework or toolkit that supports distributed processing. We also looked at how to identify geospatial big data. Now, a few words about the project.
Our research is related to IQmulus, which is a high-volume fusion and analysis platform for
geospatial point clouds, coverages and volumetric datasets. This platform is the main goal of the project. The project is going to be finished this November, so as a result we are going to have this analysis platform [unclear]. The consortium is formed by European partner institutions [number unclear in the recording], and we are the partner from Hungary. If you want more information on the IQmulus project, please visit the website. So, defining big geospatial data is not an easy
task, and there is no well-known definition. It can depend on the capability or capacity of the hardware and the computing background of your available system. There are diverse definitions in the literature [unclear], so it is not so easy to define the margin between geospatial big data and geospatial data. Some authors also admit that it is use-case specific, so it has to be defined by the user. We have tried to compare data which is not geospatial with geospatial data in its three kinds of data representation: vector, raster and point cloud. We also compared those to non-geospatial, text-based data formats. In this paper, which
may be going to be published soon, I don't know yet, we have collected [number unclear] aspects for these three main domains: definition and representation, and also the storage and processing background requirements. [unclear] We also included the existing solutions for those formats, some definitions, and some requirements that
would be useful to have in an existing solution. So we collected the most widely used frameworks supporting distributed computing on geospatial data, selected the following aspects, which are included in the table, and made a comparison between them. The input and output data types are of course important, that is, what kind of data the frameworks support. Whether already existing GIS processing executables or scripts are supported or not is one of the main points. Then there is the kind of data management they support, and supervision of the data distribution, especially for the raster data types: we would like to have full control over which data chunks go to which node, and over getting the processed data back. There are other aspects too, like scalability potential, supported platforms and so on. So we collected all of this information
and tried to compare the existing solutions. This table has already been published in a paper, and later on I am going to show where you can find it. So, after all
this, and also from our experience in the [unclear] project,
we have to admit that in most cases full control over the data partitioning and data distribution mechanism is not supported. It is also not really possible to run already existing executables or scripts in those platforms or ecosystems. So we decided to develop our own distributed processing framework. This was initiated by three project partners [names unclear], from Germany, France and Hungary, and the name is IQLib. IQLib is going to be a framework that supports data decomposition, that is tiling, as a core functionality; then data distribution and distributed data processing as the second main functionality; and then stitching, providing the functionality to stitch the results. In this way we can overcome the scalability limitations of the processing. The high-level concept is
shown on this slide. It is already a bit out of date
and could be updated, because there is a new module that I am going to introduce in the next slides. The main things are tiling, stitching and the data distribution, on top of which already existing remote-sensing or GIS processing scripts can be run on those datasets. As mentioned, for a researcher from a national mapping agency this would be very useful, because we already have operational GIS processing in different systems and in different languages. It would be very useful to have a framework in which these can be reused for processing in a distributed environment. So, the modules. There is the
data catalogue module, which is responsible for storing the metadata: not only data about the data but also data about the processing. So information on the data chunks and on the results will be stored in the data catalogue module. Then there is the tiling and stitching module, which is responsible for tiling and stitching; the metadata of the tiles and of the stitched data is also going to be stored in the data catalogue module I mentioned before. There is this new module, which is responsible for the data distribution, and then there is also the processing
module, which is responsible for running the scripts on the already distributed tiles. So, the status of those modules: the data catalogue module is [unclear] waiting for final approval from
our partners, and then it is going to be available in an open-source manner. In the first figure you can see the final structure of the data catalogue module and the [unclear] it uses. The tiling and stitching module is already defined [unclear]; in the second figure you can see the high-level concept of how it is going to work, but this is still in the planning phase. The data distribution module is
a new one. It currently supports [unclear] protocols for the data partitioning and the data distribution,
and it can be extended by third-party developers, so if you have some ideas, don't hesitate to contribute. The distributed processing module is also under development, and in the figure you can see what the architecture of this module looks like. All of this information can be found
in the specification, which is going to
be presented on a dedicated [unclear]
as soon as possible, but the specification and [unclear] are already there. OK, so the related papers: we had some presentations
on our first experiences with this topic, but we need to
[unclear]. Now the future work. We would like to finish
those implementations, followed by testing of IQLib for the
following aspects: running existing [unclear], experiment execution on big geospatial data, and benchmarking, mainly on the processing time. Thank you for your attention. I would also like to thank [names unclear]
for being able
to be here. Thank you.
Questions? Thank you very much for the presentation, it was very interesting. There is one thing that was not completely clear to me. The tool itself runs over distributed datasets, but is the algorithm distributed as well? I mean, if you have a sequential algorithm, in R or something, will the algorithm still be non-parallel? [Answer largely unclear in the recording] We would like to support those, because we know this is not possible [unclear]. When you distribute the computation, is the load balancing also included in this? I don't know; maybe you can contact [name unclear], who is developing it. Can I ask a question? Am I right that tiling is performed according to [unclear] if the user wants it? Yes, and if it is not needed, because we have enough power [unclear], then it is not needed. Can vector data be tiled as well? I think it is possible, but I don't know, because we are focusing more on raster datasets for now; other project partners are looking more at [unclear]. Thank you. What kind of algorithms do you think can be run in this framework? I think, like I have shown, [unclear]. Like Java and
Matlab? Yes, I hope so. [unclear]
I was thinking about the process itself. I mean, if the
data is tiled, the algorithm has to be local somehow; a global algorithm, for example, [unclear]. Maybe I can answer this question. We have the same problem and we solve it similarly, with a MapReduce model. One application we use it for is finding missing streets in OpenStreetMap, which is a local problem that is confined to one tile: we use telemetry data to figure out that a lot of people drive on a street but it doesn't exist on the map. More questions? OK, thank you.
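The workflow described in the talk (decide explicitly which tile goes to which node, run an existing processing step on each tile, and record metadata about both the data and the processing) can be sketched roughly as follows. This is a minimal single-machine illustration under stated assumptions, not IQLib's actual API: the names `assign_tiles`, `run_on_node` and `distributed_run`, the round-robin placement, and the dictionary used as a catalogue are all hypothetical, and worker threads merely stand in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_tiles(tile_ids, nodes):
    """Explicit round-robin placement: the caller controls which tile goes to which node."""
    return {tile_id: nodes[i % len(nodes)] for i, tile_id in enumerate(tile_ids)}


def run_on_node(node, tile_id, tile, process):
    """Stand-in for shipping one tile to a node and running an existing script on it."""
    return tile_id, node, process(tile)


def distributed_run(tiles, nodes, process):
    """Process every tile and return (results, catalogue)."""
    tile_ids = sorted(tiles)
    placement = assign_tiles(tile_ids, nodes)
    results = {}
    catalogue = {}  # hypothetical catalogue: metadata about the data and the processing
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [
            pool.submit(run_on_node, placement[t], t, tiles[t], process)
            for t in tile_ids
        ]
        for future in futures:
            tile_id, node, output = future.result()
            results[tile_id] = output
            catalogue[tile_id] = {
                "node": node,
                "status": "done",
                "size": len(tiles[tile_id]),
            }
    return results, catalogue


# Hypothetical example data: four small "tiles" and two "nodes".
tiles = {0: [1, 2, 3], 1: [4, 5], 2: [6], 3: [7, 8, 9]}
nodes = ["node-a", "node-b"]
results, catalogue = distributed_run(tiles, nodes, lambda values: [v * 2 for v in values])

# Collect the partial results in tile order.
stitched = [v for tile_id in sorted(results) for v in results[tile_id]]
```

Keeping the placement in a plain mapping is one way to make the "which chunk is on which node" decision inspectable; a real deployment would replace the thread pool with remote execution.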
Compound distribution
Script (program)
Physical theory
Order (mathematics)
Table (computer science)
Object (category)
Sequence (mathematics)
Self-organizing system
Algebraic model
Specific volume
Module (data type)
Fan (mathematics)
Formal grammar
Convex hull
Tablet PC
Path (topology)
Linear functional
Physical effect
Mach's principle
Storage (computer science)
Stochastic process
Framework (computer science)
Scientific computing
Projective plane
Website
Physical system
Inverse limit
Current (mathematics)
Distributed system
Linker (computer science)
Spatial arrangement
Continuous map
Closed system
Protocol (data processing system)
Metropolitan area network
Instruction (computer science)
Distribution (functional analysis)
Context-aware system
Software Development Kit
Independent set
Web log
Electronic fingerprint
Relational database
Automated planning
Content (Internet)
Service (computer science)
Open Source
Content (mathematics)
Profile (fluid flow)
Thermodynamic equilibrium
Chat (communication)
Attribute grammar
Regular graph
Cartesian coordinates
Remote Access
Unit (mathematics)
Figurate number
Nonlinear operator
Function (mathematics)
Registration (image processing)
Virtual machine
Combinatorial group theory
Mobile device
Electronic publication
Mapping (computer graphics)


Formal metadata

Title Development of a new framework for Distributed Processing of Big Geospatial Data
Series title FOSS4G Bonn 2016
Part 88
Number of parts 193
Author Olasz, Angéla (Department of Geoinformation, Institute of Geodesy, Cartography and Remote Sensing (FÖMI))
Kristof, Daniel (FOMI - Institute of Geodesy, Cartography and Remote Sensing)
License CC Attribution 3.0 Germany:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.
DOI 10.5446/20404
Publisher FOSS4G, Open Source Geospatial Foundation (OSGeo)
Release year 2016
Language English

Content metadata

Subject area Computer science
Abstract The Geospatial world is still facing the lack of well-established distributed processing solutions tailored to the amount and heterogeneity of geodata, especially when fast data processing is a must. However, most current distributed computing frameworks have important limitations regarding both data distribution and data partitioning methods. Hence, this paper presents a prototype for tiling, stitching and processing of big geospatial data. The system is based on the IQLib concept developed in the frame of the IQmulus EU FP7 research and development project. The data distribution framework has no limitations on programming language environment and can execute scripts (and workflows) written in different development frameworks (e.g. Python, R or C#). It is capable of processing raster, vector and point cloud data. Our intention is to provide a solution to perform a wide range of geospatial processing capabilities in a distributed environment with no restrictions on data storage concepts. Our research covers methods controlling data partitioning, distributed processing and data assimilation as well. Partitioning (also referred to as “Tiling”) is a very delicate yet crucial step having impact on the whole processing. After algorithms have processed these “chunks” or “tiles” of data, partial results are collected to carry out data assimilation or “Stitching”. The paper presents the above-mentioned prototype through a case study dealing with country-wide processing of raster imagery. Assessment is carried out by comparing the results (computing time, accuracy, etc.) to concurrent solutions. Further investigations on algorithmic and implementation details are in focus for the near future.
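The tiling and stitching steps named in the abstract can be illustrated with a small sketch. It is an assumption-laden toy, not the IQLib implementation: the raster is a plain list of rows, and `tile_raster`, `stitch_tiles` and `threshold` are hypothetical helper names. The processing step is deliberately a per-pixel operation, so each tile can be handled independently; operations that need neighbouring pixels would require overlapping tiles, which this sketch does not cover.

```python
def tile_raster(raster, tile_rows, tile_cols):
    """Split a 2-D raster (list of rows) into tiles keyed by their (row, col) offset."""
    tiles = {}
    for r in range(0, len(raster), tile_rows):
        for c in range(0, len(raster[0]), tile_cols):
            tiles[(r, c)] = [row[c:c + tile_cols] for row in raster[r:r + tile_rows]]
    return tiles


def stitch_tiles(tiles, n_rows, n_cols):
    """Reassemble processed tiles into a full raster using their stored offsets."""
    out = [[None] * n_cols for _ in range(n_rows)]
    for (r, c), tile in tiles.items():
        for i, row in enumerate(tile):
            out[r + i][c:c + len(row)] = row
    return out


def threshold(tile, limit=5):
    """A per-pixel (local) operation: 1 where the value exceeds the limit, else 0."""
    return [[1 if v > limit else 0 for v in row] for row in tile]


# Hypothetical 3 x 4 raster with values 0..11, cut into tiles of at most 2 x 3 cells.
raster = [[r * 4 + c for c in range(4)] for r in range(3)]
tiles = tile_raster(raster, 2, 3)
processed = {offset: threshold(tile) for offset, tile in tiles.items()}
stitched = stitch_tiles(processed, 3, 4)
```

For a local operation like this one, stitching the processed tiles gives the same result as processing the whole raster at once, which is a convenient correctness check for any tiling scheme.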
