A New GIS Toolbox For Integrating Massive Heterogeneous GIS Data For Land Use Change Analysis

Formal Metadata

Title
A New GIS Toolbox For Integrating Massive Heterogeneous GIS Data For Land Use Change Analysis
Title of Series
Number of Parts
95
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language
Production Place
Nottingham

Content Metadata

Subject Area
Genre
Abstract
Agricultural land use in Germany and related impacts on the environment and the use of natural resources are key research topics at the Thünen-Institute of Rural Studies. As spatial context is essential for the analysis of causal connections, GIS data regarding all necessary information was gathered during different research projects and prepared for processing in a database. In particular, the Integrated Administration and Control System, which was available for certain project purposes for several Federal Laender and years, serves as a very detailed data source for agricultural land use. We use different Open Source GIS software like PostgreSQL/PostGIS, GRASS and QuantumGIS for geoprocessing, supplemented with the proprietary ESRI product ArcGIS. After introducing the used input data and the general processing approach, this paper presents a selection of geoprocessing routines for which Open Source GIS software was used. As an exemplary 'use case' for the conclusions from the consecutive statistical analysis, we summarize impacts of increased biogas production on agricultural land use change highlighting the trend in biogas maize cultivation and the conversion of permanent grassland to agricultural cropland.
Transcript: English (auto-generated)
Okay, well, hello. We want to present our paper, "A New GIS Toolbox for Integrating Massive Heterogeneous GIS Data for Land Use Change Analysis". First of all, I want to give you a short overview.
I will present the scientific background, our research objective and all the input data we used. After that, I will show the conceptual approach of our GIS toolbox, with the general processing workflow, the software products used and the requirements for vector processing.
After that, my colleague Birgit Lachner will present the results of the GIS toolbox: the vector processing, then the core of our GIS toolbox, the spatial overlay, then the raster data processing and the additional data. At the end, she will evaluate our toolbox and show some pros and cons of the software products we used.
Okay, both of us work at the Thünen Institute of Rural Studies in Brunswick, Germany. Our research area is the use of resources and environmental and nature protection. In particular, we analyze land use change in the agricultural areas of Germany, focusing on environmental impacts and the effects of legal regulation.
You can see three project examples. The first one is the current project, and the GIS toolbox presented here can be seen as a result of all three projects.
Our research objective was to develop a GIS toolbox that enables the processing and analysis of massive heterogeneous GIS data for statistical analysis.
The toolbox therefore subsumes all processing steps necessary for data preparation and for integrating all input data sets into one combined data set.
There are some conceptual considerations. First, we decided to use a vector polygon approach as the main database. We defined the necessary processing steps, and finally there are prevailing requirements when it comes to software selection.
For example, the software has to fit into our existing infrastructure, the efficiency of the software products has to be considered, and interfaces have to be minimized.
In that context, it was very important to evaluate some open source software. On this slide, I want to show all the input data we used. As you can see, the data comes in different formats. Most data is in a vector format, but there is also data in raster format or in Excel sheets.
The extent of the data can be nationwide, or the data is only available for some of the Federal Laender. And there are different sources due to the federal system.
As you can see, there is data for the analysis of land use, the natural environment, special conditions and other data.
The IACS, the Integrated Administration and Control System, provides very detailed data at field scale in a vector format, and it combines GIS data with land use and land-use-related information. But it is only available for areas for which subsidies have been paid. Outside of that, we have the Digital Basic Landscape Model for Germany.
On this slide, you can see the processing workflow and the software products used. Because of the different data models, the processing has two parts: the vector processing and the raster data processing.
At the moment, the data is stored in a PostgreSQL database with the PostGIS extension. Because most of the data is available as vector data, we use a spatial overlay of all vector data as the main database. The raster data processing is done in GRASS. After all this processing is finished, we link the vector data to the raster data.
And as you can see, the additional data is down here. Okay, that's not good. Okay, the additional data, like the location of biogas plants in Germany, is integrated into the database after georeferencing, and we use the results of all data processing for our statistical analysis in SAS.
As already mentioned, we use a spatial overlay of all vector data as the main database. As a result, we get area sums calculated for the geometries for our statistical analysis of land use change.
The data therefore has to meet several requirements. The data sets must be complete within the system boundaries, that is, the boundaries of Germany or of the Federal Land. There should be no larger overlaps, and the geometries should be valid. The transformation of multipolygons into single polygons is also integrated into our data preparation.
Okay, now I will hand over. Complicated. Thank you.
So now I would like to go to the results of our GIS toolbox. The first slide is about the vector processing and the vector pre-processing.
The first step is the plausibility check. We look at the data, the structure and the attribute data, and we do a rough visual examination of the data for general understanding. Then we have to re-project the geometries to homogenize the coordinate system.
We repair all invalid geometries. Then we check for overlaps in our input data, and we do this with a self-intersection. Depending on the data we have, this step, together with the elimination of the overlaps, can be very time consuming. At the end, we convert all multipolygons into single polygons and create a unique ID for identification after the overlay process.
This is a slide about our routine for overlapping geometries. You see some examples of overlaps occurring in our data sets. We have several criteria for classifying the overlaps and for picking the instrument for their elimination.
The first is area and shape, which can be parameters for identifying sliver polygons. Next is the proportion of the intersecting area in relation to the original polygon area, which enables us to identify duplicates or near duplicates. Then we have the attribute data, which are relevant for understanding whether overlaps are intended, as in the case of the zonal structure of a water protection area, or whether they contain similar content and can be aggregated.
As instruments for the elimination, we have delete for duplicates and clip for zonal structures or sliver polygons. And we can union geometries if we aggregate them. In the latter case, we use an allocation table to relate the new geometries to the old ones.
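The classification routine for overlaps described above could be sketched roughly as follows. This is only an illustration in plain Python, not the routine used in the toolbox: the `Overlap` record, the function names, the compactness measure and all threshold values are assumptions chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Overlap:
    area: float            # area of the intersecting part (m^2)
    perimeter: float       # perimeter of the intersecting part (m)
    source_area: float     # area of the original polygon (m^2)
    similar_content: bool  # attributes describe the same thing in both polygons
    zonal: bool            # overlap is intended, e.g. nested protection zones


def compactness(area, perimeter):
    """Polsby-Popper score: 1.0 for a circle, near 0 for a long thin sliver."""
    if perimeter <= 0:
        return 0.0
    return 4 * math.pi * area / perimeter ** 2


def choose_instrument(o, duplicate_share=0.95, sliver_area=100.0,
                      sliver_compactness=0.1):
    """Return 'clip', 'delete', 'union' or 'review' for one overlap.

    All thresholds are hypothetical defaults, not values from the talk.
    """
    # Intended zonal structures are checked first, because an inner zone
    # can lie completely inside an outer zone without being a duplicate.
    if o.zonal:
        return "clip"
    # Share of the original polygon covered by the intersection.
    if o.area / o.source_area >= duplicate_share:
        return "delete"
    # Small and thin intersections are treated as sliver polygons.
    if o.area <= sliver_area and compactness(o.area, o.perimeter) <= sliver_compactness:
        return "clip"
    # Similar content can be aggregated; an allocation table would then
    # relate the new geometry to the old ones.
    if o.similar_content:
        return "union"
    return "review"
```

Checking the zonal flag before the duplicate test is a design choice of this sketch; overlaps that match no rule are returned as "review" so that they can be inspected by hand.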
This slide is about our core process, the vector overlay operation. You see two data sets as an example. What we want to do is integrate them into one combined result in which all outlines of the polygons are preserved and the resulting geometries keep the information from the source polygons.
We do this mainly with the overlay Union function in ArcGIS. We control this with this Python script, because we have to cut our data into smaller chunks. And because we have this massive input data, we have to adjust our Python script for every Federal Land, because the input data differs for each Land.
In the end, we import the results into our PostgreSQL database, put all parts together and do an area calculation. We then export this result to a text file for the SAS analysis.
Here is our raster data processing together with the raster overlay process.
Since we used PostGIS 1.5, there wasn't any raster processing available. Therefore, we decided to use GRASS. So we have to import all our data into GRASS and calculate slope and exposition maps from the elevation model.
Yes, we tried to do a direct overlay of our vector data with the raster data, but it wasn't possible because of the massive data. So we decided to use a point raster as the basis for the integration.
What we did is create a point raster with a resolution of 100 meters with PostGIS. In PostGIS, we intersect the points with our vector overlay result and query all polygon information. In GRASS, we do the same query for the raster values, and then we can join the data based on the point IDs.
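The point raster approach can be illustrated with a small pure-Python sketch. It is not the PostGIS or GRASS workflow from the talk; the function names, the simple ray-casting test and the data layout (dictionaries keyed by point ID) are assumptions made for the example.

```python
def point_grid(xmin, ymin, xmax, ymax, step=100.0):
    """Return {point_id: (x, y)} with one point per grid cell centre."""
    points = {}
    pid = 0
    y = ymin + step / 2
    while y < ymax:
        x = xmin + step / 2
        while x < xmax:
            points[pid] = (x, y)
            pid += 1
            x += step
        y += step
    return points


def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices without holes."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that cross the horizontal line through y can matter.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def query_polygons(points, polygons):
    """polygons: {poly_id: (ring, attributes)} -> {point_id: attributes}."""
    result = {}
    for pid, (x, y) in points.items():
        for poly_id, (ring, attrs) in polygons.items():
            if point_in_polygon(x, y, ring):
                result[pid] = {"poly_id": poly_id, **attrs}
                break  # the overlay result is assumed to be free of overlaps
    return result


def join_on_point_id(vector_rows, raster_rows):
    """Inner join of the polygon attributes and raster values per point ID."""
    return {pid: {**vector_rows[pid], **raster_rows[pid]}
            for pid in vector_rows if pid in raster_rows}
```

Here `vector_rows` would come from the polygon query and `raster_rows` from the raster query. Any polygon or raster detail between two grid points is lost, which is the price of this approach.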
This is OK for most analysis questions, but we also have the IACS data, which have a very high resolution. Together with the elevation model, which has a resolution of 25 meters, we wanted a more precise approach, and therefore we decided to use an alternative overlay operation.
Because we could reduce our vector data to the IACS vector data only, we were able to calculate univariate statistics in GRASS. Then we join the statistics back to the vector overlay result based on the IDs of the IACS geometries.
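The idea of per-polygon univariate statistics that are joined back by ID can be sketched in plain Python. This is a stand-in for illustration, not the GRASS processing used in the talk; the input layout (one `(polygon_id, value)` pair per raster cell), the field name `iacs_id` and the `elev_` prefix are assumptions.

```python
from statistics import mean, pstdev


def zonal_statistics(cells):
    """cells: iterable of (polygon_id, value), one pair per raster cell.

    Returns {polygon_id: {"n", "min", "max", "mean", "stddev"}}.
    Cells without a value (None) are skipped.
    """
    grouped = {}
    for poly_id, value in cells:
        if value is None:
            continue
        grouped.setdefault(poly_id, []).append(value)
    return {
        pid: {
            "n": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "stddev": pstdev(values),  # population standard deviation
        }
        for pid, values in grouped.items()
    }


def join_statistics(overlay_rows, stats, id_field="iacs_id", prefix="elev_"):
    """Left join: every overlay row is kept, statistics are added if present."""
    joined = []
    for row in overlay_rows:
        extra = {prefix + key: val for key, val in stats.get(row[id_field], {}).items()}
        joined.append({**row, **extra})
    return joined
```

With 25 meter cells, each field polygon receives statistics from all cells it covers instead of the value at a single grid point.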
So this is the additional data we have. As Natasha already said, we have address data for biogas plants in Germany as an Excel sheet. We wanted to integrate the locations of the biogas plants into our database because biogas plants are regarded as a potential driver of agricultural land use change.
For the georeferencing, we used the database of georeferenced address data from the Federal Agency for Cartography and Geodesy in Germany. After homogenizing the formatting of our address strings, we still had problems with joining, mainly because of this field with the street address information.
As you can see, there is a combination of strings and numeric characters together with delimiters and connectors. Sometimes there are only numbers, or there is the street address inhabitant, or only placeholders like this 'object without street'. And all of this information is not included in our georeferenced database.
So we were mostly only able to base our georeferencing on the combination of location and postcode. Only for 35% of the biogas plants could we also use the first characters of this street address field as additional information for joining.
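A rough sketch of such a join, assuming both tables are lists of dictionaries with the hypothetical fields `place`, `postcode` and `street`: the match is always made on place and postcode, and the first characters of the street are only used when they contain letters and actually narrow the candidates. The normalization rules and the prefix length of five characters are assumptions, not the settings from the talk.

```python
import re


def normalize(text):
    """Lower-case, transliterate German umlauts, collapse punctuation to spaces."""
    text = text.lower()
    for src, dst in (("ä", "ae"), ("ö", "oe"), ("ü", "ue"), ("ß", "ss")):
        text = text.replace(src, dst)
    return re.sub(r"[^a-z0-9]+", " ", text).strip()


def match_address(plant, reference, street_chars=5):
    """Return the reference records that match one plant address.

    Matching is based on place and postcode. The street prefix is an
    optional refinement and is ignored when it has no letters (for example
    a bare house number) or when it would remove every candidate
    (for example a placeholder text).
    """
    key = (normalize(plant["place"]), plant["postcode"].strip())
    candidates = [r for r in reference
                  if (normalize(r["place"]), r["postcode"].strip()) == key]
    prefix = normalize(plant.get("street", ""))[:street_chars]
    if any(c.isalpha() for c in prefix):
        narrowed = [r for r in candidates
                    if normalize(r["street"]).startswith(prefix)]
        if narrowed:
            return narrowed
    return candidates
```

The result is a list of candidate reference points per plant, which may contain more than one entry.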
In the end, we had a cluster of matching points for each biogas plant address. We took the centroid of this cluster as the final point geometry.
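Taking the centroid of a cluster of matching points amounts to averaging the coordinates. A minimal sketch, assuming projected coordinates in meters given as `(x, y)` tuples; the spread function is an addition of this example and not part of the talk.

```python
import math


def cluster_centroid(points):
    """Mean x and mean y of a list of (x, y) points."""
    if not points:
        raise ValueError("cluster is empty")
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return cx, cy


def cluster_spread(points):
    """Largest distance from the centroid to any point of the cluster."""
    cx, cy = cluster_centroid(points)
    return max(math.hypot(x - cx, y - cy) for x, y in points)
```

A large spread indicates that the match on location and postcode was coarse, so the resulting point is only an approximate location for that plant.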
Now I want to come to the discussion part of the presentation. Our research objective was to handle this heterogeneous mass of geodata within an adequate time frame. We have to say that our approach is able to derive comprehensive GIS data sets for the statistical analyses. But we also have to state that the approach we are using is very time consuming regarding the data acquisition, the pre-processing and the processing part.
So we have to think about changes to our toolbox, and we have a few options we are thinking about for future handling. The first one: we would keep the approach we are using now, use a fixed data set for a certain date and update only in very long cycles, just to minimize the effort.
The second option would be a step back. We would only prepare the input data sets to the state that they are ready for the overlay operation. The overlay itself would only be done for small data sets, tailored to specific research questions.
The third option would be based on the point approach we have used for the vector-raster overlay. It would then be very easy to add new data sets to our existing database, but we would still have the drawback of losing information.
Finally, I would like to evaluate the open source software we used from our point of view.
In general, I would say that the software we used was very capable of processing the massive geodata within an acceptable processing time. For us, acceptable means something between hours and several days.
Regarding PostgreSQL, we think it's a great tool for storing this massive data. Each member of our team has access to every data set they need without having to exchange them. The table manipulation processes are also very effective. And last but not least, it has PostGIS. PostGIS is a very strong tool with many geoprocessing routines, and the community is amazing.
We still found some drawbacks in PostGIS for our approach. The main one was the high accuracy of some functions, which resulted in geometrical artifacts or topology exceptions.
Yeah. GRASS has shown itself to be very efficient in raster processing. But for us, the import and export was a drawback. We would also say that with improvements in usability or documentation, the capabilities of GRASS could be better accessed.
Yeah. And with this, I would like to finish our presentation. We'll be open for questions or discussion. Thank you.
Was there any particular reason why you're using GRASS instead of ArcGIS? Yeah. We tried to use ArcGIS too, just to minimize the number of software products we had to use.
But for our massive data, it wasn't really working, I have to say. So GRASS was the only solution for us.
Yeah. I have the opposite question, which is why you're using ArcGIS for the unioning and intersection. Yeah. Okay. You're using PostGIS, right? We're using PostGIS, yes. So why not just do it right in there? Yeah. We tried. It's not working for us. No. Yeah. We developed a PL/pgSQL routine for doing the same thing as we do in ArcGIS.
But yeah, the problem we have is that it's not able to work with more than two data sets at one time, which ArcGIS is able to do. And it's also much slower, I have to admit.
So probably, yeah, with the topology, the new topology, at the onset, probably there will be changes in the future, but I can only speak for the current.
Sorry. Well, you've been doing two options for the unioning. I was wondering how important SAS is to your workflow. Could you not union it in something like R, which has very strong raster-vector unioning?
Yes. I don't know, because the statistical analysis is not done by the people who do the GIS part. And we have to deliver a product our colleagues are able to work with.
So SAS is the software everyone uses at our institute.