Advantages of open-source GIS to improve spatial environmental modelling
Formal Metadata

Title: Advantages of open-source GIS to improve spatial environmental modelling
Author: Scott Mitchell
Number of Parts: 45
License: CC Attribution - NoDerivatives 3.0 Germany: You are free to use, copy, distribute and transmit the work or content in unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifier: 10.5446/21749 (DOI)
Language: English
Transcript: English (auto-generated)
So welcome again to today's session and today's conference day. I would like to open the session GRASS, Open Source and Free Software. We have three talks about issues related to GRASS being open software and the relationship to environmental modeling. As first speaker, I would like to invite Scott Mitchell from the Department of Geography, University of Toronto. He will present advantages of open-source GIS to improve spatial environmental modeling.
Move over here.

In light of what I heard about yesterday, I considered making the dangerous step of revising my talk last night, at the last minute, and I resisted, although maybe it would have helped. You'll see that three of my figures don't seem to be compatible with these machines.
Two of them aren't that important; one is unfortunate, but we'll deal with it. I'm going to be talking about a number of tools that we use in our lab and, in a switch for me, trying to gloss over the applications and talk about the tools and why we need them. One of these tools is something that I am just a user of, not involved with the programming of. The other ones are things for which I've inherited the source code, and I'm trying to update them for GRASS5 and figure out what to do next with them. The overall theme of what we're trying to do fits into interfacing environmental models with GIS and figuring out how to deal with heterogeneity. So I've represented that up here.
On the left is a typical polygon view of the world, where you divide everything up into unique polygons and everything is uniform within each polygon. In the middle is the same view, but with variability within each polygon, except that in this case the variability is the same in every polygon. And then over here is the idea that there might be more to the pattern: there might be different variability in different places, so there is spatial autocorrelation, and it might not be isotropic, so there could be some kind of directionality, as in this polygon down here. This is here to remind me to mention aggregation error; it's too small, you can't see it, but I'll come back to that in a minute. So we've set this up as a framework for what I'm going to be talking about.
When we're trying to do environmental modeling, we're coming up with a simplified representation of environmental processes. There's some kind of relationship between the spatial pattern on the ground and the processes that are operating in the environment, and we need to try to represent this spatial pattern somehow in the modeling environment. In this room, if you're a GRASS user, you probably usually use raster data, and this is meant to represent a raster grid representation of the world. So you've got all this data in raster form, but for modeling we usually want to set up a coarser modeling unit, something like this. So we've got a bunch of tools to move back and forth and to deal with different levels of spatial heterogeneity. First of all, under what I've labelled ESDA, for exploratory spatial data analysis, I'll talk quickly about a tool called r.samp that we've been using for many years. Then r.watershed and r.quadtree are examples of tools that you could use to create model objects from more detailed spatial data; of course, r.watershed is certainly not new or mine, so I won't be talking about that any more. Once you have these units, you're setting up your model representation of the world and you can make predictions.
So first of all, r.samp is a sampling tool that we use. The idea is that we want to be able to characterize a data set by sampling particular locations. And again, in all of these examples, you'll see just another instance of the parallel development that we talked about yesterday: there are always other ways to do this. These met certain needs we had at a certain time within our lab, and may or may not be useful in the future, given that there are other tools that could do the same job. What we were after was a tool that could either use a sites file that already exists, or create sites files with different patterns: a hierarchical sampling scheme where you have a nested grid, a regular sampling scheme where you're sampling every n raster cells, or random sampling, which essentially uses r.random to create sites at random locations. Then, at each of these locations, in a variable-sized window, it reports what was found in either one or multiple raster layers and collects statistics that describe the distribution found within that window, such as central tendency or variability, or, if multiple layers are involved, the correlation or covariance between those layers. It then spits out the results as a plain text table. We do this because it's easy to import into R for further analysis. So you've got the spatial location; the numbers in the middle here are the sizes of the windows that were used. You can have multiple windows and different increments between the window sizes. In this case it's reporting the averages within each of those windows, and then the variance goes off the screen. As a quick, simple, or perhaps overly simple, example, I created, using r.random.surface, a normally distributed map, and then two with different levels of autocorrelation. These are window sizes going up on the x-axis, and box plots of what was found at the samples across that landscape.
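To make the mechanics concrete, here is a minimal sketch of that kind of windowed sampling, assuming a numpy array in place of a GRASS raster layer. This is hypothetical illustrative Python, not the r.samp source, which is a GRASS module:

```python
import numpy as np

def regular_sites(shape, step):
    """Regular sampling scheme: one site every `step` cells in each direction."""
    return [(r, c)
            for r in range(0, shape[0], step)
            for c in range(0, shape[1], step)]

def window_stats(raster, sites, window_sizes):
    """For each site and window size, collect statistics describing the
    distribution of cell values inside the window, as rows of a text table."""
    rows = []
    for (r, c) in sites:
        for w in window_sizes:
            half = w // 2
            win = raster[max(0, r - half):r + half + 1,
                         max(0, c - half):c + half + 1]
            rows.append((r, c, w, win.mean(), win.var()))
    return rows

rng = np.random.default_rng(0)
raster = rng.normal(size=(128, 128))   # stand-in for a normally distributed map
sites = regular_sites(raster.shape, 32)
for row in window_stats(raster, sites, (3, 9, 27))[:5]:
    print("%4d %4d %3d %8.3f %8.3f" % row)
```

The random scheme would simply draw row and column indices at random, much as r.random does, and the rows print as a plain text table that is easy to read into R.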
Obviously, for a real example, you could use it as an exploratory data tool to characterize landscapes; we've used it, for example, to compare interpolations of DEMs. For my next example: I've worked a lot on how setting up your modeling environment in different ways changes the uncertainty in your predictions. Very quickly, and oversimplified: when you're setting up a modeling unit that aggregates finer, more detailed data, you're doing some kind of aggregation, and depending on the relationships that you're modeling, this may or may not introduce aggregation error into your prediction. If there is some kind of curvilinear relationship in the processes, then when you aggregate by taking some kind of central tendency, you will introduce some kind of aggregation error. This figure may be familiar to some of you in modeling; it's taken from a classic paper by Rastetter and colleagues. The point here is that you're going to have some kind of aggregation error no matter what you do, so what you want to try to do is not just throw up your hands and give up, but to manage this error.
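As a numeric illustration of that point (my own example, not from the talk): for a nonlinear response f, the response computed at the aggregated mean differs from the mean of the fine-scale responses, and that difference is exactly the aggregation error.

```python
import numpy as np

# Hypothetical curvilinear process response, e.g. a saturating function.
def f(x):
    return x / (1.0 + x)

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # fine-scale input variable

mean_of_response = f(x).mean()   # aggregate of the fine-scale behaviour
response_of_mean = f(x.mean())   # what a coarse, aggregated modeling unit computes

print("mean of f(x):     ", round(mean_of_response, 4))
print("f(mean of x):     ", round(response_of_mean, 4))
print("aggregation error:", round(response_of_mean - mean_of_response, 4))
```

For a concave response like this one, the aggregated unit systematically overestimates the output; you can manage the magnitude of the error, but not make it vanish, which is the Rastetter point.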
Here's the first example of something that didn't work on these machines, for some reason. It's supposed to be a pretty picture of different kinds of partitions that are common. You're used to them: watersheds almost showed up here, I've got a regular grid, and there's another example here, a map drawn from some kind of knowledge of the site, in this case a vegetation survey map. And then down here was a nice quadtree map that I'll be talking about a little more. So let's move on to what we can see. I tested this on a generic Windows machine at home that had nothing special installed, and it worked fine, but oh well.
So the r.quadtree tool gives you a variable resolution data model. It's still an arbitrary tessellation, just like the raster grid, but the size of the cells is variable across the scene, and this lets you concentrate more effort, or more resources in terms of resolution, in the areas where you need it and less where you don't. The first step of the program is to build up a pyramid, a very wasteful representation, a pyramid of all the possible representations; I'll show you that in a minute. Then you can make different decompositions of that scene, building up different possible representations, using either the total number of units or the amount of residual variance left unexplained as the criterion for where you stop decomposing the pyramid. To put it the right way: the first step is building up that pyramid. If this, for example, was your full resolution data set, it builds up everything, right up to taking the mean of the whole scene; that's what I mean by pyramid. In each of these, the six here is the mean of the numbers in that quadrant, for example, in this simple example. Then by decomposition we mean building one possible representation. The decomposition starts at the top, which is what I called top-down back in one of the first figures. This is the pyramid of the mean values, and underneath there's also a pyramid of some measure of the variance beneath each of these tiles. It goes through and asks: underneath here, which quadrant has the most variance? At the top it can't really make a choice, so it splits into four. Then it looks at all of these, asks which one has the most variance underneath it, and splits that one into four. And so you start building up the variable resolution data structure. Again, as I said, you can stop when you reach a certain number of leaves (leaves is quadtree terminology for the tiles), or you can constrain it by variability.
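A minimal sketch of that greedy decomposition follows; it is an illustration of the approach as described, not the r.quadtree source, and it assumes a square raster with a power-of-two side held in a numpy array:

```python
import numpy as np

def sse(tile):
    """Residual sum of squares if this tile is represented by its mean."""
    return float(((tile - tile.mean()) ** 2).sum())

def quadtree(raster, max_leaves=None, max_total_sse=None):
    """Greedy decomposition: repeatedly split the leaf tile with the most
    unexplained variance. Leaves are returned as (row, col, size) tuples."""
    n = raster.shape[0]              # assumes a square, power-of-two side
    leaves = [(0, 0, n)]
    while True:
        errors = [sse(raster[r:r + s, c:c + s]) for (r, c, s) in leaves]
        if max_total_sse is not None and sum(errors) <= max_total_sse:
            break                    # variability criterion satisfied
        if max_leaves is not None and len(leaves) + 3 > max_leaves:
            break                    # splitting again would exceed the budget
        r, c, s = leaves.pop(int(np.argmax(errors)))   # steepest descent
        if s == 1:                   # cannot split a single cell any further
            leaves.append((r, c, s))
            break
        h = s // 2                   # split the worst leaf into four quadrants
        leaves += [(r, c, h), (r, c + h, h), (r + h, c, h), (r + h, c + h, h)]
    return leaves

rng = np.random.default_rng(2)
scene = rng.normal(size=(128, 128))
print(len(quadtree(scene, max_leaves=128)), "leaves")
```

The two stopping rules from the talk appear as max_leaves, a budget on the number of tiles, and max_total_sse, a threshold on the residual variability you are willing to live with.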
So, just some quick examples. I took the same themes I showed you for the sampling example, a random scene and an autocorrelated scene, plus one smoothed with a moving window, and a little piece of an NDVI image from my study location, and I ran the quadtree algorithm on each of them quickly. In the first case, I've told it to come up with a partitioning that uses 128 tiles, 128 units, to represent the area. So in each case it has gone through, using a steepest descent algorithm on the variability, and come up with one possible representation. The yellow lines are drawn over top of the original data set, showing how it would divide up that area, and those could be used as modeling units. The graphs show how the remaining sum of squares, the remaining variability, goes down as you increase the number of units. For the completely random scene, of course, nothing much happens no matter how many units you put in; you don't really explain much of the variability. The smooth case descends very quickly, the autocorrelated one comes out somewhere in between, and the NDVI descends slowly. The other way you could do it, of course, is not to fix the number of units, but to say that you're only willing to live with a certain amount of variability. Then we get much different results: the smooth image got that variability down with only 25 units, the autocorrelated one with 73, the NDVI with 196, and the random one with 2,048 units. It really can't do much with random, but at least that's a little bit better than the full resolution, which is 128 by 128.
Moving on very quickly to my last example, this is a slightly different kind of tool. Here I'm going to show what we do to build up spatial data sets for a particular environmental model, RHESSys. RHESSys uses a hierarchical view of the world and divides it up, I've simplified a little bit, into patches, hillslopes, and basins. I used this figure in the paper, but I think it's a bit misleading, because you may not realize that it's hierarchical; these are not separate. Every patch is within one of the hillslopes, which is within one of the basins. You can build these up using whatever layers make sense for your site and for the processes that you're modeling.
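To make the containment concrete, here is a toy sketch of that hierarchy (hypothetical Python types for illustration; the real RHESSys world description has more levels and many state variables per unit):

```python
from dataclasses import dataclass, field

# Toy containment hierarchy in the spirit of the RHESSys world description.

@dataclass
class Patch:
    id: int
    area_m2: float
    mean_elev_m: float

@dataclass
class Hillslope:
    id: int
    patches: list = field(default_factory=list)

@dataclass
class Basin:
    id: int
    hillslopes: list = field(default_factory=list)

basin = Basin(1, [Hillslope(10, [Patch(100, 2500.0, 312.4),
                                 Patch(101, 1800.0, 298.7)])])

# Every patch belongs to exactly one hillslope, which belongs to one basin.
for h in basin.hillslopes:
    for p in h.patches:
        print(f"basin {basin.id} > hillslope {h.id} > patch {p.id}")
```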
RHESSys is actually modular with respect to which sub-models you use within it as well. We don't have time to go into that, but, for example, TOPMODEL is one option for routing the water through the scene, and it has a bunch of assumptions, and that impacts what makes sense in terms of how you build up the world. This was just a flow chart of RHESSys, which we'll have to skip, unfortunately. So grass2world is a tool that takes GRASS data layers and builds a representation of the world for the RHESSys model: a very specific tool. Again, I realize I've been skipping the open source advantages I'm supposed to be talking about, including for the previous tools. Here was a tool that was built for a different use, and we were able to get the source code for it, bring it in, and make it work for this application. And again, this is missing one of the figures; it was just a time series of outputs. But the idea is that I build multiple possible representations of the world for the model, make the predictions, and compare the effect of different levels of resolution, or different amounts of temporal detail, and so on.
And here's another tool we use to analyze that, which I've called r.polystats. I realized through Roger's talk yesterday that the GRASS-R interface may be another way to do this more directly, and may be better for our purposes; and after we developed this, r.statistics showed up in the GRASS distribution, which does something almost identical. So this is a perfect example of parallel development. Now this is really important, and it didn't come up, but I've got plots in R of what r.polystats produces. If you're not familiar with r.statistics, it's the same kind of thing: given a polygon layer, you can calculate statistics, such as the mean and variability, of a continuous layer underneath those polygons. So for each of my modeling units, for each of the patches I've decided will represent the world, I can ask how much aggregation we were doing underneath it. How much aggregation of elevation; what was the mean and variance of elevation? And I can see how different representations, with different numbers of patches chosen to represent the world, change that distribution.
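The underlying operation is plain zonal statistics. A minimal sketch, with numpy arrays standing in for the GRASS category and continuous layers (hypothetical code, not the r.polystats source):

```python
import numpy as np

def zonal_stats(zones, values):
    """Mean and variance of a continuous layer under each category of an
    integer (zone) layer: the kind of table r.polystats reports."""
    stats = {}
    for z in np.unique(zones):
        v = values[zones == z]
        stats[int(z)] = (float(v.mean()), float(v.var()))
    return stats

rng = np.random.default_rng(3)
patches = rng.integers(0, 4, size=(64, 64))          # modeling units (integer map)
elevation = rng.normal(300.0, 25.0, size=(64, 64))   # continuous layer

for zone, (mean, var) in zonal_stats(patches, elevation).items():
    print(f"patch {zone}: mean elevation {mean:7.2f}, variance {var:7.2f}")
```

As comes up in the discussion below, the same table falls out of R's tapply once the layers are read into R.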
And then we can do the same thing on the output, looking at aggregated outputs from the predictions. This figure, at least, is in the paper in the proceedings, and I've got lots more examples like it. To stay on time, I'll just sum up with the idea that handling heterogeneity in environmental modeling is crucial, because if you ignore it, you're going to get its effects anyway. So it's best to try to estimate how much trouble it is causing you, and then you can try to minimize it as much as possible. What I see as the open source advantages: the exchange of ideas, with conferences as a perfect example of that, but also the mailing lists; the fact that you can see how other people have solved problems in their source code and adapt that for yourself (we were able to take the source code of r.random, for example, read it, and properly credit its authors as we used pieces of it in our sampling tool); and experimentation, in that when we're dealing with these modeling units and want to come up with new ways of representing space, we have a base framework in which we can do these experiments and change how the world is represented, without being constrained by the spatial models built into a commercial GIS package, for example. Thank you.

Thank you for this interesting presentation. Are there any questions?
Can you summarize again the names of the modules that you developed, which you took the examples from? I didn't see all the names in the slides, so I thought I might just ask again.

That's right; I found myself slipping into what I usually talk about and not what I meant to talk about here.
As I say in the paper, there are various levels of development here: which ones actually work in GRASS5, which ones are on their way, and which ones can deal with GRASS data but run outside it. r.samp we've got working in GRASS5, and as for its status, I'm interested to see what's going to happen with the changes to sites; sites are disappearing, and then maybe we re-evaluate how that's going to happen.

They're not going to disappear.

Okay, well, that's something I'm going to keep working on, then, and that we like. Off to the side, but having to do with this figure, is what we call r.polystats, and that's working in GRASS5 as well, but it is very similar to r.statistics. In fact, the difference is that it can go across multiple layers instead of just looking at one layer. The similarity is that both of them, as far as I know, are constrained to integer maps, not floating-point maps. And that's the one I have to investigate; especially since I'm taking all my data into R anyway, maybe it's easier just to use tapply on everything. r.quadtree is for building quadtree representations from raster grids, and at the moment it's a separate program, especially since we build these huge pyramid files; it's not efficient at all, it's a true research tool. I'm trying to figure out whether to make it a full GRASS module or not. If I do, it has all these extra files that are not GRASS files, but r.le, for example, also has its own separate files, and that's important. Probably all of them; I don't think I can do that. Oh, and grass2world, finally, is very specific to running the RHESSys model, but that was for taking all the GRASS data layers and coming up with the huge input file that RHESSys needs, and it allows you to set up these different hierarchical representations of the world.
I have a question regarding your sampling design, especially your regular sampling design. Can you bring that slide back up? If I understood you correctly, you said that you define the regular sampling design in the sense that you are picking up every nth pixel from the data set, correct? That approach leads to the effect that the distances between the selected pixels or points don't necessarily have to be even, or the same. So I would like to ask you: what was more important for you, to select every nth pixel from the data set, or to have the distances between the selected pixels the same?

In that particular example, it was actually something that somebody else requested from me, and this is the way they wanted it. Clearly it's not the same kind of regular sampling you'd use if, say, you were going to do sampling on the ground, in plant ecology or whatever; this is what he wanted for that particular thing. So again, this was a research tool thrown together for a particular purpose, and I can see that a more truly regular sampling would also be very useful.
I'd like to ask whether you have used this coarse representation of data in some environmental model, because the presentation raised the problem of sampling, of the error it introduces, and of controlling this error, if we use some error propagation model.

In my own work here, and I should say this talk was lashing together material from all three of the co-authors, the uncertainty I've been looking at, for the example I put up there, has just been a matter of putting it up and showing how the results change. There are so many outputs from this model that it's been a descriptive, qualitative thing, seeing how close it comes to the few observations that we do have. Something that I've done similarly in a previous application, with another model, was to use Monte Carlo simulations to build up a distribution of possible outputs, so that I could build up a prediction of the uncertainty by doing multiple runs.
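A sketch of that Monte Carlo idea (illustrative only; the model function here is a trivial stand-in for an expensive simulation run):

```python
import numpy as np

def model(params):
    """Trivial stand-in for one run of an expensive environmental model."""
    k, s = params
    return k * np.exp(-s)

rng = np.random.default_rng(4)
n_runs = 1000   # feasible here; a two-day model run would rule this out

# Sample the uncertain inputs from assumed distributions.
k_samples = rng.normal(2.0, 0.3, n_runs)
s_samples = rng.uniform(0.5, 1.5, n_runs)

outputs = np.array([model(p) for p in zip(k_samples, s_samples)])
print(f"mean output {outputs.mean():.3f}, "
      f"95% interval [{np.quantile(outputs, 0.025):.3f}, "
      f"{np.quantile(outputs, 0.975):.3f}]")
```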
But this model was so big and took so long that that wasn't really feasible; for the big ones, it took two days just to do one run.

So, if there aren't any further questions, thank you again.