
Comparing spatial patterns in raster data using R


Formal Metadata

Title
Comparing spatial patterns in raster data using R
Number of Parts
156
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Spatial pattern is an inherent property visible in many spatial variables. Spatial patterns are often at the heart of many geographical studies, where we search for existing hot spots, correlations, and outliers. They may be exhibited in various forms, depending on the type of data and the underlying processes that generated the data. Here, we will focus on spatial patterns in spatial rasters, but the concept can be extended to other types of spatial data, including vector data and point clouds. Patterns in spatial raster data may have many forms. We may think of spatial patterns for continuous rasters as an interplay between intensity and spatial autocorrelation (e.g., elevation) or between composition and configuration for categorical rasters (e.g., land cover) (Gustafson, 1998). Intensity relates to the range and distribution of values of a given variable, while spatial autocorrelation is a tendency for nearby values of a given variable to be more similar than those that are further apart. On the other hand, composition is the number of cells belonging to each map category, while configuration represents their spatial arrangement. Another distinction concerns data dimensionality. The most common situation is when we only use one layer of data (e.g., an elevation map or a land cover product for one year). However, we may also be interested in sets of variables (layers, bands), such as hyperspectral data, time series, or proportions of classes. An additional special case is the RGB representation of the data. Assessing the similarity of spatial patterns is a common task in many fields, including remote sensing, ecology, and geology. This procedure may encompass many types of comparisons: comparing the same variable(s) for different areas, comparing different datasets (e.g., different sensors), or comparing the same area but at different times.
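The composition/configuration distinction can be made concrete with a small sketch. It uses two hypothetical 4 x 4 land-cover rasters stored as nested Python lists, not data from this work. Composition is computed as the number of cells per category. As one simple stand-in for configuration (an assumption, not a measure named here), the sketch counts rook-adjacent pairs of cells whose categories differ. Both rasters have identical composition but very different configuration.

```python
from collections import Counter


def composition(raster):
    """Composition: number of cells belonging to each category."""
    return dict(Counter(value for row in raster for value in row))


def unlike_adjacencies(raster):
    """One simple configuration measure: the number of rook-adjacent cell pairs
    (each pair counted once, via the right and lower neighbour) whose categories differ."""
    rows, cols = len(raster), len(raster[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and raster[r][c] != raster[r][c + 1]:
                count += 1
            if r + 1 < rows and raster[r][c] != raster[r + 1][c]:
                count += 1
    return count


# Hypothetical rasters: identical composition, different spatial arrangement.
clumped = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]
checkerboard = [
    [1, 2, 1, 2],
    [2, 1, 2, 1],
    [1, 2, 1, 2],
    [2, 1, 2, 1],
]

print(composition(clumped) == composition(checkerboard))  # True
print(unlike_adjacencies(clumped), unlike_adjacencies(checkerboard))  # 4 24
```

A composition-only comparison would call these two maps identical. Any configuration measure separates them, which is why the two aspects are treated separately.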
Given various possible scientific questions and the fact that we have a plethora of forms of spatial data, there is no universal method for assessing similarity between two spatial patterns. The basic method is a visual inspection; however, it is highly subjective, depending both on the observer and on the type of visualization. Other fairly straightforward approaches are to create a difference map, count changed pixels, or look at the distribution of the values. More advanced methods include the use of machine learning algorithms. However, these methods are often complex, require a lot of data, and are not always interpretable. An alternative and general approach, inspired by content-based image retrieval (Kato, 1992), is to use spatial signatures to represent spatial patterns and dissimilarity measures to compare them (Jasiewicz and Stepinski, 2013). A spatial signature is any numerical representation (compression) of a spatial pattern. For a categorical raster, it can be a co-occurrence vector of classes in a local window, while for a time series, it may be a vector of values in a given cell. Then, having spatial signatures for both areas (sensors, moments), we can compare them using a dissimilarity measure (e.g., Euclidean distance or cosine similarity) (Cha, 2007). This approach can compare complex, multidimensional spatial patterns while retaining some degree of interpretability. It can also be applied in many other techniques of spatial data analysis, including spatial clustering (to find groups of areas with similar spatial patterns) and segmentation (to create regions with similar spatial patterns). While the concept of applying spatial signatures and dissimilarity measures is powerful, there are still many unresolved issues and questions to consider. These include the scale of comparison, the resolution, dimensionality, and type of the input data, the spatial signatures used, and the dissimilarity measures selected.
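As a minimal sketch of the signature-plus-dissimilarity idea, the code below builds a normalised co-occurrence vector for a categorical raster and compares two such vectors with Euclidean distance and cosine similarity. It assumes rook (4-cell) adjacency and uses hypothetical 4 x 4 rasters. It is plain Python for illustration only and does not reproduce the motif package's implementation or defaults.

```python
import math


def cooccurrence_signature(raster, categories):
    """Normalised co-occurrence vector of categories over rook (4-cell) adjacencies.

    Each adjacent pair is added in both directions, so the underlying k x k matrix is
    symmetric; dividing by the total makes signatures of differently sized rasters comparable.
    """
    index = {cat: i for i, cat in enumerate(categories)}
    k = len(categories)
    counts = [[0] * k for _ in range(k)]
    rows, cols = len(raster), len(raster[0])
    for r in range(rows):
        for c in range(cols):
            for rr, cc in ((r, c + 1), (r + 1, c)):  # right and lower neighbour
                if rr < rows and cc < cols:
                    a, b = index[raster[r][c]], index[raster[rr][cc]]
                    counts[a][b] += 1
                    counts[b][a] += 1
    total = sum(map(sum, counts))
    return [n / total for row in counts for n in row]


def euclidean_distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cosine_similarity(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))


# Hypothetical categorical rasters with the same composition but different configuration.
clumped = [[1, 1, 2, 2] for _ in range(4)]
checkerboard = [[1 + (r + c) % 2 for c in range(4)] for r in range(4)]

p = cooccurrence_signature(clumped, [1, 2])
q = cooccurrence_signature(checkerboard, [1, 2])
print(round(euclidean_distance(p, q), 3))  # 0.833
print(round(cosine_similarity(p, q), 3))   # 0.196
```

Because both signatures are normalised to sum to one, the same comparison works for areas of different sizes. That property is what makes disjoint comparisons possible.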
There is still a lack of studies that systematically compare different methods of assessing similarity between spatial patterns, or suggest good practices in their use. At the same time, a growing number of FOSS tools allows us to test various methods and apply them to real-life scenarios. The goal of this work is to provide an overview of existing R packages for comparing spatial patterns. These include 'motif' (for comparing spatial signatures for categorical rasters; Nowosad, 2021), 'spquery' (allowing for comparing spatial signatures for continuous rasters), and 'supercells' (for segmentation of various types of spatial rasters based on their patterns; Nowosad and Stepinski, 2022). It will show how they can be applied in real-life cases and what their limitations are. This work also aims to open a discussion about the methods for assessing similarity between spatial patterns and their FOSS implementations.
References
Cha, S-H. (2007). Comprehensive ...
Transcript: English (auto-generated)
Thank you. Good morning, everyone. So I'm Jakub Nowosad. I'm an assistant professor at Adam Mickiewicz University in Poznań, Poland. And in my work, I mostly do exactly this: I work using R, I work with raster data, and I'm obsessed with spatial patterns.
So I thought that this is the perfect combination for this kind of conference, and that such a topic would force me to spend some time reading papers and trying to see what the current state of this topic is.
And just to start with a definition: not the best way to start, but at least we have an animation. A spatial pattern is defined as scale-dependent, so we have patterns at different scales. It's a physical arrangement: we arrange something in some order.
And it's predictable: the idea is that it's not random, and there is something there that we can predict. So you can already see that spatial patterns have some properties that we hope to capture, because they matter.
And there are several basic reasons to do that. Long and Robertson state that we can study change: we have time A and time B, and we want to see whether there was a change in the spatial pattern from time A to time B.
But we can also study similarity: we have location A and location B, and we want to see whether those two locations are similar to each other or different. Then we can study association, where we think in terms of variables: we have variable A and variable B, and we try to see if they are similar.
And another, maybe slightly different, idea is the assessment of spatial models: we have our outcome variable and our predictors, and we want to assess our models, compare them, and so on. Because of this typology, we can also think of different operations.
We can compare the same variables but for different areas. We can compare different data sets, different sensors, or different variables. Or we can look at the same area but at different times.
So the whole idea of my talk, and of the paper that was mentioned before, was to provide an overview of R packages for comparing spatial patterns, focusing only on rasters, because it would probably be much too much to cover other types of data as well.
But to do that, I first needed to take a step back, so I started by looking at methods for comparing spatial patterns. In the paper, you can read about more methods than I will show you in the next few slides.
So let's look at some example data that will be used throughout the presentation. Here we have two main types of raster data. At the top, you have continuous raster data.
These are NDVI values. To show you the variability of those methods, I use data from the same place at different times: NDVI from 2018 for Tartu and NDVI from 2023 for Tartu.
And you have NDVI for 2023 but for a different location: the city of Poznań, where I'm based. At the bottom, we have a similar situation but with categorical data: land cover data with seven different land cover categories. Again, we have the same location at different times, Tartu in 2000 and 2018, and then Poznań in 2018.
I believe those data sets will show us the differences and similarities between the methods. So, going to the core of my talk, the methods: the first method, which I hope some of you have already used, is visual inspection.
We look at the data and try to think about the differences. We look at the data at the top and at the bottom. On the map at the top, you can see that the colors are darker, so the NDVI values are higher. We can already see that some parts, especially the southeast or maybe the northeast, are totally different from the same place in 2023.
Similarly, on the categorical rasters you can see visible urban growth in Tartu between 2000 and 2018. So human vision is great, because we can see different dimensions, different changes, and different aspects of what we observe, all at the same time.
But of course, there is one really huge problem: we are subjective. If I asked all of you to give me a number between 0 and 10 for how similar these maps are, I would get different numbers. I know that because I've done this kind of study before: I asked a group of 50 students, and for some cases I got values ranging from 0 to 10.
That's very problematic, because the results are not reproducible and we cannot get stable values at the end. Another issue is that it can be time-consuming: if we need to go through hundreds of thousands of data sets, it's probably not doable with visual inspection alone.
So when we think about the methods, we also need to think about several dimensions. I mentioned one dimension at the beginning: the data. Here I'm talking about continuous and categorical data, but we can also think about data models, so we could compare points, polygons, and lines; I'm focusing on rasters. Then we can think about data types: there will be different methods for continuous and categorical rasters.
Then we also need to think about the outcome of our comparison. Here you can see that I distinguish three basic outcomes. One outcome, probably the nicest visually, is the raster outcome.
We compare Tartu in 2018 and 2023, and at the end we get a map: this is the raster outcome of our comparison. The second option is the single-value outcome: we compare, let's say, the same area, and at the end we get one value.
This one value describes how those patterns compare to each other. The last option is a multiple-value outcome: we get a series of values describing the change, hopefully fewer values than we have cells in our rasters.
The third dimension is the context. By context I mean: do we include information about spatial relationships or not? For example, we may just calculate the NDVI change, as on the left.
So basically I take the data for one year and subtract it from the data for the other year. These are just calculations in which each cell is computed independently, so no spatial context is included: this is the non-spatial part. But on the right you have a measure of spatial autocorrelation of the differences, so you are actually looking at neighborhoods and including the spatial context in your results.
And the last dimension is whether we can compare disjoint areas. Some methods can only be used if you have data for the same area.
But some methods, which I call disjoint methods, also work for different areas, so we can compare, for example, Tartu with Poznań. So now let's see what we have here. Probably the most basic ideas are here.
These are the ideas where we have a raster outcome: at the end of our calculation, we still have a raster. For continuous rasters, it's basically the difference between one moment in time and the other.
You can see that in the top right corner, where we have the change in NDVI: there is a decrease, so the values at the second moment in time are lower than at the first. But for categorical data, we cannot do that. We cannot subtract land cover categories.
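The two cell-by-cell raster outcomes discussed here can be sketched in a few lines. For continuous rasters, the outcome is a difference map. Because categories cannot be subtracted, the simplest categorical counterpart is a binary changed/unchanged map, the "binary difference" mentioned just after this point. The grids and land cover codes below are hypothetical, and both rasters are assumed to be aligned, with the same extent, resolution, and shape.

```python
def difference_map(before, after):
    """Continuous rasters: after - before, computed independently for each cell."""
    return [[a - b for b, a in zip(row_b, row_a)] for row_b, row_a in zip(before, after)]


def binary_change_map(before, after):
    """Categorical rasters: 1 where the category changed, 0 where it did not."""
    return [[int(a != b) for b, a in zip(row_b, row_a)] for row_b, row_a in zip(before, after)]


# Hypothetical, aligned 2 x 2 rasters (values chosen to be exact in binary floating point).
ndvi_2018 = [[0.75, 0.5], [0.5, 0.25]]
ndvi_2023 = [[0.5, 0.5], [0.25, 0.25]]
lc_2000 = [[1, 1], [2, 3]]
lc_2018 = [[1, 4], [2, 4]]

print(difference_map(ndvi_2018, ndvi_2023))  # [[-0.25, 0.0], [-0.25, 0.0]]
print(binary_change_map(lc_2000, lc_2018))   # [[0, 1], [0, 1]]
```

Neither function looks at neighbouring cells. This is exactly the "no spatial context" case from the classification above.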
There are a few things we can do. Probably the most basic one is to calculate the binary difference, that is, whether or not there is a difference between the two dates. You can see that the differences are actually not clumped in one area but spread all over the study area, which is probably information we can use for some purposes.
Then, if we want to add the spatial context but still want a raster outcome, we can use other methods. Here I'm showing you three methods. You have already seen the first one: we calculate the differences and then the spatial autocorrelation of those differences, using local Moran's I and similar measures.
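As a sketch of adding spatial context to a difference map, the code below computes local Moran's I for every cell of a hypothetical difference raster. It assumes rook neighbours with row-standardised weights and a non-constant input. It omits the permutation-based significance testing a real analysis would normally add, and it is not necessarily the exact variant used for the slides.

```python
def local_morans_i(raster):
    """Local Moran's I per cell: I_i = (z_i / m2) * mean of z_j over rook neighbours j,
    where z = x - mean(x) and m2 = sum(z^2) / n (row-standardised weights).

    Positive values: the cell resembles its neighbours (a cluster of high or low values).
    Negative values: the cell differs from its neighbours (a spatial outlier).
    """
    rows, cols = len(raster), len(raster[0])
    values = [v for row in raster for v in row]
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # assumes a non-constant raster
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            lagged = [raster[rr][cc] - mean
                      for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                      if 0 <= rr < rows and 0 <= cc < cols]
            out_row.append((raster[r][c] - mean) / m2 * sum(lagged) / len(lagged))
        result.append(out_row)
    return result


# Hypothetical NDVI-difference raster with a clump of decrease in the top-left corner.
diff = [
    [-2, -2, 0],
    [-2, -2, 0],
    [0, 0, 0],
]
lisa = local_morans_i(diff)
print(round(lisa[0][0], 2))  # 1.25: part of a cluster of similar (negative) changes
```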
And then we can see areas with clumped changes. We can also calculate a correlation coefficient using a moving window: the window compares the two dates locally and highlights which areas change the most, which are correlated the most, and which are correlated less.
When the correlation is lower than zero, it basically indicates change, a difference, so the violet areas in the middle are basically the areas of change.
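A minimal sketch of the moving-window correlation follows. For each cell, Pearson's correlation coefficient is computed between the values of the two dates inside a square window centred on that cell. The window radius, the truncation of windows at the raster edge, and returning NaN for constant windows are assumptions of this sketch, not details from the talk.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient; NaN when either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def moving_window_correlation(before, after, radius=1):
    """Correlation between two aligned rasters inside a (2 * radius + 1)-wide square
    window centred on each cell; windows are truncated at the raster edges."""
    rows, cols = len(before), len(before[0])
    result = []
    for r in range(rows):
        row_out = []
        for c in range(cols):
            xs, ys = [], []
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    xs.append(before[rr][cc])
                    ys.append(after[rr][cc])
            row_out.append(pearson(xs, ys))
        result.append(row_out)
    return result


# Hypothetical 3 x 3 rasters: an unchanged copy and an inverted one.
before = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
unchanged = [row[:] for row in before]
inverted = [[10 - v for v in row] for row in before]

print(moving_window_correlation(before, unchanged)[1][1])  # 1.0
print(moving_window_correlation(before, inverted)[1][1])   # -1.0
```

Negative values flag windows where the local pattern reversed between dates. This matches the reading of values below zero as change.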
But we can also do much more: we can calculate different metrics on our data, for the first moment in time and for the second. One idea is to calculate texture metrics based on co-occurrence matrices, like homogeneity and entropy. These methods also use moving windows, and then we just subtract the metric for one year from the metric for the other year.
You can see that this gives us a slightly different story. And this is maybe a spoiler alert, but it is probably one of the main outcomes I wanted to show you: usually those methods will give us slightly different stories. So that was it for continuous rasters.
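The co-occurrence-matrix (GLCM) comparison described above for continuous rasters can be sketched under several stated assumptions:

- values are quantised into a few grey levels;
- co-occurrences are counted only for the horizontal one-cell offset, symmetrically;
- homogeneity uses the 1 / (1 + (i - j)^2) weighting, although definitions vary between implementations;
- windows are truncated at the raster edges.

The example rasters are hypothetical and already quantised.

```python
import math
from collections import Counter


def quantize(raster, levels, vmin, vmax):
    """Map continuous values (e.g. NDVI) to integer grey levels 0 .. levels - 1."""
    step = (vmax - vmin) / levels
    return [[min(int((v - vmin) / step), levels - 1) for v in row] for row in raster]


def glcm_probabilities(window):
    """Symmetric co-occurrence probabilities of grey levels for the horizontal (0, 1) offset.
    Assumes every window row has at least two cells."""
    counts = Counter()
    for row in window:
        for a, b in zip(row, row[1:]):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def glcm_homogeneity(p):
    return sum(prob / (1 + (i - j) ** 2) for (i, j), prob in p.items())


def glcm_entropy(p):
    return -sum(prob * math.log(prob) for prob in p.values())


def texture_map(grey, metric, radius=1):
    """Apply a GLCM metric in a square moving window (truncated at the edges)."""
    rows, cols = len(grey), len(grey[0])
    out = []
    for r in range(rows):
        window_rows = range(max(0, r - radius), min(rows, r + radius + 1))
        out.append([metric(glcm_probabilities([grey[rr][max(0, c - radius):c + radius + 1]
                                               for rr in window_rows]))
                    for c in range(cols)])
    return out


def difference(before, after):
    """Cell-wise change of a metric: after - before."""
    return [[a - b for b, a in zip(rb, ra)] for rb, ra in zip(before, after)]


# Hypothetical, already quantised rasters: a uniform surface and a striped texture.
flat = [[0] * 4 for _ in range(4)]
stripes = [[0, 1, 0, 1] for _ in range(4)]

entropy_change = difference(texture_map(flat, glcm_entropy), texture_map(stripes, glcm_entropy))
print(round(entropy_change[0][0], 4))  # 0.6931, i.e. ln 2
```

Swapping `glcm_entropy` for `glcm_homogeneity` changes which property of the texture is tracked. That is one reason such maps can tell "slightly different stories".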
For categorical rasters, we can do something similar, as you can see on the left. We can calculate a landscape metric in a moving window for the first moment in time and for the second moment in time, then calculate the difference and see which areas changed in terms of this metric.
This is very important, because you need to decide on the metric: you need to decide which property of the landscape, or which change in that property, you are interested in, and then calculate it. We can also use other measures, like cross-entropy, which also uses a moving window.
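The categorical version can be sketched as follows, choosing the Shannon diversity index as the landscape metric (a choice made for illustration; the talk leaves the metric open). The same moving-window metric is computed at both dates and subtracted. Rook-free square windows truncated at the edges and the hypothetical land cover grids are assumptions of this sketch, not the landscapemetrics implementation.

```python
import math
from collections import Counter


def shannon_diversity(cells):
    """Shannon diversity index: -sum(p_k * ln p_k) over the category proportions p_k."""
    n = len(cells)
    return -sum(count / n * math.log(count / n) for count in Counter(cells).values())


def moving_window_metric(raster, metric, radius=1):
    """Evaluate a landscape metric on the cells of a square window around each cell."""
    rows, cols = len(raster), len(raster[0])
    return [[metric([raster[rr][cc]
                     for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                     for cc in range(max(0, c - radius), min(cols, c + radius + 1))])
             for c in range(cols)]
            for r in range(rows)]


def metric_change(before, after, metric, radius=1):
    """Moving-window metric at the second date minus the same metric at the first date."""
    m_before = moving_window_metric(before, metric, radius)
    m_after = moving_window_metric(after, metric, radius)
    return [[a - b for b, a in zip(rb, ra)] for rb, ra in zip(m_before, m_after)]


# Hypothetical land cover: class 1 everywhere, then class 2 appears in the lower right.
lc_2000 = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
lc_2018 = [[1, 1, 1], [1, 2, 2], [1, 2, 2]]

change = metric_change(lc_2000, lc_2018, shannon_diversity)
print(round(change[0][0], 3))  # 0.562
```

The corner window that is now entirely class 2 shows no change in diversity, even though every cell in it changed category. This is why the choice of metric matters so much here.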
So, when spatial context is included, the moving window is often one of the main approaches. Here we have a short summary of that. I also want to highlight R packages, so in the tables you can see R packages implementing these methods that you can use, and the tables also list a few more methods than I am showing on the slides.
So let's go to the next one: the single-value outcome. I don't know if you noticed, but for the raster outcome we only looked at Tartu: Tartu 2018 and Tartu 2023. This is probably the only way we can get a map as the result. If we want to compare different areas, we cannot compare them pixel by pixel or moving window by moving window. We need to do something different, and this is where we can use single-value outcomes.
Here you can see the first example. This is for non-disjoint areas: we still keep the same area and just compare the two maps. We can use different methods, and at the end, as you can see at the bottom, we get one value. For example, we get the proportion of changed pixels, which is about 0.5. We get the value of an overall comparison, and we can get other measures, like the V-measure, which also calculates how similar two maps are to each other.
But what can we do if we have different areas? We have different methods, and of course those methods are much broader, because they can be used both for the same areas and for different areas. What I've done here is put Tartu 2023 in the middle and compare it with Tartu 2018 on the left and Poznań 2023 on the right. And you can see several measures.
For example, I calculated the dissimilarity between the distributions of values: I just looked at the distribution of values in Tartu 2023 and compared it to the other examples. I calculated the average roughness and compared that too. I also calculated something which, for the purpose of this talk, I called Gauss entropy, so again I got two numbers and compared the difference between them.
And what is really interesting here is that, by these measures, the NDVI of Tartu in 2023 is more similar to Poznań in 2023 than to Tartu in the past.
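Because disjoint areas cannot be compared cell by cell, the comparisons described here first reduce each raster to something location-free. The sketch below compares hypothetical rasters of different sizes under two assumptions, since the talk does not name the exact measures:

- the two-sample Kolmogorov-Smirnov statistic serves as the distribution dissimilarity;
- "average roughness" is taken to mean the surface-metrology Sa parameter without any detrending.

```python
from bisect import bisect_right


def flatten(raster):
    return [v for row in raster for v in row]


def ks_distance(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between
    the empirical cumulative distribution functions of x and y."""
    xs, ys = sorted(x), sorted(y)
    return max(abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
               for v in xs + ys)


def average_roughness(raster):
    """Sa: mean absolute deviation of cell values from their mean (no detrending)."""
    values = flatten(raster)
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


# Hypothetical rasters of different sizes: no cell-by-cell correspondence is needed.
reference = [[0.25, 0.5], [0.5, 0.75]]
other = [[0.5, 0.5], [0.75, 0.75], [0.75, 0.75]]

print(round(ks_distance(flatten(reference), flatten(other)), 3))  # 0.417
print(average_roughness(reference), round(average_roughness(other), 3))  # 0.125 0.111
```

Each raster collapses to a value (or a distribution) before comparison. The resulting single number says how different the patterns are but not where they differ.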
Which you can also see visually, I hope. But now try to do that for 1,000 areas or more. So let's look at categorical rasters. What we can do is, for example, calculate a landscape metric for the whole landscape and, again, calculate the difference.
I calculated the differences in the Shannon diversity index and in edge density. I also used another entropy, called Gauss entropy. And another idea is to use spatial signatures.
That is, we compress the spatial information and compare it using a dissimilarity measure. You can see those values at the bottom as well. Here we can see that in most cases Tartu in 2018 is more similar to Tartu in the past than to Poznań, which also makes sense if you look at the maps visually.
The last group is the multiple-value outcome. Here I just show you the simplest one: the histogram of the differences. But we can think of many others. For many of the methods I showed you before, we can, for example, calculate a few landscape metrics, and then we also get a vector of values. Or we can calculate something at multiple scales; this is what the waywiser package does.
It calculates metrics at different scales, so we get a vector of values instead of one value. For categorical rasters, we can use the contingency table or other methods. We can also compare spatial signatures.
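Two of the multiple-value outcomes mentioned here can be sketched with hypothetical inputs:

- a histogram of cell differences, for continuous rasters;
- a contingency table of (before, after) category pairs, for categorical rasters.

From the same contingency table, the single-value measures mentioned earlier can also be derived: the proportion of changed pixels and the V-measure, the harmonic mean of entropy-based homogeneity and completeness. Bin edges, data, and the exact formulation are assumptions of this sketch.

```python
import math
from collections import Counter


def histogram(values, edges):
    """Counts per bin [edges[k], edges[k + 1]); the last bin also includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(counts)):
            is_last = k == len(counts) - 1
            if edges[k] <= v < edges[k + 1] or (is_last and v == edges[-1]):
                counts[k] += 1
                break
    return counts


def contingency_table(before, after):
    """Number of cells for each (category before, category after) pair."""
    cells_before = (v for row in before for v in row)
    cells_after = (v for row in after for v in row)
    return Counter(zip(cells_before, cells_after))


def proportion_changed(table):
    total = sum(table.values())
    unchanged = sum(n for (a, b), n in table.items() if a == b)
    return 1 - unchanged / total


def entropy(counts):
    counts = list(counts)
    total = sum(counts)
    return -sum(n / total * math.log(n / total) for n in counts if n > 0)


def v_measure(table):
    """Harmonic mean of homogeneity and completeness, computed from conditional entropies."""
    rows, cols = Counter(), Counter()
    for (a, b), n in table.items():
        rows[a] += n
        cols[b] += n
    h_a, h_b, h_ab = entropy(rows.values()), entropy(cols.values()), entropy(table.values())
    homogeneity = 1.0 if h_a == 0 else 1 - (h_ab - h_b) / h_a    # H(A|B) = H(A,B) - H(B)
    completeness = 1.0 if h_b == 0 else 1 - (h_ab - h_a) / h_b   # H(B|A) = H(A,B) - H(A)
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Hypothetical inputs.
differences = [-0.25, 0.0, -0.25, 0.0, 0.75]
print(histogram(differences, [-1.0, -0.5, 0.0, 0.5, 1.0]))  # [0, 2, 2, 1]

lc_a = [[1, 1], [2, 2]]
lc_b = [[1, 2], [1, 2]]
table = contingency_table(lc_a, lc_b)
print(proportion_changed(table))                      # 0.5
print(v_measure(contingency_table(lc_a, lc_a)))       # 1.0 for identical maps
```

For `lc_a` versus `lc_b` the categories are statistically independent, so the V-measure is essentially zero. Half of the cells still keep their category, a distinction the proportion of changed pixels alone does not capture.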
So you already know some methods. You've seen how I classified them and some of the tools I used. The next thing I wanted to show you is that we can extend this. I only compared one moment in time to another moment in time, or one place to another place at one moment in time.
But we can extend that to multidimensional data, and there are a few simple ideas. We can do a pairwise comparison: we just compare each time step, let's say January to January, February to February, and so on.
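The pairwise, step-by-step comparison can be sketched for two hypothetical monthly stacks, stored as lists of aligned 2 x 2 layers. Root mean square difference is used as the per-step measure purely as an assumption; any of the earlier measures could take its place. The second function follows the abstract's example of a time-series signature (a vector of values in a given cell): it returns a raster of Euclidean distances between the two stacks' per-cell time series.

```python
import math


def layer_rmse(a, b):
    """Root mean square difference between two aligned single-layer rasters."""
    squared = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return math.sqrt(sum(squared) / len(squared))


def pairwise_by_step(stack_a, stack_b):
    """Compare matching time steps (January with January, ...): one value per step."""
    return [layer_rmse(a, b) for a, b in zip(stack_a, stack_b)]


def cell_signature_distance(stack_a, stack_b):
    """Use each cell's time series as its signature; return a raster of Euclidean
    distances between the two signatures of every cell."""
    rows, cols = len(stack_a[0]), len(stack_a[0][0])
    return [[math.sqrt(sum((la[r][c] - lb[r][c]) ** 2 for la, lb in zip(stack_a, stack_b)))
             for c in range(cols)]
            for r in range(rows)]


# Hypothetical two-step stacks (e.g. January and February), each layer 2 x 2.
stack_a = [[[1, 1], [1, 1]], [[2, 2], [2, 2]]]
stack_b = [[[1, 1], [1, 3]], [[2, 2], [2, 2]]]

print(pairwise_by_step(stack_a, stack_b))         # [1.0, 0.0]
print(cell_signature_distance(stack_a, stack_b))  # [[0.0, 0.0], [0.0, 2.0]]
```

The first function yields a multiple-value outcome (one number per time step). The second yields a raster outcome that summarises all time steps at once.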
Or we can compress the time series to one value, for example by doing a PCA and taking just the first principal component, and then compare that one value. Or we can use some kind of more complex signature: we calculate multiple values for one moment in time and multiple values for another moment in time, and then use a dissimilarity measure to compare them.
Going back to tools: because this is a software conference, let's talk about tools and about what I've seen while working on this paper and on this talk.
First of all, as you've seen, we have a lot of packages, a lot of tools that allow us to do this. A lot of things are already implemented but, as you can see on the slide, not all of them. Some methods are not implemented, or they are implemented but not for spatial data, and if you have spatial data you need to adjust your code and play with it for some time.
Also, the efforts are rather fragmented, which I think makes sense, because people from different fields want to compare spatial patterns.
They develop their own tools and their own methods, and often they do not talk to each other, so if you want to calculate five methods, you need to learn five tools, which is probably not ideal. Also, not all of those packages are on the official repository, CRAN, and some have very minimal documentation, so I needed to spend some time going through the source code or examples and playing with them just to calculate one value.
In general, I also think there is a need for more tutorials and more documentation, and we need to improve on that. When it comes to method selection, so going back to the methodology, there is probably an unlimited range of methods, but we need to look at those different problems before working on them.
Fortunately, as you've seen, we already have tools, so we can actually calculate a few metrics and look at their properties for our area and at how we can use them. Because what I discovered is that we don't have papers comparing these methods: there is method A used in field A and method B used in field B, and they are disconnected.
So I think there is still a lack of systematic studies of these methods, of their properties, and of their pros and cons. But with the tools I showed you today, I think we can take a step forward: we can apply those methods to real-world cases, compare them, and look at their pros and cons.
So, just to sum up: this whole work would not have been possible without the help of the several people I mention here. I asked people on Fosstodon and elsewhere about their ideas and knowledge, and they helped me tremendously. But what I wanted to say is that we have the tools you've already seen, and for this paper I also prepared code examples.
If you go to my slides and to the code examples, you'll get two documents with reproducible code showing how to apply all of those methods. So you can quickly adapt them to your data, or maybe do this systematic study and tell me which method is the best. Thank you.
Thank you very much, Jakub. Maybe let's do it the other way around. Everyone who has to leave for the next talk,
feel free to go, and maybe there are questions in the room for Jakub. If not, then I have one question regarding your summary. What would you recommend to the community if somebody wants to get started and tap into the field of R, of offering packages, and maybe also of aligning packages? What would you say are the practical first steps that need to be done?
For the person to learn how to do that, or to create new tools?
Maybe to step in and help foster the field of spatial comparison.
Yeah, so the first step is just to go to the code examples, look at them, and get familiar with the code because, as I said, different methods and different tools have different assumptions and different parameters. Then you need to think about how the systematic study can be done. In this case, I would take a step back from the software and just think theoretically: what cases can we actually use to do that?
And I still don't have a complete idea of the best way to do that. But I would take a step back. I would probably take a piece of paper and start drawing, to think about what kinds of data sets and what types of data we need. But at the same time, as always, start small.
Don't jump into categorical and continuous data and everything else at once. Start with one subset of the methods I showed today and build up from there. And, as always, I think good advice is to work in the open.
If you work in the open and show your work to other people, they will help you, as you can see here. Honestly, this presentation would probably be ten times worse without that help.
And maybe tag you on GitHub in the process.
Yeah, of course.
Okay, thank you very much, Jakub.