Design-based methods for assessing map accuracy
Formal Metadata

Title: Design-based methods for assessing map accuracy
Number of Parts: 13
License: CC Attribution 3.0 Germany: You may use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/54877 (DOI)
Transcript: English (automatically generated)
00:05
Welcome to everybody watching this lecture, and thank you for taking the time. We'll be talking about assessing map accuracy, and there's a lot I want to tell you, so I'd better get started. If you have questions, put them in the Mattermost chat and Hannah will collect them.
00:23
And hopefully we'll have time towards the end to answer and address them all. If not, I'm sure I will do that afterwards. OK, so design-based methods for assessing map accuracy. I'd like to start with a classical problem from statistics and probability theory.
00:45
Maybe you remember that from when you took your statistics classes. So we have a vase here with red and grey balls, let's say. And I'd like to know what is the percentage, the proportion of grey balls in this vase.
01:00
Well, what we could do is, of course, check them all and just count them. And then we would know exactly what's the proportion of the grey balls in this vase. But that's a lot of work. So what we usually would do, and I hope you agree with me, is we take a random sample from the vase. For example, we could take randomly 30 balls from this vase and count how many of them are grey.
01:24
And, well, if let's say 10 out of 30 are grey, then our estimate of the proportion of grey balls in the vase would simply be one over three, one third. And, well, that's the way we do it. And it will give us an unbiased estimate of the proportion of grey balls in the vase.
01:44
Now, suppose we have all grey balls on top in this vase and all the red balls in the bottom. Well, that really doesn't matter. If we do a random sampling of, let's say, 30 balls from this vase, then it still works because we take a random sample.
02:04
The structure of the balls within this vase really doesn't matter, whether they are on top or at the bottom. Every ball has equal probability of being selected, no matter where it is. So it would still produce an unbiased estimate, because we take a random sample.
02:21
And suppose we did not have a vase but replaced it with a two-dimensional surface, and we distributed those balls over that surface, randomly selected 30 of them, counted how many are grey, and used that as an estimate of the proportion of grey balls for the whole surface.
02:43
It would still work, because we take a random sample. And even though there is spatial structure in here, the grey balls are mostly over here and over there there are none, it really doesn't matter, because we take a random sample. And if we take a random sample of 30 balls, maybe some of them will be really close to one another.
03:05
But that really doesn't matter; the sampling theory is still valid. I'm trying to illustrate that with this example, but we'll come to talk about it a bit more later. So even if there is spatial structure, it doesn't matter, because we sample those balls randomly.
03:23
And you can imagine, you know, we are now talking about grey balls and red balls, but it could also be that every ball is a grid cell in an area of interest. And grey means that the grid cell, the land use, was not correctly classified.
03:41
And the red balls mean a correct classification. So basically we can estimate the proportion of grid cells in a study area that are correctly classified by taking this random sample, and it will produce an unbiased estimate of the map accuracy. Well, correctly classified, yes or no, that's a categorical variable, but we could also do this for continuous variables.
04:08
For example, you notice that those balls are all a bit different. Some are big, some are small. So if we are interested in the average size of the balls in this population, on this two-dimensional surface,
04:22
we could take a sample and use the average size of the sample as an estimate of the population average for the whole area. Every ball could even have a number written on it, like the error of your map at that location,
04:41
or maybe the squared error of your map at that location, we could estimate the mean error and the mean squared error for the whole area using that random sample. Well, this was just like a teaser example showing you that probability sampling and
05:01
the associated design-based statistical inference also works when you sample a spatial area, like in this example, even if there is spatial structure. And that is one of the aims of this lecture. I want to show to you, demonstrate that probability sampling and design
05:21
-based statistical inference are a sound and solid methodology to assess map accuracy. So the example was just like a teaser. And we'll go into a bit more detail of that in the next slides. A second aim of this lecture is that I want to explore together with you,
05:41
how can we assess map accuracy in a case where we don't have a probability sample? So aim one assumes, and that's maybe the downside of this probability sampling and design-based statistical inference, that you are able to collect a probability sample, and maybe that's not always realistic.
06:01
So we also want to look at aim number two. But let's start with aim number one. Yeah, so we're talking about map accuracy, and then when we talk about map accuracy, of course, we need to quantify it, so we need to define it in some way. So today, I think we're only going to look at one metric, one measure of map accuracy, which is called the root mean squared error.
06:24
I think, you know, for continuous variables, that's probably the most common measure of map accuracy. But, you know, everything I'm going to tell you today could also work for the mean absolute error or the mean error, etc. So how do we define this root mean squared error?
06:44
Basically, we have a variable of interest, z, which is a function of location x, where x is some location in our geographic domain D. So let that be our variable of interest, and let's call that the reality. But we don't know that reality. We have actually made a map of that reality, z hat.
07:05
And the map is available to us everywhere, because that's what we've got. And we would like to know how good, how accurate that map is, how close it is to the reality. So what we could do is visit all the locations in our study area.
07:22
So I discretized the geographic domain D into a very large but finite number of, let's say, grid cells. And I visit all those grid cells. I subtract the map estimate from the true value. And I look at the squared difference and I average those.
07:42
And maybe I take a square root of that to get the root mean squared error. I think you're all quite familiar with this. Now, and this is what I call the population root mean squared error, because it really visits all grid cells, all locations in the study area. So this is what we want to know, but this is usually what we don't have it, right?
08:03
Because in order to compute this population root mean squared error, we would have to know z at each and every location in our study area. And we usually don't have that. We will use a sample of validation points, lowercase n of them, where lowercase n is much,
08:24
much smaller than capital N, the whole population. And we hope that our sample root mean squared error is close to the population root mean squared error. Let's look at a synthetic example and check how all this works out if we use simple random sampling from the population.
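[In symbols, reconstructing the formulas on the slide: with $z(x_i)$ the true value and $\hat{z}(x_i)$ the mapped value at grid cell $x_i$,]

$$\mathrm{RMSE}_{\mathrm{pop}}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(\hat{z}(x_i)-z(x_i)\bigr)^{2}},\qquad \widehat{\mathrm{RMSE}}=\sqrt{\frac{1}{n}\sum_{j=1}^{n}\bigl(\hat{z}(x_j)-z(x_j)\bigr)^{2}},\quad n\ll N.$$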
08:48
This is the map that I've got. And for those of you who looked at the exercise question, this map should be quite familiar, because it is actually the exercise that I posed to you. You were asked to estimate the population root mean squared error for this case.
09:02
I gave this map to you, but there was also a reality, a truth, which I had created, because it's a synthetic example. From this reality, I had created the map. You didn't know the reality. But if you know the reality, you can subtract the map from the reality, and this is then our map error.
09:24
And of course, we can also take the square of those map errors. And the goal, if we want to go for the root mean squared error, is to know the spatial average of this squared error.
09:40
And that spatial average, the mean squared error, is 7.305. If we take the square root of that, we get a root mean squared error of 2.703. So for those of you who did the exercise, you now know that this is the answer we were looking for, and you were trying to get as close as possible to it from the data that were available to you.
10:02
We'll come to that at the end of the lecture. OK, so this is what we want to know. Now let's go to R and explore a little bit how it would work if we used a simple random sample to estimate that root mean squared error. So let me just show this to you. OK, you know, I'm not really good at using R and I'm a bit old-fashioned.
10:24
I'm still using sp rather than sf, so forgive me for that. OK, let's plot that map of the errors. It takes a bit of time. But this is the error map, and let us now randomly sample
10:46
50 points from that area. OK, and let's plot those points on the error map.
11:01
OK, so the centre of every circle is one of the points that were sampled, 50 points in total. It's a random sample from the population. We do an overlay, we take the square root of the mean of the squared errors, and then our estimate is 2.878.
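[The R session itself is not captured in the transcript; a minimal sketch of the step just described, using sp and assuming the error map is held in a hypothetical SpatialPixelsDataFrame called err_map with the map error in column error, could look like this:]

```r
library(sp)

set.seed(1)

# Simple random sample of 50 locations from the study area
pts <- spsample(err_map, n = 50, type = "random")

# Overlay: extract the map error at the sampled locations
e <- over(pts, err_map)$error

# Sample estimate of the population RMSE
sqrt(mean(e^2))
```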
11:22
That is not that far away from the true population root mean squared error. But it's only an estimate, and we can do it again and take another sample. Then we will, of course, have different locations, because we randomly take another 50.
11:43
OK, now they're elsewhere, and so we also get a different estimate, a little bit further away. You know, when you use a sample to estimate the population root mean squared error, you will always have some estimation error. Note, by the way, that some of those locations are really close to one another, right?
12:03
But that really has no effect. Doesn't matter. It's still a random sample from the population. So let us explore now if we not do it only once or twice, but we actually do this 1000 times. So 1000 times taking a sample of 50 points and estimating the population root mean squared error from the sample.
12:27
So that's what I do here in this for loop. And let me then plot a density of the 1000 estimates that I get.
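[A sketch of that repeated-sampling loop, with the same hypothetical err_map object as before:]

```r
# Draw 1000 simple random samples of 50 points; estimate the RMSE each time
rmse <- numeric(1000)
for (i in 1:1000) {
  pts     <- spsample(err_map, n = 50, type = "random")
  rmse[i] <- sqrt(mean(over(pts, err_map)$error^2))
}

plot(density(rmse))         # sampling distribution of the 1000 RMSE estimates
abline(v = 2.703, lty = 2)  # dashed line: the true population RMSE
mean(rmse)                  # close to 2.703, illustrating unbiasedness
```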
12:42
OK, so this is the distribution of those 1000 sample root mean squared error values, which are estimates of the true population root mean squared error, the dashed line over here. So actually, that's what I mentioned at the start of the lecture: we get an unbiased estimate.
13:05
So every sample root mean squared error is not spot on with the population root mean squared error. Sometimes it's too high, sometimes it's too low. But on average, we get it right. That's the idea of the estimator being unbiased.
13:22
So the average of those 1000 is already very close to 2.703, and if I did this an infinite number of times rather than 1000, it would be exactly equal. Now, what would happen, what would change in this distribution, if we did not take 50 points as a sample, but took 100 random points from the area?
13:46
And that's where I need your input. So let's go to Mentimeter. And I'm going to ask you to submit your answer. So please answer this question. What is the effect of increasing the sample size from 50 to 100 and using
14:03
the average of 100 random points in the area to estimate the root mean squared error? And you have a choice of four possible answers. And I hope it all works. So you have to go to menti.com and use this code to submit your answer.
14:25
OK, good. We already have one response. Maybe copy that code to the chat? Yeah, of course. Let me see. How do I get to the chat? I don't know.
14:49
I don't seem to have access to the chat. Or maybe here.
15:02
But of course, everybody can see it also on the screen. I hope it's not too small. You just copy the numbers, I think. You didn't copy the URL. Oh, well, it's menti.com. OK, I can type it here.
15:25
OK, that's it. I need to pay attention to the clock; time goes fast. I'm seeing a somewhat skewed distribution.
15:40
Yeah. So the majority actually thinks there is no effect, whether I take a sample of 50 points or 100 points. It just doesn't really make a difference in terms of how accurate that distribution of the sample RMSE is.
16:02
How close is it to the population value? But there are also people who say, well, actually, it does have an effect. This one gets very few votes, but these two are about equal. Well, why don't we have a look? I'll leave it open; maybe more will come in. So let's simply do it. OK, so now I again take 1000 samples, but this time of 100 points each.
16:26
And I do that as well. Let's plot both distributions in one graph. So the blue one is the one we had with 50, and the green one is the one we get when we use a sample size of 100. And you do see that it gets narrower: our estimates become more accurate.
16:46
And I think you should all agree that this makes sense, because the larger your sample, the closer you are to what you want to know, because you have more information. How much better did we get? Well, the spread is now about 70 percent of what it was.
17:03
So basically, we were able to reduce the spread of the distribution by 30 percent. So answer B was correct. OK, let's go back to the lecture. Oh, wait a minute, so this was the correct answer.
17:23
So, well, it's actually a small group of people who got that right. The mean RMSE stays the same, more or less; the population RMSE is constant, of course. People get confused by that. Yeah. OK, maybe you need to think about it a bit and then it will become clear.
17:44
Maybe watch this lecture once again. Well, let's continue. OK, so what have we learned from this example? Random sampling produces an unbiased estimate of the population RMSE, because that distribution is centered on the population RMSE, and the uncertainty of the estimate,
18:04
how close it is to the true population RMSE, decreases when the sample size gets bigger. So, of course, that's good. But the decrease is inversely proportional to the square root of the sample size.
18:22
This is really something basic from statistics, and maybe you remember it from a course that you took a long time ago. And this is actually bad news, the square root of n, because it means that if we want to double the accuracy, we have to quadruple our sample size. So it would have been preferable if there were an n here, but that's simply not the case.
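[In symbols: the spread of the estimate behaves as]

$$\mathrm{se}\bigl(\widehat{\mathrm{RMSE}}\bigr)\propto\frac{1}{\sqrt{n}},$$

[so halving the estimation error requires quadrupling the sample size, and going from n = 50 to n = 100 shrinks the spread only by a factor of $\sqrt{2}\approx 1.41$, i.e. to about 70 percent.]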
18:45
OK. If our sample size is large enough, then we have the additional property, from the central limit theorem in statistics, that the estimation error is approximately normally distributed.
19:00
So we can compute confidence intervals, like, for example, the 95 percent confidence interval. And I have an example here from a PhD student of mine, Anatol Helfenstein, who took the summer school last year, by the way. This is the pH of the soil predicted in the Netherlands. And we used a probability sample to estimate the root mean squared error of our maps for various depths.
19:27
So the black disc is the estimated root mean square error. And then those grey bars over here, they show the 95 percent confidence interval.
19:41
So we can be 95 percent confident that the true population RMSE is within these limits. So it's all quite nice, I would say. So the summary of the first aim of this lecture is that we can apply probability sampling and design based statistical inference to assess map accuracy.
20:04
There is no problem with the fact that some of those points are really close to one another, because it's still a random sample from the population, the population being the whole map. If you are not convinced, and that may well be, then maybe check this paper.
20:22
Actually, I'll show that paper later. And, you know, I didn't share the PowerPoint presentation with you yet, but I will do so after the lecture, so then you can also click on this and it takes you directly to that paper. This is another paper, a paper by Dick Brus. You also see Dick Brus's name over here.
20:40
He is really a spatial sampling expert and he should have been giving this lecture, but he retired and he said, well, why don't you take over? So that's why I am here now. But OK, anyway, this goes into very much of the details. So if you're not convinced, you might want to check this. We get unbiased estimates and we can compute confidence intervals.
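[As a sketch of how such a confidence interval can be computed from a single simple random sample: apply the central limit theorem to the squared errors and back-transform, where e is a hypothetical vector of map errors observed at the n validation points:]

```r
se2    <- e^2                          # squared errors at the validation points
mse    <- mean(se2)                    # unbiased estimate of the population MSE
se_mse <- sd(se2) / sqrt(length(se2))  # standard error of that estimate

ci_mse <- mse + c(-1.96, 1.96) * se_mse  # approximate 95% CI for the MSE
sqrt(pmax(ci_mse, 0))                    # back-transformed 95% CI for the RMSE
```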
21:06
All really nice. Now, it's not only simple random sampling that you can use. You can also, for example, use stratified random sampling, cluster random sampling, systematic random sampling. But of course, there has to be a random element in there somehow. So it has to be a probability sample.
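[In sp, all of these designs can be drawn with spsample; a sketch, with study_area a hypothetical SpatialPolygons object delineating the region:]

```r
library(sp)

pts_srs   <- spsample(study_area, n = 100, type = "random")      # simple random
pts_sys   <- spsample(study_area, n = 100, type = "regular")     # systematic random
pts_strat <- spsample(study_area, n = 100, type = "stratified")  # stratified random
pts_clus  <- spsample(study_area, n = 100, type = "clustered",   # cluster random
                      nclusters = 10)
```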
21:28
So Dick wrote a book, Spatial Sampling with R, and you can access it here. It will be released really soon and the PDF of that book is free for everyone. So I can really recommend that. OK, good. Aim two.
21:47
What do we do if we don't have a probability sample? And that, of course, happens quite often, right? So in practice, we usually cannot afford or maybe we don't think about it. Maybe we should reserve five percent of our budget for validation and taking a probability sample.
22:04
But if we didn't do that, we still want to estimate the map accuracy. And if we don't have a probability sample, we cannot use probability sampling and design-based statistical inference. So what do we do? Well, what we often do is use data splitting, right?
22:22
We have a data set for mapping and we split it into one part for calibration and prediction and another part for validation. Or we do cross-validation. I think all of you will be quite familiar with that. Well, what's best? What should we do? So another poll, another bit of interaction with you. Let me see if I can go there now.
22:45
So which of these four methods would you prefer? So again, go to www.menti.com and use this access code. Copy this to the chat again. And wow, cast your vote.
23:23
OK. It's nice to see responses. And it seems that most of you prefer tenfold cross validation. Is that with refitting or without refitting? Whatever you like, Tom. Well, it's a very good question.
23:44
Well, of course, you fit your model with nine of the ten folds, right? And then you refit; each time you fit again, so you have to do it ten times. And here, with leave-one-out, you have to do it n times, where n is the number of observations. So that is computationally much more demanding, I guess.
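[To make the mechanics concrete, a minimal sketch of k-fold cross-validation with refitting, using a placeholder linear model and a hypothetical data frame d of observations:]

```r
k     <- 10
folds <- sample(rep(1:k, length.out = nrow(d)))  # random fold assignment

cv_pred <- numeric(nrow(d))
for (f in 1:k) {
  fit <- lm(z ~ x + y, data = d[folds != f, ])        # refit on the other k-1 folds
  cv_pred[folds == f] <- predict(fit, d[folds == f, ])
}
sqrt(mean((d$z - cv_pred)^2))  # cross-validation RMSE

# Leave-one-out is the special case k = nrow(d): n refits instead of ten
```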
24:05
Maybe that's the reason why people prefer this. OK, so, yeah, let's go back to the lecture. So I'll leave this open and see what the preferences are. And let me tell you what I prefer.
24:21
Let's see. Yeah, OK, I think that usually cross-validation is better than data splitting, because data splitting very much depends on the split that you happen to make, right? And if you do k-fold cross-validation, and maybe we should talk a little bit more about the details, like Tom is asking me, what exactly do you mean by this?
24:44
But basically, you know, when you do k-fold cross validation, every observation is used once for validation. So you use the whole data set for validation. That's what I like. You don't get that when you do data splitting. And then I think, normally, I would prefer a little bit more leave-one-out cross validation over k-fold cross validation,
25:04
at least if you can afford it computationally. Because, you know, you would be closer to the final modelling that you would do using all the data; you just drop one observation when you evaluate how well you did. It maybe also depends a little bit on the case. And, yeah, there's one important problem with all these four methods that I offered to you,
25:27
which is what happens when we do cross-validation or data splitting in case of clustered data. And I really should refer here to Hannah's lecture from yesterday, where she made very clear what happens in case of clustered data. Yesterday,
25:43
Münster was removed from the calibration data set, and you tried to predict Münster. Well, maybe that's even an extreme form of clustering, where you are actually very much extrapolating. So, yeah, there are some problems when you have very strongly clustered data and apply cross-validation.
26:03
For example, if 90 percent of your data are in 10 percent of the area, then if you use a standard cross validation to estimate the map accuracy, then this will be 90 percent dominated by how well you do in 10 percent of the area.
26:22
And of course, that's the part of the area where you probably do much better, because you have a high sampling density there. So that would not be fair; you would be too optimistic. So we would like to compensate for this effect:
26:41
the fact that standard cross-validation, and also random data splitting, wouldn't do a good job, wouldn't provide an unbiased and reliable estimate of the map accuracy in case of clustered data. And clustered data happen a lot. For example, here, well, I basically just went to the Internet and looked at some examples.
27:04
This is the air quality network from the European Environment Agency. So we have a lot of stations over here, but hardly any over here. And probably the air quality here is pretty good. So if we use this sample to also estimate the errors in our maps, well, I don't know exactly how that differs.
27:21
But, you know, if there's little spatial variation, you probably do better mapping over here than you do over here. So it wouldn't be very fair to use random cross-validation. And this is above-ground biomass as an example. Here in the tropics, you have a lot of data from Mexico and this part of Central America,
27:44
but very few from Brazil, and here in Australia hardly any. So, you know, another strong case of clustering. And these are data that I work with myself, soil data distributed all over the world.
28:01
So there's a high sampling density in Europe and in the US, and we have almost no data in this part of Australia, in Siberia, the Sahara, Canada. So, yeah, clustered data occur quite a lot. So what can we do in case of clustered data,
28:21
when we have to do cross-validation? Well, there are some solutions. Spatial cross-validation is often used or proposed there. I think you have to be very careful with that, because it starts from the wrong premise that spatial autocorrelation should be avoided.
28:40
But I hope that I've made clear with the first aim of my lecture that actually, as long as you take a random sample, some points will be really close to one another and it's not a problem at all, because you took a random sample. Spatial cross-validation really avoids that; it makes sure that a validation point is never close to calibration points.
29:04
And then it tends to be too pessimistic. Density-weighted cross-validation basically attaches a weight to each validation point, depending on the density of the data in the local neighbourhood. High density means lower weight.
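[A rough sketch of that idea, not the worked-out methodology: estimate the local sampling density around each validation point, here crudely as the number of neighbours within an arbitrary radius h, and weight the cross-validation errors inversely:]

```r
library(sp)

# 'pts' is a hypothetical SpatialPointsDataFrame holding cross-validation
# errors in column 'e'; h is an arbitrary neighbourhood radius in map units
h    <- 50000
d    <- spDists(pts)                # pairwise distances between validation points
dens <- rowSums(d < h)              # crude local density: neighbours within h
w    <- (1 / dens) / sum(1 / dens)  # normalised inverse-density weights

sqrt(sum(w * pts$e^2))              # density-weighted RMSE estimate
```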
29:22
We're working on that methodology. In model-based cross-validation, you basically build some kind of geostatistical model of the error, or of the squared error, in your map; then you interpolate those errors and take the average. I'm not going into the details; that still has to be developed.
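[A sketch of that model-based idea with gstat, again with hypothetical objects: pts holds cross-validation errors in column e, and grid is a SpatialPixels object covering the study area:]

```r
library(gstat)

pts$e2 <- pts$e^2                                     # squared errors
v  <- variogram(e2 ~ 1, pts)                          # empirical variogram
vm <- fit.variogram(v, vgm("Sph"))                    # fit a spherical model
ok <- krige(e2 ~ 1, pts, newdata = grid, model = vm)  # interpolate over the area

sqrt(mean(ok$var1.pred))  # spatial average of the kriged squared error -> RMSE
```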
29:43
And maybe there were some suggestions from you, in the answers to the exercise that I had posed; we'll have a look at that in a minute, hopefully, if we still have time. This is really a lively area of research. There are lots of developments, but no perfect solution yet, as far as I can tell.
30:01
So I hope that you will be able to contribute. OK, I recently published a short communication, it's only five pages, in Ecological Modelling, with Alexandre Wadoux as the first author, and also with Dick Brus, whom I already mentioned before, and Sytze de Bruin. And we had a bit of a provoking title:
30:22
'Spatial cross-validation is not the right way to evaluate map accuracy'. And we really advocate: make use of probability sampling, right? Design-based statistical inference. Well, we illustrate that in this paper; we do have a case study.
30:44
Now, I'd just like to show you the main result. So, OK, we took an area in the Amazon and we needed to know the reality everywhere. So it's a bit of a synthetic case. We took an existing map of the above ground biomass as the reality.
31:03
And then we took samples from that reality and we interpolated using a random forest. And then we checked how well we did with three cross-validation methods and also for three types of sampling designs. So let's start with this light blue case.
31:23
And on the Y axis here, so you see all these box plots here. The Y axis is the error in the estimated root mean squared error. So we like it to be close to zero. So the blue one over here does a really good job. So if you have a systematic random sample, a regular sample, then standard random K fold cross validation actually does a pretty good job.
31:46
And it also does a pretty good job in case of a simple random sample. But when you have a clustered sample, it's simply much too optimistic, right? So the thing we said before, so it's not doing a good job.
32:03
Standard k-fold cross-validation, when you have a clustered data set. Now, these two, the green and the dark blue, are both versions of spatial cross-validation. And actually, in our case, they did a bad job in all cases, especially the buffered leave-one-out cross-validation method.
32:25
So they are too pessimistic. OK, especially this one over here. But even in case of a simple random sample, they just tend to be too pessimistic about the root mean squared error. I also put the red one in over here.
32:42
This is when you take an additional simple random sample from your study area, so an independent data set collected using probability sampling, which is what we usually cannot afford. But it just confirms that the red is doing a really good job in all those cases.
33:03
Centered on zero with a pretty small standard deviation. And all these cases use the same number of observations to estimate the population root mean squared error. OK, if you want to know more about this, I have put the link to this paper in the presentation.
33:23
OK, we have 10 more minutes. 'For the spatial ones, which method did you use for the spatial cross-validation?' OK, so you read the paper. OK, Tom. Basically, we used the methods developed by Ploton et al.
33:42
But do you know, is it a spatial block? Tom, put your question in the chat and I'll answer it. We have time until quarter past, so we have a bit more time. Oh, wow. OK. Yeah, that's good. So we actually do have time. Yeah, I just had four o'clock Dutch time in my mind.
34:01
Oh, good. So in that case, Tom, to answer your question, because I was a bit blunt: spatial k-fold cross-validation, buffered leave-one-out cross-validation, I think they're all variants of what in general is called spatial cross-validation. So what do you do here? You have your complete data set.
34:22
You make k folds and you make sure that those folds are geometrically compact. So you have k compact clusters, and you remove one cluster and use all the other folds to estimate at that left-out cluster.
34:44
So basically you're doing a bit of extrapolation. It's pretty much the same as what Hannah showed yesterday: you leave out Münster and use the rest of the area to predict at Münster, and then you don't do a good job. 'But there are about three methods to do spatial cross-validation?'
35:02
Yeah, so I am referring here to what we used in that paper. Our paper uses the methods of Ploton et al., which is in the references of our paper. They had two spatial cross-validation methods, and I'm trying to explain to you what these are. So this one is where you stratify and define your folds by using compact geographic clusters.
35:29
So not random folds, but spatially compact ones. And this buffered leave-one-out cross-validation is where you make sure you have a buffer around every validation point.
35:44
The buffer has a certain radius, and the radius could be derived from the range of the semivariogram, so it supposedly has a lot of theory behind it. But actually, we don't think that it is fair, because whenever you do a validation of a map,
36:03
you know, the map predicts at points using calibration data that can also be nearby. So basically, when you do buffered leave one out cross validation, you make sure that you never have a calibration point close to the validation point. It has to be far away because there's a buffer around it.
36:22
And that's why it becomes very pessimistic: it's actually always evaluating how well I do if I extrapolate in geographic space, rather than interpolate. OK, that's all I want to say about that now, but the details are all specified in the papers that I referred to.
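[For concreteness, a sketch of buffered leave-one-out cross-validation as just described, with a placeholder model; pts is a hypothetical SpatialPointsDataFrame with the target in column z and the coordinates also stored as columns x and y:]

```r
library(sp)

r <- 100000               # buffer radius, e.g. taken from the variogram range
d <- spDists(pts)         # pairwise distances between the n data points
n <- nrow(pts@data)

err <- numeric(n)
for (i in seq_len(n)) {
  keep   <- d[i, ] > r                              # only points outside the buffer
  fit    <- lm(z ~ x + y, data = pts@data[keep, ])  # placeholder model
  err[i] <- pts$z[i] - predict(fit, pts@data[i, , drop = FALSE])
}
sqrt(mean(err^2))  # buffered LOO RMSE; tends to be pessimistic, as argued above
```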
36:46
OK, right. So we have plenty of time; maybe we have like 10 minutes to discuss this and then still a lot of time for questions. You have a lot of time, yes. Yeah, the map accuracy question. I think I posted that on Monday or Tuesday.
37:06
And this was asking you, can you estimate the population mean error and the population root mean square error for a study area, which is a square study area?
37:21
Well, we had a map of that, but you didn't know the reality. What I did give you, for 1000 points in that study area, were the geographic coordinates, the x and the y, and the error. And what we want to know is the mean error and the root mean squared error, but for the whole study area.
37:46
So I think the study area was 500 rows by 500 columns. So that's 250,000 grid cells, right? So this is defined for capital N equal to 250,000, because that is what we want to know.
38:04
How well did we do for the whole study area? And you were asked to estimate the ME and the RMSE using a sample of only 1000 points. And these were given to you.
38:21
Yeah, and if you look at this sample design, I think everybody notices that these are somewhat clustered data, right? It's not a random sample; there is some clustering there. And we know now that clustering is risky. Because if we just took the naive, default, unweighted estimate of the ME and RMSE, replacing capital N here
38:50
by the lowercase n and just taking the average over all those 1000 cases, then I think we should be able to do better than that, because we would be giving a lot of weight to those areas where we have a high sampling density.
39:06
So we would like to downgrade their influence and give more weight to points that are not in densely sampled areas. OK, so that would be one option, but then there is also the model-based kind of approach.
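[For reference, the naive unweighted estimates that serve as the baseline below are simply, as a sketch, assuming the 1000 exercise points sit in a data frame dat with columns x, y and error:]

```r
dat <- dat[!duplicated(dat[, c("x", "y")]), ]  # drop duplicate locations

mean(dat$error)          # naive estimate of the population mean error (ME)
sqrt(mean(dat$error^2))  # naive estimate of the population RMSE
```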
39:24
So I got three submissions from participants of the course. And so let's have a look at the answers that I got. So the true mean error is 0.6615 and the true root mean square error is 2.7027.
39:42
So I got estimates of both of them and I simply took the absolute difference between the true and estimated and added up those two absolute differences. And that gave us the criterion because that's also what the exercise said over here.
40:02
The contribution that has the smallest sum of the absolute ME and RMSE estimation errors is the winner. So let's see what we got. If you used an unweighted, naive approach, where every observation is equally important, you would get a criterion value of 0.2032.
40:27
Now, I got submissions from Carlos, from Abdelkrim and from Olivier. Well, Carlos, I really liked your approach, because you took this model-based approach where you said, let's build some
40:42
kind of kriging model, a geostatistical model of the error and the squared error, and let's interpolate that and take an average for the whole area. But you actually did worse than the unweighted case. I don't know exactly why; I didn't look into the details, but I liked your approach.
41:02
But in the end, your answers, your outcome was not as good as taking the unweighted case. Abdelkrim, well, Abdelkrim, you simply used the unweighted approach. You took the mean error and the root mean square error of the sample. And so you got exactly the same results as I had for unweighted and you got exactly the same criterion.
41:25
And then Olivier, I think Olivier submitted his answer this morning, also used a model-based approach, also quite good. And you also checked, and I think Carlos did that too, whether there were duplicates. As it happened, I didn't do that on purpose, but in the data set over here there were actually seven or nine duplicate locations.
41:49
You don't want to count them twice, I think. So you took care of that. But somehow something went wrong, I think, in your code because you got an extremely high criterion value.
42:01
So I think the winner is Abdelkrim. Although I like Carlos's and Olivier's methods more, in the end the criterion is the lowest for Abdelkrim. We also had a few out-of-competition submissions, from Edzer and also from Sytze de Bruin. I put them in grey here; I hope you can still read them. They're not really participants of the summer school.
42:28
Edzer took a design-based, very clever way: he looked at the data and said, well, there are clearly clusters, so let's identify the clusters and weight accordingly. So basically it's a kind of density-weighted approach.
42:47
But something must have gone wrong, I think, Edzer, in your calculations, because the RMSEs are very much too low, and that's why your criterion values are very high. And Sytze, you know, I mention him because he's one of the co-authors of this Ecological Modelling paper, the short communication.
43:09
Yeah, he did better than the unweighted case, but not very much better, in fact, with a model-based approach similar to what Carlos did, and also Olivier and Edzer. But maybe Sytze took more time.
43:25
He also had more time; I had shared the exercise with him already a few weeks ago, so he got the best result. But they're out of competition. So for me, Abdelkrim is the winner. OK, that's it. Thank you. Now we still have like 15 minutes for questions, so I'm happy to answer any question.
43:46
Thanks a lot, Gerard. And I think it's good that we have 15 minutes left, because quite some questions came up on Mattermost. We'll probably start at the beginning. So the first question is: why will a simple random sample give an unbiased point estimate if there is spatial structure?
44:06
Will a local neighbourhood variance estimator be better than a simple random sample variance estimate? OK, that's a long question. So basically, I think this question has several components, but the first is why simple random sampling gives an unbiased estimate. Maybe I should stop sharing?
44:26
No, I think it's all right. Keep sharing, because we can look at the slides also. Let me go to the slide, you know, maybe this introductory example.
44:42
I'm not sure if this is the right slide, but basically, we see spatial structure here, right? Let grey be wrongly classified and red be correctly classified. But we take a random sample, so the spatial structure has no effect, because every one of these balls has equal probability of being selected.
45:07
So the grey balls have the same probability of being selected as the red balls. So if I want to estimate the proportion of balls that is grey in this population, in the study area, and I take a random sample, that random sample will give me an unbiased estimate, no matter whether there is spatial structure.
45:27
Yes or no. I'm not saying that it's the best you can do, right? So maybe there are ways of doing a better job by taking the spatial structure into account. And maybe that is something we discussed later on in the second aim of this talk, like can we do a better job?
45:47
And maybe, here, I had a list. Yeah, maybe that could be this slide. So we want to estimate the population map accuracy.
46:04
Well, if we had a random sample, we could. The nice thing about that is that we have theory that tells us our estimate will be unbiased, and we can also quantify the estimation error. And I also mentioned that simple random sampling is not optimal, right?
46:22
Maybe if you have spatial structure, you would rather use systematic random sampling. Or I think there's also like a compromise between design-based and model-based. So you can have some, I don't know, I forgot the name now, but Dick will know. So there's lots of ways how you can do better.
46:42
But it's all still based on design-based statistical inference. Model-assisted. Yeah, model-assisted. Thank you, Edzer. Yeah, model-assisted inference. So there's lots of theory out there that can help you to do a better job than just simple random sampling.
47:01
But the nice thing is that the theory stands and you get unbiased estimates and you can also quantify your uncertainty. Now, Hannah, what was the question again? While the simple random sample will give an unbiased point estimate, if there is a spatial structure, will a local neighborhood variance estimator be better than a simple random sample variance estimate?
47:24
So the first part of the question, I think I answered that: with a simple random sample, you get an unbiased estimate. And well, I did the best I could to convince you; if not, we'll have to continue later on, or maybe you can read some of the literature that I provided. It's simply because you take a random sample.
47:40
Now, you can make use of this spatial structure somehow to do a better job, because you know that those errors are spatially structured. That's actually also what you do in this model-based approach, and that's for sure something you should be doing when you don't have a probability sample. Then you can probably do better.
48:01
So the methods could be model-based or maybe density-weighted. You know, I have a problem with spatial cross-validation, because it's very hard to make choices there; you quickly overdo it. We know that random cross-validation will produce too optimistic estimates of the map accuracy.
48:22
But spatial cross-validation can get really out of control, and you basically get much too pessimistic estimates, and there's no way of controlling that. So I'm not a fan of spatial cross-validation. Okay, well, that's probably the best I can do answering this question.
48:41
Are there any more questions? Oh, quite a few. So the next one is less complex, I think: do you have a recommendation for which metric to use for categorical maps? Ah, yeah. Well, the most common one, I would say, is the map purity. So let's say the proportion of area, the proportion of cells, that is correctly classified, right?
49:04
So it's just some number between zero and one. And I think it was Johanna who mentioned the kappa index yesterday; that's also something often used in remote sensing, where you compensate a little bit for the random chance of getting it right.
49:22
You have these confusion matrices, right, error matrices, and you have the consumer's accuracy and the producer's accuracy. So there's a lot. But if you have to use only one number, only one metric, I would go for the proportion of points that are correctly classified.
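[A small sketch of these metrics computed from a confusion matrix, with obs and pred hypothetical factors of observed and mapped classes sharing the same levels:]

```r
cm <- table(obs, pred)                                 # confusion (error) matrix

overall <- sum(diag(cm)) / sum(cm)                     # proportion correct (purity)
p_e     <- sum(rowSums(cm) * colSums(cm)) / sum(cm)^2  # expected chance agreement
kappa   <- (overall - p_e) / (1 - p_e)                 # kappa corrects for chance
```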
49:42
Okay, so the next one is more of a comment, but maybe you want to respond to it: 'I'm a bit confused now about deciding between the random sampling approach and the spatial cross-validation approach that was suggested yesterday. I guess I need to look at all the papers again and think a bit more.'
50:01
Yeah, and maybe I didn't do the best job I could explaining the difference. So, you know, this design-based probability sampling kind of inference is really based on statistical sampling theory. And it only works if you have a probability sample from your area of interest. And probability sample means that every location in your study area has a
50:24
probability of being selected, and that you know those so-called inclusion probabilities. So you really have to do that in a proper, sound way. You delineate your study area, you have a shapefile or whatever, and then you sample randomly, or maybe stratified randomly.
50:43
Well, there are all those kinds of probability designs that you could use, but you have to use probability sampling to select those locations, and only then can you use design-based inference. And then the second aim of the lecture was about: suppose you don't have that, which is often the case in practice, then we cannot use design-based inference.
51:03
I know that some people do. They say, let's just pretend that our data are a random sample from the study area. But I don't like that because that's not true. You often have just a legacy data set, a convenience sample, which may be clustered. So it's not a random sample. And then we still want to estimate the map accuracy.
51:24
And here are some methods that can do the job, but they all have pros and cons. So really, we need to work on this as a community, I think. OK. OK, so next, I squeezed in a question from my side, keeping in mind that
51:44
most data used for spatial mapping, at least those used for global mapping, are heavily clustered. And following the results of your paper, why isn't the title of your paper 'Commonly used random and spatial cross-validation strategies are not the right way to estimate map accuracy'?
52:01
So you want to put 'commonly used' in front of this title, so 'the commonly used random and spatial cross-validation strategies'? Yeah, OK. I think I agree with you there: this shows us that random cross-validation doesn't do a good job in case of clustered data, right?
52:31
So the title says spatial cross-validation is not the right way, but random is also not the right way in case of clustered data.
52:43
So I agree with you there. And it's a very good point, because many people do that; I've done that myself, too, and Tom has done it also. I remember, when we did the SoilGrids mapping in 2017, he said, these are our cross-validation statistics. But the data were very clustered.
53:02
So probably those RMSEs were far too optimistic, much too low, not really representative of the whole study area. But what we wanted in this paper, in this short communication, was to really point to the fact that many people say spatial cross-validation is the solution.
53:20
And we don't believe that, right? And we have here, well, this is a case study, an illustration that you can get strong biases when you have spatial cross-validation. So that was the thing we wanted to say. But I fully agree with you that standard, common, random cross-validation also will not be doing a good job in case of clustered data.
53:48
So maybe we need to come up with an alternative cross-validation approach then. That's why I wrote this, right? Yeah, yeah, we're not there yet.
54:02
I like this one, by the way. So we're working on a paper doing this. But yeah, there's a lot we can still do, and we need to do it. Okay, so there's another question: 'Based on your excellent talk, I would conclude that for the moment the only accepted and validated method
54:26
for map accuracy assessment of clustered data is taking a probability sample using the design-based approach. Would you agree with that?' Yeah, I agree with that. So if you are in a situation where you calibrated a model to make a map with this kind of
54:43
data, and you make your map and you want to evaluate how well you did, then the best way would be to take a probability sample from your study area. Now, this is the Amazon, and I haven't been there.
55:02
I think Alexandre was there, the first author of the paper, and he did a real excursion, walking in the wilderness. And we have many people from Latin America, from Brazil. I mean, accessibility is really a problem here, right? So if I generate 500 locations and I say, go there and check what the true biomass is at those locations, that may be a costly affair.
55:27
So that's, of course, also why so often we have clustered data when we calibrate models. But this is a risky situation to somehow evaluate the map accuracy using the existing cross-validation strategies.
55:44
And maybe the weighted one or the model-based one will do a better job. But yeah, in all cases, I think probability sampling, design-based statistical inference is the best. If you can afford it, because also it has theory backing it up, you can simply prove that the
56:06
estimate of the map accuracy is unbiased and we can quantify the accuracy of those estimates with confidence intervals. So it has so many nice properties.
56:20
We have two minutes left and there's another question; I think Tom also has another question. I'll quickly read the last question here from Mattermost: 'Does the selection between random sampling or spatial cross-validation not depend on the characteristics of the variables we are trying to predict, for example, if it's an ecological or biophysical variable that we want to map?'
56:44
So I think maybe spatial cross-validation, or all the cross-validation methods that I mentioned here, perform worse or better depending on the spatial structure and all that. But design-based probability sampling doesn't care at all what the spatial structure is.
57:04
Like I tried to explain in the starting example, if we have this over here, okay, I have to go all the way back: if this is our population, and half of the balls, the grey ones, are on top of the red ones, and we take a random sample, we still get an unbiased estimate.
57:24
So no matter what the structure is, it makes no assumptions whatsoever about the spatial structure. It always works because the sampling is random from the population. All right, then one minute left, Tom. Okay, Tom, maybe you have to unmute.
57:44
Yes, I agree that if you strictly put in these buffers, I mean, I think Patrick Schratz made this plot where the accuracy drops as you go more and more distant. So I agree that it becomes too pessimistic.
58:02
But I don't know, I think maybe the title of your paper, I mean your note, it might now tell people, like, anybody who applies spatial cross-validation will be told, well, I can reject your paper, because these people said you shouldn't use it, right? So it becomes a bit...
58:22
Yeah, it becomes a bit of a strong statement. I think, Tom, this is science, right? We are in a debate, we are in a discussion, we are trying to figure things out, and I really welcome that. I like that. So if others can prove that there's something wrong here, or that spatial cross-validation is actually not so bad after all, you know, let's enter this discussion.