
Mapping Land Use and Land Cover: The Challenges of Going Glocal


Formal Metadata

Title
Mapping Land Use and Land Cover: The Challenges of Going Glocal
Title of Series
Number of Parts
9
Author
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Producer
Production Place
Wageningen

Content Metadata

Subject Area
Genre
Abstract
Gilberto Câmara is a researcher in Geoinformatics, GIScience, Spatial Data Science, Land Use Change, and Earth Observation, working in the Image Processing Division of INPE, Brazil's National Institute for Space Research. At the OEMC project kick-off, he discussed the challenges of mapping Land Use and Land Cover in Brazil, exploring the barriers to producing global data products that are helpful for local decision-making.
Keywords
Transcript: English (auto-generated)
Thank you so much, and I would like to thank Tom and the whole team at OpenGeoHub, and of course express my honour to speak after Jana. Jana is a friend, but also a person who has been managing the GEO Secretariat in a fantastic way, and I really urge you to join activities with GEO; I know that at the end we have Byron Noor today. So for me it is a real honour to talk after Jana. And by the way, just to clarify: given that the Open Data Cube is basically built on xarray, Jana is correct, it is clearly possible to run the Open Data Cube as a back-end to openEO; of course, technically speaking, the details come later.
But let me go back to my presentation. You see a word which is not your everyday word; it is what the French call a portmanteau, the agglutination of "global" and "local": glocal. But before that, first of all, there is a strange person here in front of you, called a user. A user, because you know me somehow as a developer, but perhaps my most important role in society is being a user of data for policy decisions. And the user also has users. So here is one such user, Pope Francis. Pope Francis actually said you need to protect the Amazon rainforest, and I happen to have been a classmate, in electronics engineering of all places, of a person who graduated with us and went on to become a bishop, one of the most important bishops in Brazil, Dom Dimas. So I asked Dom Dimas: how does Pope Francis know the Amazon needs protection? And Dom Dimas said: of course, it is your data, Gilberto. Now, there are hundreds of millions of people who believe in the Pope, but I am sure there are not so many people the Pope believes in, and I am one of them. So the Pope believes in our data. And why does the Pope believe in our data?
Well, the Pope believes in our data for two important reasons. First, it is transparent: it is the longest-running Earth observation time series of anything, anywhere in the globe, consistent over 34 years, since 1988. And not only is it transparent, it is authoritative, in the sense that it is used by REDD+ funds: the Norwegian government and the German government had given 1.3 billion dollars to Brazil on the condition that deforestation goes down — they have now withheld that money, but leave that aside. Which data would the Norwegian government use to measure whether deforestation is going down before paying Brazil? Our data. So our data is worth some billions of dollars, and they never questioned it. The data is used for Brazil's NDC, and it appears in more than a thousand papers. Okay, that is the interesting story.
But in 2018 comes a piece in The Guardian — and I did not make this up, Gert — and it looks as if we had been caught red-handed, because The Guardian says there is a vast expanse of rainforest loss, triple the official rate from INPE, and that is the data from Global Forest Watch. So I called them — come on, Matt, this cannot be true, what is happening? — and after some back and forth (sometimes I can be rather nasty, not on purpose, but in this case it was on purpose), what happens? In Brazil we have defined deforestation clearly for 34 years: it is the complete removal of primary forest cover. Clean. Global Forest Watch, as you have seen in the famous examples, measures tree cover loss from one year to the next. One thing is not the other: you cannot claim, on the basis of tree cover loss, that you are measuring the removal of primary forest cover. But in the end, that is what The Guardian, The New York Times and Le Monde said. So, after some nasty exchange of emails, Global Forest Watch finally put a disclaimer on their website saying that PRODES, the Brazilian system, focuses on large clear-cutting of primary forest in the Amazon, while the UMD data captures loss in all tree cover, including loss in secondary forest. What does that mean? In clear English, it means double counting. Why double counting? Because if you capture primary forest loss — once it is lost, it is lost: a hundred years to get back the biodiversity, lots of emissions — and then you also count removal of secondary forest, you are counting again the same area whose trees had already been removed completely. So you see this great variation here.
Now, don't misunderstand me: since I am here, I share your belief that global maps are important. But they need to be understood for what they are: they are self-consistent — consistent with themselves — and you have to be careful about that. So when I talk about "global", what does global mean? Are the maps globally trustworthy? Do they deliver what they claim to deliver? That is one thing. The second thing: are they locally relevant? Which is a completely different question. In this picture it is yours truly with President Dilma Rousseff and the environment minister — by the way, another set of users — discussing Brazil's NDC for 2015, based on our data. Of course, for the president of Brazil you have to present data which is locally relevant, so that she can decide what she is going to present in the Paris Agreement — and that is not Global Forest Watch, I can tell you that.
So the question of local relevance becomes very important, and I am going to argue, from a land use perspective, that the critical distinction any land use map needs to make is the distinction between what is a natural landscape and what is a man-made landscape. Of course these things vary, but in some cases it is quite clear. The Brazilian Cerrado, an area of two million square kilometres, is one of the world's biodiversity hotspots, and it looks like the left side: that is natural Cerrado. On the right is pasture, which counts as one of the major sources of deforestation and loss of natural vegetation in Brazil. So if I am doing public policy, it is absolutely necessary for me to distinguish what is natural from what is artificial. For example, this is work we did to produce land use and land cover maps where we separate natural from man-made, and based on that we — and other people as well, it is not only us — can do policy analysis: what is the impact of public policies on soybean and pasture expansion in Mato Grosso, one of the world's breadbaskets, one of the places that produces the most soy and the most meat in the world? Pasture expansion means that you must map pasture.
Now, let us take another example: WorldCover. Again, I am going to argue it is self-consistent. Is it locally relevant? Bang, let's see. On the top, the same area of the Cerrado in WorldCover. On the bottom, a recent map of the Cerrado that Hov did in his PhD thesis. What you can see: some classes match — the dark greens, which are the woody savannas. But everything in yellow in WorldCover is called grassland. If you look at the Brazilian map, one side of that river has been occupied by pasture, while the left side is mostly natural Cerrado vegetation: the green areas are Cerrado, and then open Cerrado, which is the really open area. So there you have it. It is not that WorldCover is wrong; it is self-consistent in its own terms. Sometimes Tom was complaining that I said, oh, this is useless. Maybe I chose the wrong word. Useless in the sense that, for a user in Brazil, it is not relevant: it does not provide the information I need.
So now let us take one bad example from a good place. Radiant Earth is actually a very nice organisation — it is a GEO associate and they have very nice training data sets for Africa. They decided to go one step further, and then things got into trouble. Why? Because they decided to make a comprehensive legend for all their data sets for Africa. They have data sets for Mali, for Kenya, and of course in Mali they have certain types and in Kenya certain types, which is all fine. But then they came up with a single set of cover types: water; artificial bare ground; natural bare ground; permanent snow and ice; woody vegetation; non-woody cultivated; and non-woody semi-natural. Can someone explain what a semi-natural landscape is? Right. Now, a question: does it have to be this hard? Yes. And that is the curse of Babel. It is in the Bible: the Lord confused the language of all the Earth, Genesis 11:9.
You do not have to be a Christian to believe in the curse of Babel, because it is among us. Tom, you will live your whole life bound to the curse of Babel. You may try Esperanto, but I do not think Esperanto will work. Okay. Now, welcome to the team. The only comfort I have is that people much smarter than me have looked at the problem and did not find a solution. There is a long tradition going back to Plato. Plato was an optimist who believed in the theory of forms, and Plato said that words implicitly describe the properties of their referents. This is a very optimistic assessment, because it says: well, there is a word, "table", there is a referent, this thing here, and Plato believed that the word describes the referent. Then, if you go to Frege, in "Über Sinn und Bedeutung" Frege distinguished between the sense and the denotation. It is a long philosophical tradition which, of course, leads up to Wittgenstein, who in his Philosophical Investigations says: the meaning of a word is its use in the language. Wittgenstein was completely radical with respect to Plato, saying that meaning is the use we make of a word. So do not get nervous if you cannot solve the conundrum that, for example, Radiant Earth tried to solve, or if your deforestation does not match my deforestation. Plato could not solve it, Frege said it is hard, and Wittgenstein said it is even harder. If you think about what Wittgenstein is saying, he is saying: give up on that, the meaning is the use. I am not smarter than him, and neither was Plato.
So our real problem in land use is that we have words, and we use words to refer to objects in the world. We say urban area, agriculture, forest, savanna. But in fact the problem is that the properties of the world are not binary; they are a gradation. And this gradation of properties means that we only have a small number of detectable categories — even more so if we are using Earth observation, because we are limited: we do not have the field plots, we are limited to what the satellite can tell us. So we say, oh, ten classes is the most we can do, and then we try to map the whole continuum of the world into these small boxes. It is not going to work, so do not be frustrated: you are in the good company of Plato and Wittgenstein — and, of course, of the God who cursed us at Babel. One more example, if you are not convinced yet: this is the Cerrado in Brazil, the Brazilian savanna.
It is similar to the African savanna, except for the megafauna: when man evolved in Africa the megafauna was already there, and some of it still is — the elephants are there. There was megafauna in Brazil too, but when people came from the north, 14,000 years ago, they killed all of it, so we just have the Cerrado. Now, what is savanna? It is actually a continuum. If you think of a gradient of vegetation height and biomass, you see a continuum ranging through wooded savanna, campo cerrado, and so on. But if you look at the map, WorldCover puts part of this in the same category as forest — so a wooded savanna would be the same as the Amazon forest, which it is not — and the rest is grassland, including pasture.
Well, welcome to the problem. It is unsolvable, okay? There is no solution. Now, there are different approaches: given that the problem is hard, there are different solutions. In our case — my case and the team I work with — we have a vision, which of course has to be completely different from the European vision. Our logic is the logic of empowerment, especially empowerment of people in developing countries. We do not have the resources the Europeans have, but we know what we have to do: we have to empower end users and let them do their job. That is why, in this whole data cube exercise, we opted for bottom-up map production, not top-down map production. Our approach was: I am going to give the tool to the user, for the user to do the work. So it has to be as simple as possible, but it has to encompass the full power of Earth observation. This means give users all the data. This means work with time series. Do not work with annual composites, best-pixel composites, seasonal composites. Do not assume that you know better than the other guy. Give him access to the data; let him decide whether he wants an annual composite, a monthly composite, a 15-day composite. Do not decide on his behalf. You do not know much more than him. He probably knows more than you.
So we have been building, since I was in Münster with Edzer — and Edzer is also a culprit for making me go to R; he now has some misgivings about that, but you cannot get away from your past, your past follows you — an end-user tool for cloud services. It is self-contained: it does not rely on anything other than itself. And because of STAC, it works on Digital Earth Africa, on AWS, on your computer, on Microsoft, on the Brazil Data Cube, on SIPO, on Tom's machine, wherever. And from Tom's machine you can use it to get data from Amazon or from Digital Earth Africa; it just takes a little bit longer, but the same script runs as it would run on AWS. That is the end-user tool, and it is now at TRL 8. We also learned from the gurus of R, like Edzer, that any API should be short. So this is the essence, this is what we teach people from the Earth sciences, our community of Earth science experts. We tell them: look, there is something called a data cube, and you create it with sits_cube. There is something called ground truth, your samples, and you get the samples from the cube if you have the locations. There is something called a model, and you train the model with your samples. There is something called a regular cube, if you want to produce 15-day cubes or one-month cubes, and you regularize it. And there is something called a probability cube, which you get from classification; you can smooth the cube or classify. That is all you have to learn: eight commands and you are on. So we have biologists, agronomists — it is not designed for programmers.
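To make the eight-command idea concrete, here is a rough sketch of that workflow in R with the sits package. The cube source, collection, sample file and output directories are hypothetical placeholders, and argument names may differ between sits versions, so treat it as an illustration of the shape of the API rather than a verbatim recipe.

```r
library(sits)

# Build a data cube from an ARD collection exposed through STAC
# (source and collection names are placeholders for whatever provider you use).
cube <- sits_cube(
  source     = "MPC",                      # e.g. Microsoft Planetary Computer
  collection = "SENTINEL-2-L2A",
  roi        = c(lon_min = -63.0, lat_min = -9.0,
                 lon_max = -62.0, lat_max = -8.0),
  start_date = "2022-01-01",
  end_date   = "2022-12-31"
)

# Regularize it to the interval the *user* chooses (16-day here, could be "P1M").
reg_cube <- sits_regularize(cube, period = "P16D", res = 20,
                            output_dir = "./cubes")

# Ground truth: extract time series at sample locations (CSV with lat/long/label/dates).
samples <- sits_get_data(reg_cube, samples = "./samples.csv")

# Train a model (random forest here; could be a temporal deep-learning model).
model <- sits_train(samples, ml_method = sits_rfor())

# Classify: produces a probability cube, which can be smoothed and then labelled.
probs <- sits_classify(reg_cube, ml_model = model, output_dir = "./out")
bayes <- sits_smooth(probs, output_dir = "./out")
map   <- sits_label_classification(bayes, output_dir = "./out")
```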
But what we have learned in the meantime is that humans have to be in the loop. And that applies not only to our package, but to any package for land use and land cover — that is why I am sharing this with you. One tricky thing about big data is that, for example, these are 50,000 data points from the Cerrado, with four bands at a 15-day MODIS interval, which is 24 observations per year. Do the calculation: this is a 96-dimensional space. How do you convey to a user, from a 96-dimensional space, information about the consistency of his or her data? We use self-organizing maps. Those of you who know self-organizing maps know they are a dimensionality-reduction technique. What happens here is that the neurons cluster — and it is a very efficient clustering, because not only does it cluster, it also retains neighbourhood information. You see, all these greens are from a single class; the fact that they sit together tells you something about the consistency of your samples. And the fact that you have this red guy down here, separated from his fellows up there, rings a bell. It is either a sample collected from a completely different place than the others — the Cerrado spans from five degrees south to twenty-five degrees south, so there may be a gradient — or it is an error, an outlier. You do not know, and the algorithm has no way of knowing. It just tells the user: look at that data.
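As an illustration of that step, the sketch below uses the SOM-based sample evaluation that sits exposes; the `samples` object is assumed to come from the earlier workflow, and the function arguments may vary between versions.

```r
library(sits)

# `samples` is a sits tibble of labelled time series (e.g. from sits_get_data()).
# Build a self-organizing map of the samples: each 96-dimensional time series
# (4 bands x 24 dates) is mapped onto a 2-D grid of neurons.
som_map <- sits_som_map(samples, grid_xdim = 10, grid_ydim = 10)

# Plot the SOM grid: neurons coloured by majority label. A sample of one class
# sitting far from its fellows is the "red guy" the user should inspect.
plot(som_map)

# Ask sits which samples look consistent and which should be analysed or removed.
eval <- sits_som_clean_samples(som_map)
table(eval$eval)   # counts of "clean", "analyze", "remove"
```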
The other thing we learned is that the user needs to be in the loop. What does it mean to be in the loop? Most classification systems, including Google Earth Engine and many others, do what we call passive learning. What is passive learning? You have a labelled data set, you have a supervised classifier, you produce a map. What is human-in-the-loop? The classification is part of a loop. You take the supervised classification and the system says: look, there are some places here you need to improve, give me some samples. Okay, then classify again, look at the classification, improve the samples. Look at one example. This is, again, Rondônia, Brazil. I trained two PhDs, two postdocs in ecology, and told them: start with the classes you are completely sure about. So they said, oh, we are completely sure that natural vegetation, pasture, burned area and forest exist there. Okay. Then we produce the uncertainty map — and lo and behold, a huge amount of uncertainty. If you look up here, this was a reservoir: they had forgotten to include water. The system tells them: these are the points you should include, they are highly uncertain. In this case it was simple to go to the next round, because this was water. Fine. Round two: uncertainty almost removed, but close to the water there is a spot of high uncertainty. Why is that? This is the flood pulse: areas which are flooded part of the year and dry the rest of it. Water, which means one thing if it is water all year round, is different from these areas, so now you need a category for wetlands. And up there is another vegetation type which was not in the legend. So go and get the next round. The bottom line of all this is that you increase your explanatory power significantly, and the users learn what possibilities are available to them, because you are in the loop and the system is talking to you — until eventually you reach the point where that is the best I can do, I cannot do better.
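A minimal sketch of one such round in sits, assuming the regularized cube and samples from the earlier sketches; the uncertainty-sampling helper and its arguments follow recent sits releases and may differ in yours, so take this as an outline of the loop rather than exact code.

```r
library(sits)

# Round n: classify with the current samples and model.
model <- sits_train(samples, ml_method = sits_rfor())
probs <- sits_classify(reg_cube, ml_model = model, output_dir = "./round_n")

# Where is the classifier unsure? (entropy of the class probabilities)
uncert <- sits_uncertainty(probs, type = "entropy", output_dir = "./round_n")
plot(uncert)   # e.g. a reservoir lights up when the legend has no "water" class

# Ask for new sample locations in the most uncertain areas,
# label them in the field or on screen, and append them to the training set.
new_pts <- sits_uncertainty_sampling(uncert, n = 100, min_uncert = 0.7)

# samples <- rbind(samples, labelled_new_pts)  # then go to round n + 1
```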
Which brings us to the two final slides: where are we going? Well, 95% of the people on Earth who do land use classification use decision trees — either decision trees proper or random forests. That is, I would argue, what 85% or 90% of Google Earth Engine use is. The problem with random forest is that it is a hierarchical, majority-vote model: the forest goes with the majority of its trees, and that decision, of course, favours the classes which are more likely to appear. So random forest guarantees you a good overall accuracy, but you may have classes which are less represented in the data set that random forest, by definition, cannot get. And that is where the transformer models come in.
If you have heard of DALL-E or GPT-3, this is all transformer based. There is a paper, if you want to delve into it, the 2017 "Attention Is All You Need", written at Google, which started the whole transformer revolution. Transformers are simple to understand but very hard to implement. Essentially, if I have a sentence like "look at all the lonely people" and I want to translate it, the problem is that "look" and "people" are the crucial parts of the sentence; "all the" does not do much. So I need to relate "look" and "people", and that is what DALL-E and GPT-3 do. In our world, what happens is that I am relating observations. So for corn in this data set there are two time points in the red-band time series which are particularly related to corn and negatively related to soybean; and at the end of the cycle, because we replace corn with soybean, the later observations are positively related to soybean and negatively to corn. End result: you may not get a better overall accuracy if you use transformers, but you certainly get the best you can get in user's and producer's accuracy for the less frequent classes.
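As a sketch of that comparison in sits, the snippet below trains a random forest baseline and a lightweight temporal attention encoder on the same samples and cross-validates both; the model constructors (sits_rfor, sits_lighttae) are the ones I believe sits provides, but tuning arguments are omitted and may differ between versions.

```r
library(sits)

# Same labelled time series for both models.
# Random forest baseline: strong overall accuracy, but majority voting
# tends to under-represent the rarer classes.
acc_rf  <- sits_kfold_validate(samples, folds = 5, ml_method = sits_rfor())

# Temporal attention (transformer-style) model: attends to the time points
# that matter for each class, e.g. the red-band dates that separate corn from soybean.
acc_tae <- sits_kfold_validate(samples, folds = 5, ml_method = sits_lighttae())

# Compare per-class producer's/user's accuracy, not just the overall figure.
acc_rf
acc_tae
```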
So, we end up here: PRODES has been called the gold standard — by WRI, not by me. And this is what we got with only 400 samples and transformers, which means that one of your deliverables — don't tell Ervin, nor Patrick — the Tropical Deforestation Monitor, is ready. You cannot get better than this; you cannot get better than 90% of the gold standard. That's it. Thank you very much.
Gilberto, we have a number of questions for you that were submitted through Slido. Let's start with this one, which has four votes, from Patrick. It says: was there any account of widespread secondary forest removal in the Amazon during the years when the GFW stats spiked? If so, what were the drivers?

Good question. It has to do with secondary vegetation. It may be a coincidence, but secondary vegetation in the Amazon is actually largely a side effect of land grabbing. What happens is that the guys who cut the forest are not the farmers; it is organised crime, which grabs the land. They go there, they cut the forest, and then they go to the land registry to claim ownership of that land. Depending on the land price and on economic conditions, they may wait two or three years before they actually sell the land. In those two or three years there may be enough forest regrowth to produce a response in Global Forest Watch, which sees trees. So I would argue it may have to do with the algorithm, but for me it is more likely that it has to do with economic cycles; that is what I would tend towards, because it is a land grabbing, land price story. And no, not PRODES — we have another product called TerraClass, a companion product, which tracks land use after deforestation, including secondary vegetation. We found out that secondary vegetation in the Amazon is currently very much related to land grabbing and land pricing.
Great. Our other question with four votes is also from Patrick: should we abandon categorical maps completely in favour of continuous variables, from which to dynamically derive maps matching local requirements?

Do I have an answer? I only have a guess, Patrick. I think no, because of the use of words: we need words to communicate. I need to tell the Pope about deforestation. And I am not just joking about the Pope — Macron, Bolsonaro, whoever — they understand words; that is how our minds work. So I am afraid we keep the categories, at least for the moment.
Okay, we have a couple more questions. How can end users use sits to produce globally consistent maps, for example for climate change modelling?

Well, to be quite honest, we have tested sits at Amazonian scale — three million square kilometres — which is big enough for us. We have not tried to run sits globally, and frankly we would not do it, because we do not understand the other places. We had trouble enough trying to do Mozambique. Mozambique asked us to do their mapping using sits, and we were very confident, because we had good results for forest in Brazil — Mozambique would be a piece of cake. And then we found that everything we knew about forest had nothing to do with how forest works in Mozambique. It took us months of work with the Mozambique guys to try to understand what they meant by forest. So it is hard enough to go from Brazil to Mozambique. That is why I appreciate the work of people who do these big climate maps; I think they are useful. But again, the point is that they are mostly self-consistent.
Okay, we have a lot of questions, so let's just ask two more. One from Tom says: should we stop mapping land cover classes and instead focus on more robust variables — for example traits of biological species, crop types, canopy height, DBH?

The problem there, Tom, is the same question of words. Okay, go ahead, another one.

All right, we'll move on. This one is from John: ARD and DRI got popular, but in your talk "give the data to the users" seems to deny the convenience of ARD — does it?

Well, no. What we rely on is that the data which sits on Amazon, Microsoft and the other providers is ARD. So basically what we tell the users is: okay, get this collection on Amazon, and — like you do in Google — select the space, select the time, and select how many days of information you want: a one-month cube, a two-month cube, a 15-day cube. So we rely on the existence of analysis-ready data, which is true for most data of interest to land use and land cover, but may not be true for other kinds of work.
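To illustrate "you choose the interval", here is a tiny sketch, with a hypothetical region of interest and collection, of regularizing the same ARD cube to two different periods in sits; argument names follow recent sits releases and may differ in yours.

```r
library(sits)

# One ARD collection, two users, two choices of temporal interval.
cube <- sits_cube(source = "AWS", collection = "SENTINEL-2-L2A",
                  roi = my_roi,                      # hypothetical bounding box
                  start_date = "2022-01-01", end_date = "2022-12-31")

cube_15d <- sits_regularize(cube, period = "P15D", res = 20, output_dir = "./c15")  # 15-day
cube_1m  <- sits_regularize(cube, period = "P1M",  res = 20, output_dir = "./c1m")  # monthly
```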
Okay, let's ask one last question — sorry, it's a very popular talk, so the remaining ones can be moved to the discussion forum. One anonymous question: does the selection of additional training points proposed by sits lead to overfitting? How did you cope with that?

Good point. Overfitting is a dark art — anyone who has worked in deep learning will tell you it is a dark art. You can get overfitting in two different ways. One is if you start selecting: you have, say, 50,000 samples and you try to sort out what is good and what is bad; you may get overfitting there. And you may get overfitting if you go step by step. Our experience is mixed. Some of the agriculture guys like to say: I want to separate soy-corn from soy-millet from soy-cotton, and I want to see if sits can separate them — and there is overfitting there, I can tell you. What we try to tell them is: if you run a random forest, you get your baseline. If you get 85% overall accuracy with random forest and then you run a deep learning model and get 95% accuracy, you are in trouble — do not believe the result. So we tell them: run a random forest baseline. The best you can gain in overall accuracy with the current technology, on the same data set, is two or three points. If you get ten points more, that is overfitting — do not believe it.
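A small sketch of that sanity check, reusing the two cross-validated accuracy objects from the earlier random-forest versus attention-model sketch; the three-point threshold is the speaker's rule of thumb, and the way the overall accuracy is pulled out of the accuracy object (caret-style) is an assumption that may need adjusting.

```r
# Rule of thumb from the talk: a deep model should beat the random forest
# baseline by only a few points of overall accuracy on the same samples.
oa_rf <- acc_rf$overall["Accuracy"]    # assumed caret-style accuracy object
oa_dl <- acc_tae$overall["Accuracy"]

if ((oa_dl - oa_rf) > 0.03) {
  warning("Gap above ~3 points: suspect overfitting, do not trust the result yet.")
}
```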
All right, that's all we have time for, for now.