Environmental Monitoring with Large-Scale Land Cover Classification
Formal Metadata
Number of Parts | 52
License | CC Attribution 3.0 Unported: You may use, adapt, copy, distribute and make publicly available the work or its content, in unchanged or adapted form, for any legal purpose, provided you credit the author/rights holder in the manner they specify.
Identifiers | 10.5446/44679 (DOI)
FOSS4G SotM Oceania 2019, 51 / 52
Transcript: English (automatically generated)
00:02
When you want to tell a story, it's important to think about your audience, who they are and how you can best communicate your message. When you're a technical expert, it's really easy to get bogged down in the details,
00:22
and in that process a message can be lost. As spatial people and mappers, we want to tell stories to other people. We want to tell stories to Kurt's dad, we want to tell stories to politicians that make the policies to manage this planet, and we want to tell stories to our peers and our friends
00:42
and the community that we work in. This project that I worked on, the Land Cover Classification System, is about the team that I work with trying to find a better way of telling these stories. I'm going to step through. That was my version of doing a cheesy TED Talk intro. Instead of just being like, hi, my name's Sean, here we go, blah, blah.
01:04
The title of the talk is Environmental Monitoring with Large-Scale Land Cover Classification. When I talk about land cover and not land use, because they are different things, I'm talking about we might have a city like Wellington,
01:26
and we might say that Wellington is urban. Or we might be in the Murray-Darling Basin in Australia, and we might say that some of this area is agriculture, so we're assigning it a class. It's really helpful that some people don't like going last when they talk,
01:43
because you're all tired and you want to eat food. But I'm lucky in that I get to benefit from all the speakers before me, and you're kind of primed if you're here, but if not, you're not going to be lost either. Land cover is really complicated, because the planet and the surface of the earth is dynamic.
02:04
It's changing, it's complex, you've got large amounts of data. You've got multiple satellite systems with different types of resolution, spatial and temporal. Projections, I hate projections, I'm sure some of you hate projections too. Well, I don't hate them, and is this recording?
02:27
And so, classifying the land cover from the raw satellite data that we get down, processing it and then turning it into a classification, is one of the biggest problems in remote sensing, and it's not just a technical problem, that we can't work out what it is,
02:44
it's also a problem of whether the classes are meaningful, what story we can tell with those classes, and how we come up with them. So, I'll keep going. Australia is a really complex landscape. It's got desert, urban areas, intertidal zones, cultivated areas, forest, sub-alpine,
03:07
has a tiny bit of snow, and less and less every year. It's really hard to squeeze the complexity of the entire earth into a bunch of classes chosen by us humans.
03:21
Often when you're looking at remote sensing data, it might be continuous values, ranges, these indices that Kurt was talking about, the WOfS product, which is Water Observations from Space and how present the water was, but when you want to consider it in terms of these classes, you're trying to squeeze all this complexity into a bunch of almost arbitrary classes, it seems sometimes,
03:41
and how we define these classes is really important, particularly given that we have all these different climate classifications within Australia, but this is applicable to kind of anywhere. The reason that we want to classify this data, because we do things for reasons, is because Australia and many other countries need this information to inform their policy
04:05
about how they manage their country, their planet, the environment, and they need it for more emerging fields, like environmental-economic accounting, and globally, some of the Sustainable Development Goals can be accounted for and measured purely based on land cover.
04:26
So there are a large number of reasons why we want to know what the surface of the earth is, what it was through the archive, and how it's changing from time to time. Now, the system that I work on is the Land Cover Classification System,
04:43
and I'll talk about kind of the alternative method to this afterwards, but our method works based on the classes defined by the Food and Agriculture Organization of the United Nations, and these classes are internationally recognised,
05:01
and they're clear, and the semantics around them are quite established, and a lot of this comes down to the semantics of the classes, because that's the story you're trying to tell when you say something is vegetated, or do you mean 1% of it was vegetated, how much of the year was it vegetated for, and these specifics are laid out in the Land Cover Classification System.
05:25
Now, why open source? I noticed that my talk didn't have open source in the title, and most of them seem to have open source in the title, so I added this slide yesterday. And I was sitting in a cafe, and I was like, oh, why is the open source important?
05:41
Because people were like, oh, transparency and free stuff, but I think it's really, really important when you're trying to give these lines of evidence for policy makers and those who are going to manage the earth to have transparency about the processes. And I'm not just, you know, I'm talking about open data in terms of Landsat and Sentinel.
06:01
Yes, there's Worldview, it's pretty awesome. There's Planet Labs, which have one metre daily resolution data. That's all great, but you have to be able to afford it if you want to use it. Making this land cover classification system open source, because it's a framework in which you can plug and play different layers into it,
06:20
means that it's not only relevant for Australia, and same with the Open Data Cube platform, it's not only relevant for Australia, you can implement it in other places, and we're implementing it in other places at Geoscience Australia, but also other people who are deploying their own Open Data Cubes, and they can deploy their own land cover classification systems when it's finished.
06:41
That's my asterisk. And so I think having the clarity in terms of the open data and being able to, one, contribute to the code, if someone finds something that they want to add, like a feature, and being able to see the line of evidence for policy makers, you get as much detail as you want, but I think that transparency is important
07:03
when, one, you're using taxpayer money to fund these projects, and also for the people who are interested in it to be able to dive into the details and not feel like everything's kind of hidden under the hood, in terms of the decisions about their country or their planet that are being made using this information.
07:23
Now, this is the land cover classification system up to level three. It looks pretty simple, because it is. It's a binary hierarchical tree, so if something's classified as one class, it can't necessarily be the other. That's not what the world is, because the pixels that we get,
07:44
say 30 by 30 metres for Landsat or 25 by 25 metres, depending on how you aggregate it, you can have a lot going on in 30 metres. We would probably all fit within 30 metres now, but you wouldn't be able to see each of our fleshy tones in a Landsat pixel.
08:03
And then it ends up as one of these level three classes. So we start off with vegetated or non-vegetated, and then we split between terrestrial and aquatic in each of those. As we move further down the tree, we'll look at cultivated, natural, bare surfaces, aquatic vegetated, like mangroves, and aquatic cultivated,
08:23
which I didn't really think of as much of a thing until I started this project. After this, we get down to level four, and this adds a lot more detail, and I realise this is a lot of small text on the screen, you don't have to read it. But here are things like height and crop type.
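The binary hierarchical tree described here (vegetated or not, then terrestrial or aquatic, then a third split) can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the class labels are shortened paraphrases rather than official FAO names, the names `PixelEvidence` and `classify_level3` are hypothetical, and the third split for non-vegetated pixels (artificial versus natural) is an assumption based on the FAO scheme rather than something spelled out in the talk.

```python
from typing import NamedTuple


class PixelEvidence(NamedTuple):
    """Three yes/no answers for one pixel (hypothetical structure)."""
    vegetated: bool  # level 1: vegetated vs non-vegetated
    aquatic: bool    # level 2: aquatic vs terrestrial
    managed: bool    # level 3: cultivated (if vegetated) or artificial (if not)


# Each path through the binary tree ends in exactly one level three class.
# Labels are illustrative paraphrases, not official class names.
LEVEL3 = {
    (True, False, True): "cultivated terrestrial vegetation",
    (True, False, False): "natural terrestrial vegetation",
    (True, True, True): "cultivated aquatic vegetation",
    (True, True, False): "natural aquatic vegetation",
    (False, False, True): "artificial surface",
    (False, False, False): "bare surface",
    (False, True, True): "artificial water",
    (False, True, False): "natural water",
}


def classify_level3(pixel: PixelEvidence) -> str:
    """Walk the three binary decisions and return a single class label."""
    return LEVEL3[tuple(pixel)]


# A mangrove-like pixel: vegetated, aquatic, not cultivated.
mangrove_like = classify_level3(PixelEvidence(True, True, False))
```

Because every pixel takes exactly one path, a mixed 30-metre pixel still ends up with a single label, which is the limitation the speaker points out.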
08:41
And the main point that I want you to understand about the LCCS system is that it's kind of plug and play with each layer. I'll show you the other method next, and then you'll understand the contrast between them, because the purpose of this talk is to kind of look at different ways of how people classify the earth and the pros and cons to each method,
09:01
particularly when you want to tell your story. This project is kind of the successor to the dynamic land cover dataset, which was implemented using MODIS, which is 500-metre pixels and daily resolution, and implemented continentally. So we're kind of progressing on from this into using the classes defined by the FAO.
09:26
But this isn't really cutting edge, it was done a little while ago. What is cutting edge is something like this. This is the Victorian vegetation map run out of Monash University. It's using Sentinel, and it's using a convolutional neural network,
09:41
which is a machine learning process based on, it's a supervised learning algorithm that uses training data, so examples of what you want to identify to separate out its classes. And it is really excellent. They've given it a lot of training data. And here's an example of the red, green, blue image on the left,
10:02
and on the right, you have the classified image. And it does a really good job at separating something like a standard image into defined classes. But there are downsides to this. For a number of reasons.
10:21
A convolutional neural network is, you need training data to make it. And the training data needs to be good, and it needs to be cleaned. When you train the model, you know, I don't expect you all to be machine learning experts, but it's hard to explain the symbolic meaning of how the weights
10:43
and biases and outputs reach their classifications. And, you know, this happens with, you know, if you have Facebook and you upload an image of yourself to it, it can sometimes identify who you are. That will be using a neural network to classify you.
11:01
But it can be hard to understand how, based on all the images it's seen, which features of your face is it using to pick out, to decide that it's you. And that's the same when you give it these examples, these training samples. So a training sample might be like, you walk out into the park, then you go, OK, it's green here, it's grass.
11:21
And you take your little, you know, you've got a GPS and you write, this is grass, and then you send that off to your model and the model looks up the Landsat data or the Sentinel data at that point. And it draws out all the bands that we're talking about. And it can go, OK, these values of these bands or indices is grass.
11:42
But then it goes through that training process. And that can be hard to explain. Not everyone is the biggest fan of convolutional neural networks due to the lack of explainability for the general public, including, no. This is a real tweet and it's not about neural networks.
12:03
We use the Open Data Cube, which I don't have to talk about because Dave already talked about it. It's an open source geospatial data management and analysis platform. And as Dave said, it takes the satellite data, does a whole bunch of processing on it, geo-rectification, adjusting for aerosols,
12:22
and aligns it in a way, at least for me, as a scientist, because that's my job. So Dave does all the hard backend work and I get to have fun trying to tell these stories. Not that Dave's job isn't fun, Dave's is pretty fun. And also, like we've shown before, we use the supercomputer at ANU.
12:45
If you ever have a chance to go visit a supercomputer in person, I highly recommend it. It's really amazing and very, very noisy. You have to wear earplugs. We also use AWS, and that's for the deployment of the Open Data Cube that we use, which is Digital Earth Australia.
13:02
So I'm just going to talk about our approach to how we separate out the classes. We use biophysical and remote sensing principles. We use high-dimensional statistics, which is really exploiting that dense time series of satellite data that we have in the Cube. We leverage existing products. Now, this is important, so we're not starting from scratch
13:21
with something like a neural network. But we also do use supervised learning, particularly to pick out the importance of the different features. So you can get a whole bunch of indices and give it your training samples and be like, if I want to discriminate a farm, which indices are most important for being able to classify it?
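The idea of ranking which indices matter most for a class can be sketched as follows. The talk mentions a decision tree from scikit-learn for this; to stay dependency-free, the sketch below swaps in a simpler stand-in: it scores each feature by the best Gini impurity reduction a single threshold split can achieve, which is the criterion a decision tree would apply at its root. The feature names, sample values and function names are all hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_gain(values, labels):
    """Largest impurity reduction from any single threshold on one feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    # Try a threshold at every distinct value except the largest.
    for threshold in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_features(samples, labels):
    """Return (feature, gain) pairs, most discriminating feature first."""
    gains = {
        name: best_split_gain([s[name] for s in samples], labels)
        for name in samples[0]
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Made-up training samples: two farm pixels and two non-farm pixels.
samples = [
    {"ndvi_mad": 0.30, "ndvi_median": 0.50},
    {"ndvi_mad": 0.28, "ndvi_median": 0.40},
    {"ndvi_mad": 0.05, "ndvi_median": 0.45},
    {"ndvi_mad": 0.04, "ndvi_median": 0.55},
]
labels = ["farm", "farm", "other", "other"]
ranking = rank_features(samples, labels)
```

In this toy data the variability feature separates the farms perfectly while the median does not, so it ranks first. A full decision tree or random forest would aggregate such gains over many splits.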
13:41
And you can use something like a decision tree in Python, scikit-learn, to help rank the importance of those features. So I'm just going to step through what an example might look like using level three. So we start with vegetated and non-vegetated, and that's using the fractional cover product,
14:00
which talks about the greenness. It does a spectral unmixing for each pixel and talks about the proportion of it that is green or bare or non-photosynthetic. And that's our primary input to determining whether something's vegetated or not. And for that, the definition, please don't read this code. The definition is, this is from the UN handbook,
14:24
the class applies to areas that have a vegetative cover of this and this, consisting of et cetera, blah, blah, blah. Those definitions are pretty important because even though you might disagree with them, they're generally internationally recognised. If you want a really specific map of your exact area,
14:41
something like a neural network using really high-res data might be the solution. If you want a continentally consistent and robust map for decision-makers, which is able to be applied over different parts of a large geographical area, then you need to start considering, one,
15:01
the compute cost of running it, processing such a large area, and, particularly if you're going to use machine learning, how you're going to get training data out of such a large area. And so that means if you have a chance to do a classification using a simple rule or a simple tool, provided you check the assumptions of it, it's usually best to keep it simple, stupid. Like it's much better to go with the simpler option
15:21
instead of throwing the complicated machine learning algorithm at it. When you go down to level two, we get to use all the existing products. This was a very easy part of my job. We get to use the water observations from space. That's Lake George in the animation on the left, and Lake Burley Griffin, which is in Canberra,
15:41
my beautiful hometown and cultural capital of Australia. We use the intertidal extents model, and we use a mangrove layer. And you get to add these up to come up with our classification. When you move on to level three, this is when it starts getting hard. And I still struggle with defining these classes
16:02
and getting them to all fit together. This is the median absolute deviation, which is showing the variability over a whole year. You can run it over whichever time period you want. And so you're summarizing a whole bunch of images into a single image and looking at how much it varies over time.
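The median absolute deviation summary described above can be sketched for a single pixel. This is a plain univariate version using only the standard library, as an illustration: the product used in the talk may be computed differently (for example across several bands at once), and the greenness values below are made up.

```python
from statistics import median


def median_absolute_deviation(series):
    """Median of the absolute deviations from the series median."""
    centre = median(series)
    return median([abs(value - centre) for value in series])


# Made-up greenness values over one year for two pixels.
crop_pixel = [0.2, 0.8, 0.3, 0.7, 0.2, 0.9]          # goes up and down
forest_pixel = [0.70, 0.72, 0.71, 0.69, 0.70, 0.71]  # stays steady

crop_mad = median_absolute_deviation(crop_pixel)
forest_mad = median_absolute_deviation(forest_pixel)
```

Applied to every pixel, this collapses a year of images into one variability image; the crop-like pixel here scores far higher than the steady one.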
16:20
And this is really useful for phenology, for trying to identify cultivated areas. And so I was drawing a little diagram the other day, and I was like, oh, how do we identify a cultivated area? And we typically use something like, it needs to be variable. And this isn't for every case, because orchards won't have that signal. But for some cases, it needs to be variable
16:43
in terms of it's going up and down, and it needs to pass a particular threshold of greenness. The shapes of the fields are also actually much more regular, so square or less complex than others. And we kind of iteratively wrote these rules down to try and discriminate these areas. When you get to urban areas,
17:01
we use something like the normalized difference built-up index or existing indices. And that's the same for areas like Perth over here on the right, which combine these peri-urban areas, and these boundary areas are really where you run into problems. And we use a static layer for the geospatial fabric. Now I'll show you some of the classified examples.
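The two rules described above can be written down as a rough sketch. The cultivated rule combines the variability test with a greenness threshold; both threshold values and the function names are hypothetical, and the field-shape test from the talk is left out. The built-up index shown is the standard normalized difference form, (SWIR - NIR) / (SWIR + NIR), which is assumed to be the index meant here.

```python
from statistics import median


def looks_cultivated(greenness, mad_threshold=0.1, green_threshold=0.6):
    """Rule of thumb: variable over the year AND reaching high greenness.

    Both thresholds are illustrative assumptions, not project values.
    As noted in the talk, orchards would fail the variability test.
    """
    centre = median(greenness)
    mad = median([abs(value - centre) for value in greenness])
    return mad >= mad_threshold and max(greenness) >= green_threshold


def ndbi(swir, nir):
    """Normalized difference built-up index for one pixel."""
    total = swir + nir
    if total == 0:
        return 0.0  # avoid dividing by zero on empty pixels
    return (swir - nir) / total


crop_like = looks_cultivated([0.2, 0.8, 0.3, 0.7, 0.2, 0.9])
forest_like = looks_cultivated([0.70, 0.72, 0.71, 0.69, 0.70, 0.71])
built_up_score = ndbi(swir=0.30, nir=0.10)  # made-up reflectances
```

Higher index values suggest built-up surfaces, but bare soil can score high too, which is one reason boundary areas are difficult.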
17:23
Here's an example of an agricultural area, and when you run it through our classifier, the bright green is showing the agriculturally managed fields, the blue the water, and the dark green the natural vegetation. Here we've got a coastal area, and you can see combinations of lots of natural vegetation
17:41
along with bare soil. And here we've got Perth. And you can see the urban area in pink, the natural vegetation, and the cultivated area. This is just a level three. We have a lot of problems with complex environments, and squeezing that complexity into a bunch of classes
18:02
that are meaningful for people is still an incredible challenge for us. If you want to talk to me afterwards about any problems you've run into or any solutions, please feel free or ask me a question, because particularly in the desert, getting saturation from salt lakes was something I never anticipated,
18:23
but that is a very hard thing to classify. To get an algorithm that works in a city and works in the desert and works in a coastal area, getting that robustness is really challenging. So I mentioned the Sustainable Development Goals before,
18:41
and if you have an indicator such as the proportion of land that's degraded over total land area, you can measure that directly by comparing two images that we've classified using this level three and seeing whether there has been agricultural expansion, wetland drainage, or vegetation establishment, and using that, you can understand whether we're getting
19:03
closer to meeting our goal. I'm not going to talk about that. The next steps are primarily adding more detail via level four classification, which is adding that height, adding those crop types, which are the detail that some people want, but maybe not everyone needs.
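The comparison of two classified images mentioned above for the land degradation indicator can be sketched like this. It is an illustration only: which class transitions count as degradation or improvement is a policy decision, so both lookup tables, the class labels and the function name are hypothetical.

```python
# Hypothetical transition tables (old class, new class) -> process name.
DEGRADATION = {
    ("natural terrestrial vegetation", "cultivated terrestrial vegetation"):
        "agricultural expansion",
    ("natural aquatic vegetation", "cultivated terrestrial vegetation"):
        "wetland drainage",
}
IMPROVEMENT = {
    ("bare surface", "natural terrestrial vegetation"):
        "vegetation establishment",
}


def transition_fractions(before, after):
    """Return (degraded, improved) as fractions of all compared pixels."""
    total = degraded = improved = 0
    for row_before, row_after in zip(before, after):
        for old, new in zip(row_before, row_after):
            total += 1
            if (old, new) in DEGRADATION:
                degraded += 1
            elif (old, new) in IMPROVEMENT:
                improved += 1
    if total == 0:
        return 0.0, 0.0
    return degraded / total, improved / total


# Two tiny made-up level three maps of the same 2 x 2 area.
before = [
    ["natural terrestrial vegetation", "natural aquatic vegetation"],
    ["bare surface", "natural water"],
]
after = [
    ["cultivated terrestrial vegetation", "natural aquatic vegetation"],
    ["natural terrestrial vegetation", "natural water"],
]
fractions = transition_fractions(before, after)
```

Here one pixel in four moved away from the goal and one in four moved towards it; the same comparison over consistently classified maps from different years gives the trend.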
19:21
Incorporating different sensors, particularly Sentinel-1 radar. I love using radar data because I don't have to think about clouds and we all hate clouds. And those are the two primary next steps that I'm going to be working on with the team. My take home is that when you want to do the storytelling, you need something that's going to be robust and consistent.
19:44
It is, in my opinion, having done both types, much easier to get your training data and run a neural network for a small area and get a map that works exactly then and there, and that's awesome. If you want something that goes back through the whole Landsat archive for 30 years, that has consistent mapping from time to time
20:02
while the environment is changing through things like human practices and climate change, you need something that's really robust, reliable and standardised and has classes that are understandable. Now, understandableness is what underpins the whole land cover classification system. Yeah, that's it from me. Thank you.
20:26
Do we have any questions for Sean from the audience? Right, I'll kick things off. With the way you designed the machine learning algorithm, you need all of that training data, so is that able to be labelled through an algorithm,
20:43
or does someone have to label all those pixels? How does that work? There's two approaches. Someone in the team does do an automatic training data classification, but we use a combination of field data because as remote sensors, something that I need for me
21:03
is, and I think this is really important for all of us, that it's very easy to look at a place through a map, but that's not actually what it looks like on the ground. I really stress the importance of going out to the sites you're investigating and learning about them because you will learn things that you cannot find out
21:21
from a satellite or from data or from the internet, and so we use a lot of field data from the states, South Australia and Victoria, and we also do a lot of visual interpretation using that high-res data for a single date and drawing out polygons and being like, this looks like this, this looks like this, and then we extract the training data from the relevant lower resolution but free data,
21:43
so we're kind of bringing the paid data to you for free. Great, we have one more question at the back here. Hi, can you talk about the thing that you said you weren't going to talk about? There's a really interesting looking slide a few slides back. This one? Yeah, and it was about automatically working out
22:03
whether the sustainable development goals were being achieved by the change in classification. This thing, yeah, yeah, yeah. I'll just add a little bit more detail to it, so it's obviously a lot of text, but the orange squares are supposed to show degradation, so movement away from achieving the goal,
22:21
and the green are supposed to show kind of growth or moving towards meeting that goal, and whilst the individual classification images are important, it's really the comparisons and exploiting the rich time series and doing this change, interfacing between,
22:40
that lets us know how we're growing towards meeting these sustainable development goals or not. I've also worked on another one, which was the mountain green cover index, which was, that was the technical implementation, but the goal that we were measuring was, I think, a subset of this one, which was looking at whether alpine areas
23:01
are improving in their ecological health over time, their green cover, or whether they're being cleared for snow resorts or something else, agriculture. So to find the particular products, you're going to need to start with the goals and not maybe start with the data, come up with the class from the data, and then squeeze it into the ...
23:21
I mean, there's a bunch of different ways you can do it, but I prefer starting with the semantics, yeah. Fantastic. Let's thank Sean again and all our speakers.