
Global-scale land cover mapping using FOSS4G (part 2)


Formal Metadata

Title
Global-scale land cover mapping using FOSS4G (part 2)
Number of Parts
27
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Year
2020
Production Place
Wicc, Wageningen International Congress Centre B.V.

Content Metadata

Abstract
Land cover is of high interest and a key variable that helps monitor multiple SDGs. The Copernicus Global Land Services – Land Cover team is producing global land cover maps at 100 m resolution from 2015 onwards, which allows us to monitor land cover and its change over time. In this presentation, more details will be given about how the processing chain works, and what free and open-source software was used in the making. In the practical session, we will have a chance to produce our very own land cover maps from the fraction layers.
Transcript: English (auto-generated)
Well, welcome back everyone, and welcome to those who were not here in the first part. For those who are just joining us, I will still go through a few of the same slides that I showed in the first part, but it won't take too long. So, I'm Dainius Masiliūnas from the Laboratory of Geo-Information Science and Remote Sensing at Wageningen University and Research. Here you can also see my supervisors for my PhD: Nandin-Erdene Tsendbazar, Martin Herold and Jan Verbesselt. The other people listed here are from the Copernicus Global Land Operations Land Cover project. In this presentation I will be focusing on the final parts of the content here, specifically land cover change detection and how it was implemented using BFAST, and I will also mention a bit about the openEO project.
So, just to give a bit of a summary: we are talking about land cover here. Land cover maps show what we can generally see on the ground - trees, grass, water, urban areas, or other things like mosses and lichen that you might be able to discern from satellite imagery. Knowing what is on the ground is important, because then we can quantify it and track how it changes over time. Land cover is a key variable for several of the United Nations Sustainable Development Goals, and it is very important for governments to set policy, because they can tell how land cover was in the past and how it might be in the future. It allows us to monitor the current situation and predict future change, and it can also be useful both for landowners on the ground and for modellers who want to predict, say, changes in climate or population development. Thanks to global satellite imagery time series we can do this globally as well, and regular updating is essential for effective monitoring. This updating part is what I will be talking about in this session.
So, one of the things I did for that was in the context of the Copernicus Global Land Service Land Cover project. Just as a reminder, the project is aimed at producing global land cover maps at 100 m resolution, specifically from 2015 onwards, updated yearly as an operational service. So far we have Collection 2 released, and in a few days you will also be able to download Collection 3; Collection 3 specifically comes with yearly updates. You can follow the release schedule on Twitter - the main channel is VITO Remote Sensing, and also the Copernicus channels. It is very exciting that Collection 3 is coming out soon, because it will have both the updates for all pixels in the world from 2015 onwards and quality indicators for the pixels, both for the input data and for the algorithms themselves, including some spatial accuracy maps. What I did specifically for this is something that is not yet in this figure,
and that is the updating of land cover maps. I can mention that when I was working on this, I was working on the Terrascope system, which is a cluster provided by VITO. On the cluster you get resources like a regular laptop: four cores, eight gigabytes of memory, one terabyte of disk space, and it runs CentOS 7.4. The nice part about it is that it has direct access to the data; specifically, I was using the Level 3 top-of-canopy composites, which include the radiometry of the four bands of the PROBA-V satellite. It also includes access to the ARD (analysis-ready data), which in our project was developed to mask out the clouds better, to remove temporal outliers and to do data infilling, so that the data is analysis ready and you can do your time series analysis effectively. It also has access to an Apache Spark cluster, so there is a cluster of quite a lot of machines and you can generally get over 1000 cores if you write your programs and wrap them with Spark, so they can run on the different workers of the cluster. The data extraction itself you cannot easily parallelise, because it is on network-attached storage: even if you try to have two threads reading from two files, it is still going to be just as slow, because there is effectively one read head and you do not really know where the data is, so it does not really make a difference.
So this is the part that I want to cover in this session. Specifically, what I was doing is land cover change detection to enable the map updating. The obvious way to update land cover maps would be to just classify maps for every year, and you might expect that to give good results. Unfortunately, that does not happen, because if you reuse the same model for the upcoming years, there are a lot of spurious changes. That is because the algorithm thinks: well, this looked like forest - maybe it was 51 percent sure this is forest and 49 percent sure this is grassland - and now it is 51 percent sure that it is grassland and 49 percent sure it is forest. This tipping means the predicted best guess of the land cover class is now grassland instead of forest. If we look at the differences between the maps, we would see a land cover change, but actually there is no change; it is just the model deciding that the most likely class is now a different one. So we can use different things to make it better. We can use the time series and check whether there are any breaks in it, to see whether a really significant change has happened or not; that way we can use change detection methods to constrain the changed pixels to a smaller subset that we are sure has actually changed. Another thing that can be done is to use expert rules to determine which transitions are possible or likely. For example, here we can see an example of an unlikely land cover change: a change from urban to water is something that you would not expect to happen, because that would mean an entire urban area completely flooded with water. That can happen in some cases, but it is not very often the case, and in most cases if a model shows that something has changed from urban to water, it just made a mistake, because urban and water are quite similar in their spectral characteristics, since there is no vegetation on either.
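As an illustration of the expert-rule idea, here is a minimal sketch in R; the class codes and the allowed-transition table are hypothetical, not the project's actual rules.

```r
# Minimal sketch of expert-rule filtering (illustrative only; class codes and
# the allowed-transition table are hypothetical, not the project's rules).
library(raster)

# Two single-band rasters with integer class codes, e.g. 1=forest, 2=grassland,
# 3=urban, 4=water (hypothetical coding).
lc2018 <- raster(matrix(c(1,1,3,3, 1,2,3,4, 2,2,4,4, 2,1,4,4), nrow = 4))
lc2019 <- raster(matrix(c(1,2,3,4, 1,2,3,4, 2,2,4,4, 1,1,4,4), nrow = 4))

# Expert rules: which transitions are considered plausible.
allowed <- matrix(TRUE, nrow = 4, ncol = 4)
allowed[3, 4] <- FALSE  # urban -> water is deemed implausible

# Keep the new class only where the transition is allowed;
# otherwise fall back to the previous year's class.
plausible <- overlay(lc2018, lc2019,
                     fun = function(from, to) allowed[cbind(from, to)])
lc2019_clean <- overlay(lc2018, lc2019, plausible,
                        fun = function(from, to, ok) ifelse(ok == 1, to, from))
```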
Yes? So, for those at home, the question was whether we can set a seed for the models, so that for every year they would keep detecting things the same way as before. Yes, we can: instead of training a new model for the new year, we can reuse the same model and just give it the new data from the new year. That is actually the way we are doing it in the project itself. The problem is that it is not just the randomness of the input data: the input data will have changed slightly, and then it will still go down a different pathway in the random forest and give you a completely different outcome. Especially because random forest by itself uses thresholds, so if you go just past a threshold, you get a different output. So it does not really work that way, usually. Yes, the expert rules are useful as well, and that is actually something we are using in the project. There is something called a hidden Markov model, which is a post-processing approach: if we have land cover classes over a longer period of time, we can give likelihoods to specific transitions from one land cover to another, and if the land cover sequence does not conform to those transitions, it gets forcibly reset to a configuration that makes more sense. For example, if we have something that is grassland, then immediately trees, then grassland and then trees again, that is very unlikely to really happen, because for trees to regrow you need several years. So the model would make it so that it is only trees throughout the entire time. That is one thing that the project did,
and I specifically looked into time series break detection, which is what I will cover in this session. My task was to see which break detection algorithm and vegetation index work best for global land cover change detection, and for doing that I was using BFAST. BFAST is a set of algorithms for detecting change in time series, specifically for detecting breaks. It is also the name of one particular algorithm in the BFAST family, and the way it works is that it takes a time series - we can see the time series at the top - and first splits it into components. So it has a seasonal component (you can see that the seasonality is captured here), a trend component (only the trend is captured), and then the remainder, which is what is left if we take both seasonality and trend out. Based on this, it detects breaks in both the seasonality and the trend, and it tries to segment the time series accordingly. The way it works is piecewise linear regression: it tries to see where to place the different segments so as to minimise a particular statistic, in this case determined by the BIC. If each segment has its own parameters - its own seasonality and trend - and two adjacent segments have different coefficients, then the boundary between those two segments is going to be a break. For example, in this case we see several breaks: one here where clearly the trend has changed, one here where the trend even reversed, another one here, another one there and another one there. This way we can detect all the breaks in the time series, with one caveat: we cannot detect breaks at the very end or the very beginning of the time series, because we have a parameter h which determines the minimum segment size, so it does not try to detect segments that are smaller than this number of samples.
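To make the classic BFAST workflow concrete, here is a minimal sketch on a simulated NDVI-like series; the data and parameter choices are illustrative only, not the project's setup.

```r
# Minimal sketch of running the classic BFAST decomposition on a regular time
# series (23 observations per year, MODIS 16-day style). Simulated data.
library(bfast)

set.seed(42)
freq <- 23
t    <- seq_len(10 * freq)
# Seasonal signal + a trend break halfway through + noise
ndvi <- 0.5 + 0.3 * sin(2 * pi * t / freq) +
        ifelse(t > length(t) / 2, -0.002 * (t - length(t) / 2), 0) +
        rnorm(length(t), sd = 0.03)
ndvi_ts <- ts(ndvi, start = c(2009, 1), frequency = freq)

# h = minimum segment size as a fraction of the series; roughly 1 year here.
fit <- bfast(ndvi_ts, h = freq / length(ndvi_ts), season = "harmonic", max.iter = 1)
plot(fit)       # decomposition into trend, seasonal and remainder, with breaks
fit$Time        # timing of the biggest detected change in the trend component
```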
I was working specifically on a variant of BFAST called BFAST0n, which is a working title - we might just call it BFAST Lite later on. It is the same concept, but instead of having this subdivision into the different components, it just tries to detect breaks in all the components at once. This makes it much faster. It can also handle missing values: instead of doing the decomposition with STL in R, which is the usual way to do it (and STL does not actually handle missing values), in BFAST0n we just use a data frame, and it simply omits the values that we do not have, and we calculate all the things that we want. The seasonality, for example, is accounted for by a harmonic model, so it just calculates the harmonics using regression - and a regression does not need regular time steps. It computes the harmonics, creates new columns with the harmonic terms, and afterwards this is put into the piecewise regression model, which also does not need regular time steps, so it can just detect the breaks on the segment boundaries. So yes, we can handle missing values, and it does everything in one single pass. It can also make use of different types of seasonality adjustment. For example, we can use no seasonality adjustment, so then our modelled time series would be just a straight line going up or down; it can use harmonics, like in this case, so the model captures the seasonality and tries to fit the data based on it; and it can also use seasonal dummies, which is another, data-driven way of looking at seasonality: it just splits each year into a number of segments, takes the average over all the years that we have, and this is considered to be the seasonal dummy. So every time we are at that particular part of the season, the model will just take a jump up or down depending on whether that intercept is higher or lower. That way it can model things that are more complex than what a harmonic model can model, because it does not depend on any pre-existing shape. Because of all this, it is also an order of magnitude faster than BFAST itself, and that is in addition to the extra speed improvements that were done by Marius Appel, who worked on optimising the bfast package and moving the critical code paths to C++; so this is much, much faster than what you would get with the default BFAST. This is the general idea of how we can visualise it: you have the actual data, you have a model that is fit to the data, and then after a break you have a new model for the data. If those two models are significantly different according to the BIC, it determines that there should be a break, so we get new coefficients in these two different segments, and the difference between the two is the break magnitude. If the coefficients change a lot, that probably means the magnitude is also high, because the land cover is quite different between the previous segment and the next. Yes?
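A rough sketch of that idea, using bfastpp() plus strucchange::breakpoints(), which is essentially what the BFAST0n wrapper does; it reuses the simulated ndvi_ts from the previous sketch.

```r
# Sketch of what BFAST0n / BFAST Lite does under the hood: turn the series into
# a regression data frame with bfastpp(), then find breaks in all components at
# once with strucchange::breakpoints(). Reuses `ndvi_ts` from the sketch above.
library(bfast)
library(strucchange)

pp <- bfastpp(ndvi_ts, order = 3)   # columns: time, response, trend, season, harmon
head(pp)

# h is the minimum segment size in observations; one year of 16-day data here.
bp <- breakpoints(response ~ trend + harmon, data = pp, h = 23)

# Optimal break(s) under the default information criterion, in decimal years:
pp$time[breakpoints(bp)$breakpoints]

# Segment-wise coefficients; large differences between adjacent segments
# correspond to a large break magnitude.
coef(bp)
```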
Yes, so the question was whether it can detect gradual changes, and in a way it can. The way it would detect gradual changes is that if you have, say, regrowth going up, it will detect a break before the regrowth starts and another one after the regrowth has finished, so you get one trend like this, then a trend like that, and then a trend like this again. So BFAST0n was one algorithm that I used, and as I mentioned it has the limitation that it cannot detect breaks at the end of the time series. But that is something we also needed in the project, because we want to classify all the land cover up to the current day, so until the end of 2019. BFAST Monitor is made specifically to detect changes at the end of a time series: you manually specify the history period, from which the model learns what the normal state is - what the usual seasonality of the pixel is - and the monitoring period, in which it checks whether the data still conforms to the same seasonality and the other parameters learned from the history period. So this is used specifically to create the NRT map, which is the map for 2019, since a whole year of data is not yet available - we are not at the end of 2020 - and that means we cannot find breaks using BFAST0n for 2019, so we need to use BFAST Monitor.
Yes. So, as I mentioned, I used Terrascope to run this. It is a Spark cluster, and at the time I was running it, it gave me around 1200 cores, which is quite nice to have. I split each of the tiles into around 2000 chunks, and gdalbuildvrt was very useful for that, because I did not need to physically split the data. The nice thing about it is that it does not just mosaic; it can also give you smaller chunks of the data, so I could create these 2000 chunks that did not really contain any data, only pointers to the original data, which saves both processing time and disk space. BFAST itself is implemented in R, so I used SparkR to send the R scripts to the driver first, and then the driver sends them to the executors and everything runs. I also get logging, both from Spark - Spark has an internal log, and any errors or messages get logged into its buffer - and into a flat file, so that if the Spark log fails, for example because there is some problem with R itself, I would still have some information about at which step it failed. After some back and forth, fixing some bugs and peculiarities that Spark had, I managed to get it running quite well in an automated fashion. So we can handle each tile of PROBA-V imagery, detect the breaks in the time series, and output layers where for each year I have an indication of whether a break was detected during that year. If no break was detected, it outputs a flag value of minus 99, meaning no break; if a break was detected, it gives you the particular time of the year during which it was detected; and if we have no data - for example too many clouds, so we cannot really tell - it gives an NA. At the end, all these small chunks get mosaicked back into a tile, once again locally using gdalbuildvrt, and we get a global mosaic out of that. So it is running in R; I was running it manually just through SparkR.
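The general pattern of distributing per-chunk work with SparkR looks roughly like this; the chunk names and the ProcessChunk() function are hypothetical placeholders, not the production code.

```r
# Sketch of distributing per-chunk processing with SparkR's spark.lapply().
# Chunk paths and ProcessChunk() are hypothetical placeholders.
library(SparkR)
sparkR.session(appName = "bfast-breaks")

chunks <- sprintf("chunk_%04d.vrt", 1:2000)  # hypothetical VRT chunk names

ProcessChunk <- function(path) {
  # Each worker loads its own libraries and processes one chunk independently.
  library(bfast)
  library(raster)
  vi <- brick(path)
  # ... run break detection per pixel and write the result out ...
  paste0(path, ": done")
}

results <- spark.lapply(as.list(chunks), ProcessChunk)
sparkR.session.stop()
```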
But in production we integrated the R script into Python using rpy2. That is because in their processing chain, the data cubes that are the inputs for BFAST are made in Python - they have all the metadata there and so on - so it is more efficient to directly map that data cube, move it to R, get R to output the layer, move back to Python, and have Python write it out, rather than going through the intermediary step of writing a file somewhere so that R can read it again, and so on. Yes? The question was about which data we used - whether it was the eight-day composites or something else.
So, the data that we were using for BFAST was MODIS data; I believe the compositing was 10 days, and it is actually a time series that starts in 2009. That is because the cluster already had that data - there were other users for the same data - so it was fairly simple to obtain, and it was already pre-composited, with the clouds cleared out and so on; it was analysis ready, so it was easy to run. The reason we are using MODIS is that it has this archive going back to at least 2009, while PROBA-V itself only starts in 2014, so you can see that we would lose half of the time series data if we started with that.
Yes, we applied BFAST to vegetation indices, and there were several vegetation indices that we tested; we checked which one performed best. To know how well they performed, we needed something to benchmark against. I mentioned that we have the dataset that was created by IIASA for training the random forest model in the first place; they also collected reference data for change. Specifically, the world was stratified, using different land cover maps that are not ours, into areas which might potentially have some change, and from there the experts determined whether there was a real change or no change. In most of the cases we in fact saw, even after stratification, that there was no real change. At the very beginning it was even quite odd how little change there was: we had around 1000 points that we collected, and the amount of change was 60 points out of that, and we had to do some extra cleaning, especially to remove all the instances of fire. Fire is a complicated thing, in that it does not really change the land cover - it was grassland, it was on fire, and then it was grassland still - but it has a very big effect on the time series itself. This dataset was extended later on, so we got more change points that we can use to benchmark against, and in the end I used it to optimise the parameters of BFAST and BFAST Monitor, as well as to select the vegetation index that would be best for this.
Yes, I will explain that - I actually have it here, so this is a good opportunity. So, the parameters: for one thing, the main parameter that we have in BFAST is the h parameter I mentioned, which is the minimum segment size. That is one parameter that we do not optimise, because we have it preset to one year: we cannot have anything shorter than one year, because then we would have multiple breaks during a year, which is not what we care about, and we also cannot have it longer, because then we would need to wait even longer after 2019 to be able to detect breaks. So we have no other choice but to preset it to one year. But we have other things. We have this formula here: in this case the formula says that the response is our time series, and the right-hand side is what it is modelled by. If we say only trend, for example, it will fit a straight line and will not model any of the harmonics or seasonality; if we say trend plus harmon, it will also include the harmonic model, and we have different orders of the harmonic model; and we also have the ability to use seasonal dummies instead of the harmonic model. Then there is an addition, for BFAST0n specifically: since it is doing piecewise linear regression, and this determines where the segment borders should optimally be, it uses an information criterion. As I mentioned, the default is the BIC. If you recall, the BIC comes with a penalty of log(n), where n is the number of observations that we have; the AIC penalty is less, so it is much weaker than the BIC, which means that with the BIC fewer breaks are detected, because it tries not to create too many segments with different coefficients. But even the BIC was too lenient: we saw that there were still too many breaks detected in the time series, which means we needed something even stronger. So what I did is implement the LWZ criterion, which is stronger than the BIC for n that is high enough, and it definitely does better in the sense that it only detects breaks when it is clear that there is a break. You can see an example here: the black line shows the breaks that were detected using the BIC, and the other line shows breaks with LWZ. In some cases they just agree completely, but in cases where it is not clear whether a segment is actually a real change, the BIC will still say that it is and create a segment like this, whereas LWZ will rather just look at the general trends: this is a change that is going downwards, and this is a part where it is not going downwards. And this is the comparison of the residual sum of squares, the BIC and LWZ: the RSS usually just keeps going down the more segments we have, and the BIC eventually stops - in this case it stopped at four - but LWZ is stronger, so it stopped already at three, and it is not even that much different from stopping at one, which would mean only one break somewhere there. So this is also one thing that we needed to optimise, to see which one is best; it is not always clear-cut whether one is better or not, because it also depends on what our objective is.
Here you can also see how we can decompose the time series into a trend component and the seasonal dummies, and also into the harmonic components; I will explain that in a bit more detail later. Yes, and I can also mention here that for the vegetation indices we were looking at several of them - NDVI, EVI, NIRv and NDMI - and I found out that there is not really that big a difference between them as far as BFAST is concerned. We ended up using NIRv, partially because it is a bit more sensitive, so it does not have the saturation issue that NDVI has, and another thing is that it does not use the SWIR band; the SWIR band is of even lower resolution in MODIS data, and we are already at quite a low resolution, because we are using 250 metre pixels while our mapping unit itself is at 100 metres, so we are already going to miss some of the changes that we have.
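For reference, NIRv is simply NDVI multiplied by the NIR reflectance, so it can be computed from the same red and NIR bands; a minimal sketch with dummy data:

```r
# Sketch: computing NDVI and NIRv from red and NIR reflectance rasters.
# `red` and `nir` are dummy placeholders for the corresponding reflectance layers.
library(raster)
red <- raster(matrix(runif(100, 0.02, 0.10), 10))  # dummy red reflectance
nir <- raster(matrix(runif(100, 0.20, 0.50), 10))  # dummy NIR reflectance

ndvi <- (nir - red) / (nir + red)
nirv <- ndvi * nir   # NIRv: NDVI multiplied by NIR reflectance
```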
Yes, and just to mention that what we are looking at here is a time series of a vegetation index; we are not looking at land cover, so we cannot really determine what changed into what just from BFAST - we only know that there was some sort of break in the time series. So in any case we need to pair the output of BFAST with some classifier that is able to identify what this is and what that is. Ideally the classifier would also make use of this information, and then do the classification only over this area that is stable, and separately over this other area where it is stable again. Yes, and in the future we will need to scale down even more, because we are moving to 20 metre resolution pixels from Sentinel-2, which means we are looking at scaling from MODIS 250 metre pixels to Landsat 30 metre pixels for the time series. That is a lot more data, and BFAST is already a very data-intensive and time-intensive algorithm, because it needs to exhaustively try all kinds of different combinations of where to put the breakpoints to get the optimal one. So we have a lot of big-data computation issues; even after these two optimisation rounds it is still not running that fast.
BFAST itself was originally written in R, and then Marius Appel optimised it and replaced some of the core components with ones running in C++. We also have another version of it running in Python, which has an implementation that can use GPUs as a back end, and that is generally much faster again. Yes, in part; definitely, if we move to GPUs it is much faster, because these are all very independent computations - every pixel is a completely separate problem - so it is much faster if we do it like that; it is naturally parallel data. Well, Spark itself is not very difficult to use. I am not really using any of its advanced functionality; the way that Spark works with R is that it gives you a method like lapply, and every iteration of the lapply just gets sent to a different worker. So it is pretty simple: as long as you write a function that works with an lapply and returns something, you can just run it through Spark and it gets done automatically. You also need to say how much memory you need, because if you exceed it, the job gets killed, and things like that. In general the cluster is nice - we get quite a lot of cores - but even that is not really sufficient for the big data analysis, and our cluster also does not have the data: it has neither Landsat nor Sentinel-2 data. So we are going to transition to the DIASes that have Sentinel-2 data; Landsat is still a bit of an issue and we will need to figure out how to get that data as well, so perhaps it is going to be through Google Cloud Platform or something like that, or maybe Google Earth Engine; it could also be Amazon, who are also looking at that. OK, and then lastly, I wanted to mention openEO. This is what I have about it here - it is a very small slide, but if you have any more questions I am happy to answer them afterwards as well.
So, one way that we could solve this issue of big data processing is the other project that I am involved with, which is openEO. It is a framework for handling such large amounts of Earth observation data by making use of cloud platforms that are already at the endpoint where the data is located. We have a lot of disparate cloud platforms: for example, the Terrascope platform itself is built specifically for PROBA-V data, so they provide PROBA-V data and they also provide computing power that can operate directly on the data; and we also have others, like the DIASes, that have the Sentinel data and also provide processing directly next to the data. But they are different platforms, so they need different interfaces; you cannot just run the same script on both and expect it to work, as maybe ours is using Spark and theirs uses something else. openEO helps with that, because there you have a unified API, which is a JSON-based process graph implementation: you write what you want to do in R, Python or JavaScript (or, if you want, you can make some other client), and this process graph gets translated into a JSON object that gets sent to the back end. The back end then knows how to process the data in the way that you want, and it does not matter which back end it is - it will do the same operations, because the commands that you are asking for are the same. So we can, for example, calculate NDVI: you say that you have this sequence of commands, these specific operations in sequence, and then it does not matter which back end you run it on, they will all understand what you want to do. The back end accesses the data locally on its side, runs the script generated from the process graph, and returns the results directly to the user, so you only download what you need in the end. And indeed, this also allows us to run UDFs, user-defined functions, which is a very important part: if we have something like BFAST, which is really complicated and not something you can implement with just band arithmetic, you just package your R script itself, send it to the back end, and say: please run this R script as part of this processing chain. Eventually, when it gets to that point, it will send the data to the user-defined function; the function itself usually runs in a container or something like that, and the output gets put back into openEO and pushed back to the client afterwards. So you get the best of both worlds, because that way the processing of the UDF usually also happens next to your data and not on your own computer. OK, so that is about all on that.
At the very end of the presentation I just wanted to mention some other things that I work on. One thing that I am involved with is the Geoscripting course, where you can learn more about the basics of doing geocomputation and processing spatial data using both R and Python. The entire course is online, so you can follow it, and there are tutorials about raster and vector analysis in both Python and R, and also a bit about Bash and Google Earth Engine. I will be more involved with the course as well, because now I will be lecturing for the Master Geo-Information Science in our group. I am also involved a bit with the bfast package maintenance, because I use it quite a lot, so I find some bugs every now and again, or some things that could be improved, and now we are trying to get a new version of it released, also because Marius Appel's version is there on GitHub but has never been pushed to CRAN. We would like to have that, because it would be nice if people had the ability to also use these improvements in speed, but it also requires C++, which is a much higher requirement than we have currently, so we need to split it off into an additional, optional package that the original package interfaces with. There are a lot of things that still need to be done, but afterwards hopefully we will have something that is user-friendly and easy for people to download. And finally, I am also involved with another project, SENSECO, which has several parts, one of which is directly related to time series analysis: there, people can learn from one another, from the community of those working on time series analysis, and exchange ideas about the best way of doing these sorts of things. It is also about the field of fluorescence, which is a newly emerging field based on hyperspectral measurements that can be done both from the ground and from drones and airplanes; eventually, in 2022, there is also going to be a satellite, FLEX, built specifically for this task, so that we will be able to see fluorescence measurements from space as well. That should allow us to do better accounting of the productivity of photosynthesis over really large areas, which we could not do before. That is something I am also involved with. All right, so that was the general overview of this. If you have any more questions for this part, let me know, and afterwards we will go into the practical part; the practical you can find here at the very bottom, and you can also find this link in the calendar. Yes?
Yes, so the question was whether it would be a good idea to use an autoregressive approach instead of a BFAST-like algorithm. There are certainly a lot of other algorithms that can also do something like that. I tested something super simple, by just taking the mean of each year of the vegetation index and doing a t-test to see whether it is any different between different years, and that is a simple method that is definitely much faster than what BFAST can do, but it is also not as accurate, so it is always a bit of a trade-off. The autoregressive part is a bit complicated, because it has its own parameters, and it is not something you can automatically determine - there are ways to do that, but it is difficult to tell whether it is really valid or not - whereas BFAST was developed specifically for this detection of breaks in time series, so it is fairly easy to understand what it is doing and to evaluate whether those are breaks or not. Yes?
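The simple yearly-mean comparison mentioned above can be sketched like this (simulated data, and not necessarily the exact test I used):

```r
# Sketch of the simple alternative mentioned above: compare the mean vegetation
# index of consecutive years with a t-test. Simulated data for illustration.
set.seed(3)
vi_ts <- ts(0.5 + 0.3 * sin(2 * pi * seq_len(230) / 23) +
            rep(c(0, -0.15), each = 115) +        # one step change halfway
            rnorm(230, sd = 0.05),
            start = c(2009, 1), frequency = 23)

years <- floor(time(vi_ts))
split_by_year <- split(as.numeric(vi_ts), years)

# t-test between each pair of consecutive years; small p-values hint at change.
p_values <- mapply(function(a, b) t.test(a, b)$p.value,
                   head(split_by_year, -1), tail(split_by_year, -1))
round(p_values, 3)
```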
Yes, yes, exactly. So we do the updating of the global land cover map by running BFAST on all of the pixels. These pixels, though, are MODIS pixels, not the PROBA-V pixels; actually, we resample the MODIS data to the PROBA-V 300 metre grid, so it runs at 300 metres, and then based on that it is applied to all the sub-pixels at 100 metres. So at least there is a bit less data to run on, but indeed it is a slow algorithm and there is really a lot of data to handle, so it does take several weeks to run over these large areas. But the way it is implemented on Spark, we have enough cores that it is not a really big issue. It also helps that it detects all the breaks in the time series at once, so we do not need to run it for every year - we get everything in one go. Another thing is that BFAST Monitor, for the near-real-time detection, is actually also an order of magnitude faster than BFAST0n itself.
Yeah, yeah, exactly. So these are effectively the temporal covariates, right? We extract from the time series the seasonality and the trend component and so on, and in fact BFAST can make use of external regressors as well. So if you have additional covariates - maybe climate data - you can also include those in the model, as long as the time steps match. You need to make sure that you do not introduce too many NA values, because if any of the covariates is NA, then all the rest of the variables for that time step are also going to become NA. But generally, yes, you can use additional time series alongside the one we use for the response. Yeah, exactly: that handles the seasonality, and then the ancillary data is used in the modelling of the data that we have as the response. It is implemented specifically in anything that uses bfastpp - you can see it here. BFAST by itself, I do not think it exposes this functionality, but BFAST0n definitely does, and BFAST Monitor also does. So anything that uses bfastpp - which stands for BFAST pre-processor - makes use of this: it takes your inputs and decomposes them into these different components, as well as allowing you to use these regressors and so on.
Three repetitions of what? So, instead of having... you mean three observations? Yeah, three observations at the same location, for the entire time? Yeah, then indeed that would not really work, because you have to have enough observations in the time series for BFAST to be able to segment it into segments. If you do not have that, then it is pretty much impossible, because you would need to put a segment boundary on one or the other, and you also cannot have segment boundaries in areas where you have NAs, because it is not clear where the beginning or the end of the segment would be; only the areas on the left side or the right side of the NA area can be used. In that case, we can go to the practical - it is also shown here.
So, the practical is also online, and it is an R Markdown document. If you would like the actual R Markdown source, it is on GitHub; the URL is the same as the github.io URL, just in a different format - so it is github.com, slash this, slash bfast4au, and it is index.Rmd. You can also try to run it yourself if you like; I will just go through it in the HTML, and if there is something that you would like to have tested, I also have the script here running locally. OK, is everyone there already? I think you can also find the link down there; I should probably also put it in the chat as well.
This tutorial is about how to use BFAST Monitor for detecting breaks in time series, and it is quite hands-on. It is using MODIS data and several packages; these are the packages that you will need, and this function automatically downloads and installs them. One thing that I would also like to do: this is based on BFAST Monitor, but it can also be based on BFAST0n, and BFAST0n is only available in the development version of the bfast package. So I would suggest that you do this first: install the remotes package and then run remotes::install_github("bfast2/bfast"), which will give you the version of bfast that has the bfast0n function.
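In code, that installation step is:

```r
# Install the development version of bfast (which provides bfast0n),
# as described above. Do this before loading the CRAN version of bfast.
install.packages("remotes")
remotes::install_github("bfast2/bfast")
```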
Otherwise, when you try to run it, you will run into an error, and if you install it later over the CRAN version, you will need to restart R, all of your objects will be gone, and you will need to rerun everything again. In this case I was trying to see if it is possible to unload the previous version, but it did not really work. It can take a bit of time to install the packages; if you find that it is a bit too slow, you can omit the leaflet package - that one is only used for a nice visualisation - but the others you need: strucchange, bfast, zoo, raster and MODISTools. So the general idea is that we will use MODIS imagery obtained via the MODISTools package;
it can download data from the MODIS archive for particular areas. These are related to flux towers, for which we have information about things including land cover, as well as the general MODIS surface reflectance and NDVI. So we will download some MODIS data over a particular test site, then apply a cloud mask, then apply BFAST Monitor, first on a particular pixel and later on the entire area, and see what the result looks like; afterwards we will look at BFAST0n doing the same thing and see what the result is in that case.
I can just go through it shortly, from top to bottom. From the very beginning: we have this MODIS subsetting website, and this is where we will be getting the data from. We are working with MOD13Q1, global 16-day imagery at 250 metre resolution. There are several bands, and we can calculate vegetation indices from these bands; in this case, I believe we are using NDVI - yes, we are using NDVI. So, first things first, we need to select a particular site. Since we are in Wageningen, we can select the site that is closest to ours, but it is not necessary - you can also just choose whichever site you like, as long as there is NDVI data available for it. The subsetting website uses this pixel numbering scheme; if you go on it - I just showed it to you - let's get some data and visualisations.
And we want this MOD13Q1. Yes, so these are the visualisations: that is where we have the time series, and this is the spatial representation. When you download a subset, it will download information about the area that is immediately around the flux tower, so this is the grid that it downloads, and there is this pixel numbering scheme because each of these pixels has a particular index. All right, you can also see here the land cover map around the site - this one is not using our land cover map, it is using the MODIS land cover map, which is at a coarser resolution and generally not as accurate, but it is good as a visualisation. So, if you have the packages installed, then you can start downloading the data
using this code. So, who is still installing packages, and who has the packages finished? OK, good, then we can continue. We have this time series function - it is not that important what exactly it does, but generally it just converts the input from the way it is presented by the subsetting website into a time series that is easy to read in R, specifically a ts object. It does this first by making use of zoo, extracting the time from the time column in a way that is accessible, and then it aggregates the data into decimal-year time stamps, so that we have a regular time series rather than just the irregular MODIS observations; then it is converted back into a ts object, and that is what is returned, because the bfast package generally works with ts objects.
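A rough sketch of what such a helper could look like; this is my reconstruction of the idea, not the tutorial's exact function, and the name is made up.

```r
# Rough sketch of such a helper: aggregate irregular MODIS observations onto a
# regular decimal-year grid and return a ts object. The function name and the
# 23-observations-per-year assumption are illustrative, not the tutorial's code.
library(zoo)

GetRegularTS <- function(values, dates, obs_per_year = 23) {
  z <- zoo(values, as.Date(dates))                    # irregular zoo series
  # Decimal year of each observation, snapped to a regular grid:
  dec_year <- as.numeric(format(index(z), "%Y")) +
              (as.numeric(format(index(z), "%j")) - 1) / 365
  dec_year <- round(dec_year * obs_per_year) / obs_per_year
  z_reg <- aggregate(z, dec_year, mean, na.rm = TRUE) # average duplicates
  # Assumes no missing composites; gaps would need explicit NA insertion.
  ts(as.numeric(z_reg), start = min(dec_year), frequency = obs_per_year)
}
```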
We are also trying to implement support for zoo objects for irregular time series, but that will still take some work. And then this is where you download the data: you select which product you want, you select the site ID - if you wanted a different site than this one, you can enter a different one - then which band you want, in this case NDVI (you can also try, for example, EVI), and then from when you want to start; you can also specify the end, but otherwise it will just download up to now, so in this case it is using the entire MODIS archive. Then there is a specific site name derived from the site ID, and some options for whether it should show the progress and whether the download should be internal. Hopefully the download will work for you; sometimes the LP DAAC is down for maintenance, surprisingly often, and then you need to download the file manually. Did you get it to succeed already?
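The download step could look roughly like this; the tutorial selects a fixed site by its ID, but here I sketch the equivalent by coordinates, and the exact arguments may differ slightly from the tutorial's call.

```r
# Sketch of downloading NDVI and pixel-reliability subsets with MODISTools.
# Coordinates (roughly Wageningen) and dates are examples only.
library(MODISTools)

vi <- mt_subset(product   = "MOD13Q1",
                lat       = 51.97, lon = 5.66,
                band      = "250m_16_days_NDVI",
                start     = "2009-01-01",
                end       = format(Sys.Date()),
                km_lr     = 2, km_ab = 2,
                site_name = "testsite",
                internal  = TRUE, progress = TRUE)

qa <- mt_subset(product   = "MOD13Q1",
                lat       = 51.97, lon = 5.66,
                band      = "250m_16_days_pixel_reliability",
                start     = "2009-01-01",
                end       = format(Sys.Date()),
                km_lr     = 2, km_ab = 2,
                site_name = "testsite",
                internal  = TRUE, progress = TRUE)
```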
We also have in the MODISTools package this very handy function, mt_to_raster, which has been added only recently. It converts the downloaded data, which is actually just a big CSV, into an actual raster, and then we can use all the raster commands on it. The first thing that we want to do is mask out the clouds. That is why we have both of these: we have the vegetation index raster and we have the quality raster, and we just mask out everything where the reliability flag is less than zero or more than one, so effectively we keep only zero and one, and everything else is set to NA. So we use the mask function in the raster package to mask it out, and then we plot the image. This is what you should see; if you chose a different site you will see something slightly different, but it will probably look similar in terms of how it is masked out. All right, this actually shows that we only have pixels with reliability one and zero - we do not actually have any other values - so in this case we do not get to filter anything out, but in other cases you might.
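As a sketch, assuming MODISTools' mt_to_raster() helper and the vi and qa data frames from the download sketch above:

```r
# Sketch: convert the downloaded subsets to rasters and mask by pixel reliability.
# Assumes the `vi` and `qa` data frames from the download sketch above.
library(raster)

vi_r <- mt_to_raster(vi)   # one layer per acquisition date
qa_r <- mt_to_raster(qa)

# Keep only reliability 0 (good) and 1 (marginal); everything else becomes NA.
keep <- calc(qa_r, function(x) ifelse(x >= 0 & x <= 1, 1, NA))
vi_masked <- mask(vi_r, keep)
plot(vi_masked[[1]])
```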
Then, if you use the click function - it is quite handy - you can check the values at particular pixels: after you have something plotted, you can call click, click on a particular area, and it will give you the entire time series as well as the x and y coordinates, and the cell number if you set that option to TRUE. And if you want to see a fancy map, you can also run this part with leaflet, and it will show you the raster that you are dealing with; but this is optional,
so it is not absolutely necessary. The important part is here, and that is bfastmonitor. BFAST Monitor itself only knows about time series - it does not know anything about rasters - so we will just use the raster package to apply the same algorithm to every pixel in our raster. First of all, we try it on a single pixel: we select one pixel - pixel 78, but it could be any of the pixels that we have - and we use the time series function to convert the input raster values into this ts object. It needs the pixel time series itself as a vector; by default, if you subset a raster like this, you get a one-column matrix, which is a bit awkward, so you just coerce that into a vector again. Then you also have the dates in the names of your layers, so we just need to parse them; the names start with an X and then year-dot-month-dot-day. That just gets put into the ts object, and if you plot it, you should be able to see it.
Then, after we have the time series, we will try to detect the breaks in it. We can directly run bfastmonitor, which is a function whose first argument is the time series, followed by the formula - that is, which output from bfastpp you want to use to predict what should happen in the future. Here we are using trend and harmonics with order three, and an important thing is to specify the starting time of the monitoring period; in this case we set the start of monitoring to the beginning of 2019.
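Put together, the single-pixel step looks roughly like this; it is a sketch that uses bfast's own bfastts() helper rather than the tutorial's custom time series function, and it assumes the vi_masked brick from the earlier sketch.

```r
# Sketch of the single-pixel bfastmonitor step, assuming the masked brick
# `vi_masked` from the earlier sketch.
library(bfast)

pixel  <- as.vector(vi_masked[78])                  # drop the 1-column matrix
# Layer names like "X2009.01.01"; adjust the format string if yours differ.
dates  <- as.Date(names(vi_masked), format = "X%Y.%m.%d")
pix_ts <- bfastts(pixel, dates, type = "irregular") # daily ts with NAs

bfm <- bfastmonitor(pix_ts, start = c(2019, 1),
                    formula = response ~ trend + harmon, order = 3)
plot(bfm)
bfm$breakpoint   # timing of the detected break (NA if none)
bfm$magnitude    # median difference between observed and predicted data
```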
Yes, so the harmonics are what you see here, the harmon cosine and harmon sine columns. The way it works is that we are trying to fit sine and cosine waves on top of our seasonality, and these can have different frequencies; that is the harmonic order. If we have order one, then we only have cos(1) and sin(1) - this one and that one - and they can capture seasonality that happens once over the entire year: there is one peak during the year and everything else is lower. If we have a harmonic order of two, we can capture two peaks per year - perhaps there is a double season, maybe there are some crops growing somewhere - so we can capture that seasonality as well. The third order can capture even more: even local changes during different parts of the year, and so on. The more orders you have, the better we can model the data, but that means we risk overfitting, and if we are overfitting, we will get a lot more breaks than we should, because any deviation from this overfitted "normal" is considered a break. Then there are also the seasonal dummies. Here they are shown as numbers, but they are actually not numbers, they are categorical variables, and the seasonal dummies tell you which season it is: if you have four seasonal dummies, for example, you can interpret them as whether it is summer, spring, autumn or winter, so for each part of your time series you just have a label indicating the season.
So, if you want to use the seasonal dummies, then instead of using harmon in the formula you would use season. There was actually a bug in the mainline version of bfast - the bug was in Marius Appel's code, because he optimised it a bit and the seasonal dummies were always interpreted as numbers and not as factors, which messed things up a bit. I have fixed this issue now, so with the latest development version you can use the seasonal dummies, and - which is a new thing that I implemented in the development version of bfast - you can also set how many dummies you want; it is a parameter called sbins. For example, if you set sbins to four, you will have the four seasons; if you set it to more, it can model even smaller changes within the seasons. Yes, so when we run this, we should get something like this, and you might be wondering how to interpret it. We have the black part, which is our original data;
then, well, first we actually have this dashed black line, which is the start of the monitoring period. That is what we defined as the point from which we want to start monitoring, so that we would know whether there has been any break after it - that is just something we defined in the function call itself. Then, the way the algorithm works is that it tries to detect what the stable history is: from this line it goes backwards and tries to detect breaks behind it, and here it determined that the stable history goes back to sometime in 2012 or so. It takes this stable history as the basis of what it expects to also see in the future, and it simply ignores this whole earlier part altogether, so you only look at this part. Based on that part, it does the bfastpp step: it determines the seasonality, the trend, the harmonics and so on, and this is plotted as the modelled data in blue. After it has this model, fitted up to the start of the monitoring period, it extrapolates the model into the future, and then it tries to see whether the real data after this point is sufficiently similar to what it was before. Obviously it also knows how much deviation the points had in the past, so if we have similar deviations in the future, that is fine - it means nothing has really changed - but if the deviations are different, then you can have a break. In this case it did detect a break, because compared to our predictions we have a lot of values that are much lower, and they are much lower than in the history period as well. The detected break here is actually the first instance at which there was enough evidence that there is a break - it uses a cumulative sum of deviations, so after some time it determines whether there was a break or not. That does not mean this is the biggest or most prominent break in the time series; it is the first time that the threshold has been exceeded, so we know that there is a break. The plot also says when the break was detected, but again, the timing is not exactly precise; it just gives you an idea that a break was detected here. Yes.
And then, what if we want to do this for all pixels? Well, we need to make use of some raster functionality. First we get the dates - obviously the dates are the same for all the pixels, so we just get a vector of dates - and then we define a function that we want to run on every pixel. It uses the same thing: we have bfastmonitor in there, with the same parameters; we also need to do the time series step, so that we get the input ts object; and we need to return something that is only numbers, because the bfastmonitor function actually returns a complex object that includes all kinds of interesting things, but that is not something you can easily put into a raster, which can only hold numeric values. In this case we return the breakpoint timing and the breakpoint magnitude. Afterwards we use the calc function and say that on our vegetation index brick we run this function; then, just to make it a bit nicer, we give the layers some nice names and plot the result. You should get something like this.
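A sketch of that per-pixel wrapper, assuming the vi_masked brick and the dates vector from the earlier sketches; the wrapper name BFMPixel is made up.

```r
# Sketch: run bfastmonitor on every pixel of the masked brick and keep only
# numeric outputs. Assumes `vi_masked` and `dates` from the earlier sketches;
# the wrapper name BFMPixel is hypothetical.
library(bfast)

BFMPixel <- function(values) {
  if (all(is.na(values)))                      # skip empty pixels
    return(c(NA, NA))
  pix_ts <- bfastts(values, dates, type = "irregular")
  result <- try(bfastmonitor(pix_ts, start = c(2019, 1),
                             formula = response ~ trend + harmon, order = 3),
                silent = TRUE)
  if (inherits(result, "try-error"))
    return(c(NA, NA))
  c(result$breakpoint, result$magnitude)       # only numbers go into the raster
}

bfm_raster <- calc(vi_masked, BFMPixel)
names(bfm_raster) <- c("break_time", "magnitude")
plot(bfm_raster)
```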
How to interpret this: here we have the timing of the break, which shows at which point in time the threshold for whether there is a break has been exceeded - the earlier it is, the quicker the threshold was exceeded - and the magnitude of change tells you how different it is, so how much deviation there was between the modelled seasonality and trend and the real seasonality and trend that was observed. In some cases you might see bigger or smaller values, and even places that do not have a break detected - you can see those are NA values in the timing layer - still have a magnitude of change, because we still detect that there was some difference between the modelled data and the real data, so it is an indication of how stable that pixel is. And then we can just run it for different pixels: if you use the click function on your original vegetation index raster, you get the time series, so you can get the time series for these particular locations as well and get a bit of a better idea of why the algorithm decided that there is or is not a break. If we want to get all the information, just like we had here, we still need to rerun it for a particular pixel, because otherwise that data is simply not saved into the raster. So that is about it from my side; we are most of the way there. Are there any questions at this point?
(Inaudible question from the audience.)
Yes, so the timing of the break is the red dashed line, and in this case it is pointing not at the peak but at one of the points that are lower. But indeed, as I mentioned, this is not necessarily very precise, because it just tells you the first point at which the threshold for whether there is a break has been exceeded. The peaks themselves added a lot to this cumulative sum, but not enough to call it a break by themselves, and the point afterwards was when it tipped over and decided that there was a break. So, yes, that means that the location of the red line shows the point at which there was already enough evidence that there was a break - there was enough change before the red line to consider this a break. Yes?
So, you mean whether you can get the model parameters for the predicted model, also in the past or, well, in the future, since it is similar? Yes, you can reconstruct this model that is shown in blue by looking at the bfastmonitor object: the object includes the coefficients, and the coefficients relate to the bfastpp columns, so you know what to multiply your inputs by to reconstruct this entire model, and then using the same coefficients you can extrapolate it to any times that you like. I am not sure whether it is exposed directly for BFAST Monitor, but at least for BFAST0n there is simply a function called coefficients, and it just returns the coefficients of each of the parameters that are part of the model. There was also a question in Zoom: if we set the start of monitoring to another time, will the output be different, and if yes, how do we explain it?
yes the output will be different for two reasons if say we set the time to a previous time for monitoring then the stable history will be calculated back from this previous time and it might be different so for
example if we said here then the stable history will no longer be this area but rather the stable history will be this area because it was increasing at the time and then it will try to see whether there have been any breaks since here if we extrapolate this model from here and then it will definitely see that yes there was a break and there was a
break over here already because this was going up and now it's going down um and then you will definitely get a different result it's also a correct result because yeah we can see that there was a break sometimes it might also give spurious breaks and that's also where
That is one of the things I was doing: trying to figure out what would be the best parameterization for generic, global land cover change detection. What I generally found is that simpler parameters are better. In fact, the best parameterization for bfastmonitor that I could find uses only the trend and actually disables the automatic detection of the stable history; as you can see, that detection stops quite early. If we just disable it and say use the entire history, the model will learn from the entire time series, and usually there are not so many breaks that this would hurt the estimation.
No, so what worked best was to take the entire time series and not rely on the automatic detection of when the history is stable: we just take the whole thing and consider the whole thing to be stable. Then, with a trend-only model over the entire time series, a really big decrease or a really big increase can still be detected as a change, while random noise will not be flagged. If you look at this case, for example, there isn't really a big reason to detect a change here, because there were some equally big spikes in the history as well; probably here there would be a break, but that is maybe the only break that you could really claim.
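A minimal sketch of that trend-only, full-history parameterization, assuming an input ts object `ndvi_ts` and an arbitrary monitoring start date:

```r
library(bfast)
bfm <- bfastmonitor(ndvi_ts,
                    start   = c(2016, 1),        # start of the monitoring period (assumed)
                    formula = response ~ trend,  # trend only, no harmonics
                    history = "all")             # use the whole history as stable history
bfm$breakpoint   # time of the detected break, or NA if none
plot(bfm)
```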
Then, for the next part, we can do the same thing using bfast0n, and it is also just as straightforward. As long as you have the development version of the bfast package installed, you will have the bfast0n function, and it works pretty much the same way as bfastmonitor: you give it the time series as a ts object, you give it the formula (the same formula as for bfastpp) and the same order parameter, and in the end it tells you whether it found any breakpoints or not. In this case it did not find any breakpoints; this was actually the previous example, so this one. It says that, according to BIC, there is actually no breakpoint in the entire time series. If we try some other pixels, we might find one that does show some change: if we use pixel 92, it says that there was a breakpoint. You can also see at the top here that in the first case it says "Optimal 1-segment partition", which means that the optimal way to partition the time series is a single segment, indicating no break. In this case it says "Optimal 2-segment partition", meaning that the optimal way to partition the time series is into two segments, and obviously there is a break between those two segments, so there is one break. It also returns all the breaks that it has detected, in this case one break at observation number 327, which is at 0.8 of the way through the time series.
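A sketch of that workflow, assuming the same `ndvi_ts` series; bfast0n() in the development version of bfast is essentially a thin wrapper around bfastpp() plus strucchange::breakpoints(), so the explicit equivalent is shown as well:

```r
library(bfast)
library(strucchange)

# development version of bfast:
bp <- bfast0n(ndvi_ts, formula = response ~ trend + harmon)

# roughly equivalent, using the two building blocks it wraps:
pp <- bfastpp(ndvi_ts, order = 3)                        # trend + harmonic design matrix
bp <- breakpoints(response ~ trend + harmon, data = pp)  # Bai & Perron breakpoints

print(bp)        # e.g. "Optimal 2-segment partition" plus the break observation number
bp$breakpoints   # observation number(s) of the detected break(s), NA if none
```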
At the moment this is not as user-friendly as it could be, because it does not actually tell you the dates at which the breaks were detected, so you need to figure that out yourself, and it depends on the input. We need to do some time manipulation, because if we just take this observation number 327 and say that it is date number 327, that would be wrong: we might have NA values, and those NA values are removed, so the observation number corresponds to the time series with the NA values already removed. So we also need to remove those values from our original dates, which we do with na.omit, and then the observation number corresponds correctly to a date. We then see that the break that was detected was in 2016, and we can plot that manually as well: we plot the original time series and add a line indicating where the change was, and here it shows that there was an increase, and at this point it stopped increasing and levelled off. This is a very basic plot, because for bfast0n we do not yet have a dedicated plotting function, precisely because of this issue that we do not get the dates directly from the object. I will try to implement this plotting sometime in the future, because as long as we can get these dates, we can do the plotting as well; for now it is not yet implemented.
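A sketch of that date bookkeeping, continuing from the `pp` and `bp` objects above (the na.omit() route from the tutorial is shown as an alternative):

```r
# bfastpp() has already dropped the NA observations, so its time column lines up
# with the observation numbers returned by breakpoints()
break_dates <- pp$time[bp$breakpoints]
break_dates                                    # decimal year(s), e.g. somewhere in 2016

# alternatively, drop the NAs from the original dates by hand
dates_no_na <- time(ndvi_ts)[!is.na(ndvi_ts)]
dates_no_na[bp$breakpoints]

# basic manual plot: original series with a vertical line at each detected break
plot(ndvi_ts)
abline(v = break_dates, col = "red", lty = 2)
```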
So bfast0n will detect all the breaks in a time series: if there is more than one, it will detect those as well. You can also try, for example, to create a raster just like we did with bfastmonitor, to see spatially where the breaks are. With bfast0n this is a bit more complicated, because it can detect more than one break, so if you try that and one pixel has, say, two breaks while another has one, writing the raster will fail, because the number of values per pixel, and therefore the number of layers, does not match. In that case you can, for example, just take the first break or the last break of each time series, and then you can get some output. Like I said, in my case, when I did this for the project, I made a yearly layer, and since my minimal segment size h was set to one year, I could not possibly have more than one break per year, so there was no problem with that: for every year I would simply record whether there was a break or not. Here you can just do something simpler if you want to try it out.
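As a rough sketch of that simpler option (taking only the first break per pixel, so every pixel returns exactly one value and there is no layer mismatch), assuming `vi_brick` is a RasterBrick of a vegetation index time series; the start date and frequency below are placeholders:

```r
library(raster)
library(bfast)
library(strucchange)

# Date of the first detected break for one pixel's vector of values, or NA
first_break_date <- function(values) {
  if (all(is.na(values))) return(NA_real_)
  pixel_ts <- ts(values, start = c(2009, 1), frequency = 23)   # assumed time axis
  pp <- bfastpp(pixel_ts, order = 3)
  bp <- breakpoints(response ~ trend + harmon, data = pp)
  if (all(is.na(bp$breakpoints))) return(NA_real_)
  pp$time[bp$breakpoints[1]]
}

vals <- getValues(vi_brick)                    # matrix: rows = pixels, cols = dates
break_vals <- apply(vals, 1, first_break_date) # one value per pixel
break_map <- raster(vi_brick, layer = 1)       # template with the same geometry
values(break_map) <- break_vals
plot(break_map)
```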
Yeah, so that's about it. Now you can just try to run this, try different parameters and different areas, and see what it does and how it works. At the end of this tutorial there is also a bit of information about the seasonality modelling using harmonics, so exactly how to derive those sines and cosines. That is basically what the bfastpp function does for you, so you don't have to do it yourself, which keeps things quite a bit simpler.
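For reference, a quick way to peek at those derived harmonic regressors, assuming the same `ndvi_ts` series:

```r
library(bfast)
pp <- bfastpp(ndvi_ts, order = 3)
str(pp)          # derived columns: time, response, trend, harmon, season, ...
head(pp$harmon)  # the cosine and sine terms for harmonic orders 1 to 3
```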
And then this is an overview of the new things in the development version of bfast. I implemented the LWZ selection; this is actually not in bfast but in strucchange, and the strucchange package is what provides the core of the bfast package. So to be able to use a lot of these features you also have to install the development version of strucchange, which lives at bfast2/strucchange. You can then use LWZ to select the breaks instead of the default BIC, and you can also plot the results of the breakpoints function, which shows you all the different statistics, including LWZ and BIC, so basically this plot at the bottom. You also get an indication of R squared, so how well your model fits your data, you get the break magnitudes calculated in different ways, and finally there is the sbins parameter, which allows you to customize the number of seasonal bins.
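A sketch of getting those development versions, using the bfast2 GitHub repositories mentioned here (installation via the remotes package is an assumption):

```r
install.packages("remotes")
remotes::install_github("bfast2/strucchange")  # LWZ break selection, extra plots
remotes::install_github("bfast2/bfast")        # bfast0n, sbins and other additions
```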
Okay, I have a question from Zoom: whether there is sample code for the new things around bfast0n. It is still under development, so I will definitely try to make some tutorials or the like for the new things that I added. For the bfast0n function itself, for example, there is the help page, and the help page has examples, so there I also have an example of how to run bfast0n, but there isn't really a blog post or anything yet on how to do these things. It is a good idea to write it down, but I don't yet have the time to actually do so. I think it is already an issue on the GitHub as well; bfast2/bfast is the GitHub repository for the development version of bfast, and if you find any issues, feel free to open a new issue and we will look into it, including if something is wrong with the output, or if you get some weird errors when trying to do something. We also have some documentation to add, like getting-started instructions or a tutorial about how to use these new additions, so hopefully we will have that by the time we can update the package on CRAN; for now it is under development. I hope Zoom can still hear the audio.
Yes. So, for example, regarding the criterion: in the new version, the development version of strucchange (Marius Appel is also helping to maintain the strucchange development version), we have the ability to use breaks = and set it to, for example, "LWZ" or "RSS", and then it will optimize for that criterion. If you set RSS, it usually detects an incredibly high number of breaks, almost as many as it can, and if you use LWZ, it detects far fewer breaks. And already in the current version of strucchange you can set breaks = to an arbitrary number, and then it will just tell you how the breaks would be allocated if there were indeed two breaks, or three breaks, and so on, depending on the number that you pass.
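A sketch of those options, reusing the `pp` design matrix from before; the character values for breaks ("LWZ", "RSS") follow this explanation and assume the development version of strucchange, while the integer form also works on the CRAN version:

```r
library(strucchange)
bp_all <- breakpoints(response ~ trend + harmon, data = pp)   # default selection (BIC)

# CRAN strucchange: ask how the breaks would be allocated for a given number of breaks
breakpoints(bp_all, breaks = 2)

# development version (github.com/bfast2/strucchange): select the number of breaks
# by criterion; LWZ is more conservative, RSS picks close to the maximum number
bp_lwz <- breakpoints(response ~ trend + harmon, data = pp, breaks = "LWZ")
bp_rss <- breakpoints(response ~ trend + harmon, data = pp, breaks = "RSS")
plot(bp_all)    # RSS/BIC (and LWZ in the development version) per number of breaks
```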
Yes.
So the question was whether you can use MODISTools to get the information for an arbitrary location. Here we have mt_subset; I was guessing that at least this function doesn't allow you to do that, but oh, it actually does take the latitude and longitude, and there is also the kilometres to sample left/right (km_lr). So I assume you would be able to get it for an arbitrary point: the site ID just overrides the lat/long, so you don't need to specify one if you use the other, but if you want, you can specify just your lat/long and then it should give you the information, yes.
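A sketch of such a call; the product, band, coordinates and dates below are just placeholders for illustration:

```r
library(MODISTools)
vi <- mt_subset(product = "MOD13Q1",
                band    = "250m_16_days_NDVI",
                lat     = 51.97, lon = 5.66,   # arbitrary location, no site_id needed
                start   = "2009-01-01",
                end     = "2019-12-31",
                km_lr   = 0, km_ab = 0)        # kilometres to sample left/right, above/below
head(vi[, c("calendar_date", "value")])
```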
Which one do you mean? Which one do you mean, because most of the links should be there. Do you mean this one? Yeah, so yes, I can, but, well, it is all on the page itself, and I also have some nice vignettes, yes. So, yeah, I can send it in the chat, that's a good point. If you want to find any of the links that I mentioned and you don't immediately see them, just feel free to ask me in the chat and I will post them.
Yeah, so the question was how the model determines what the stable history period is. The history period is determined, well, maybe I can just show it here, that is more meaningful. There are some options for that: here you have the history argument, and it can be "ROC", "BP" or "all". I mentioned history = "all": that means don't even try to detect what the stable history is, and just consider all of the history to be stable. "BP" means to run breakpoint estimation (Bai and Perron) on the history and determine the stable history based on that. And "ROC" means reversed-order cumulative sum: it reverses the time series and does the change detection in the same way as in the monitoring period, and if it detects a break, in the same sense that the cumulative sum exceeds a particular value, then it says that is a break and that is the end of the stable history period, and then it reverses it back again. You can also set some different parameters for that; for ROC there is this level option.
This level here actually corresponds both to the level for the monitoring and to the level for the history determination, that is, the p-value at which it decides whether there is a break or not. For example, if you want fewer breaks detected, you can set the level to be smaller, like 0.005, and the same value will also affect the history, which means it will be more lenient about what it considers to be a stable history. In the CRAN version of bfast this is just one number; in the development version of bfast it has become a vector of numbers, where you can specify the p-value level for the detection of a break in the monitoring period and for the stable history separately.
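A sketch of those options, again assuming the `ndvi_ts` series and an arbitrary monitoring start; the two-element level vector requires the development version of bfast, and the order (monitoring level first, then the history level) is how it was described here:

```r
library(bfast)
# ROC-based stable history versus using all of the history
bfm_roc <- bfastmonitor(ndvi_ts, start = c(2016, 1), history = "ROC")
bfm_all <- bfastmonitor(ndvi_ts, start = c(2016, 1), history = "all")
bfm_roc$history        # the stable history period that was actually used

# development version: separate significance levels for the monitoring period
# and for the ROC stable-history test
bfm_lvl <- bfastmonitor(ndvi_ts, start = c(2016, 1), history = "ROC",
                        level = c(0.05, 0.005))
```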