
InforSAT: an online Sentinel-2 multi-temporal analysis toolset using R CRAN


Formal Metadata

Title
InforSAT: an online Sentinel-2 multi-temporal analysis toolset using R CRAN
Number of Parts
351
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Year
2022

Content Metadata

Abstract
Remote sensing via orbiting satellite sensors is today a common tool for monitoring numerous aspects of the Earth's surface and the atmosphere. The amount of imagery data has increased tremendously in recent years, due to the growing number of space missions and of public and private agencies involved in this activity. Much of this data is open data, and academics and stakeholders in general can freely download and use it for any type of application. The bottleneck is often no longer data availability, but the processing resources and tools to analyse it. Multi-temporal analysis in particular requires stacks of images, and thus digital storage space and processing workflows that are tested and validated. Processing image by image is often no longer a viable approach. Several solutions have been created to support centralized and automated processing of multiple images. Software as a service (SaaS) is becoming more common among users. The most popular to this day is probably Google Earth Engine (GEE), which gives users petabytes of data at their fingertips, access to processing resources, and an interface that provides a large number of tools for data processing via JavaScript or Python programming environments (Gorelick et al., 2017). What before took days if not months can now be run in a few minutes or hours. GEE is available and free for academics as of today, but it must be noted that this cannot be taken for granted in the future. Other initiatives, such as the Copernicus RUS project that closed at the end of 2021, also provided access to data (Copernicus data) and computing resources, to promote uptake of Copernicus data via educational and research activities. Moving towards SaaS solutions usually requires a provider that puts software on the cloud and a channel, usually a web portal, for accessing data and tools. The R CRAN programming environment has all the "ingredients" needed to create such a SaaS on a local machine or on a server.
We propose and discuss here a solution, called InforSAT, created ad hoc for centralizing satellite imagery processing, taking advantage of a remote server with multiple processors and thus also of parallel processing. The R Shiny package was used to connect online widgets for user interaction with R tools that carry out the specific processing of imagery via other dedicated packages. To date only Sentinel-2 Level-2A data are considered, but the system is scalable to other sensors and processing levels. The tools available to this day are focused on multi-temporal analysis, to support the academic community involved in particular in vegetation analysis, whose phenology changes notably inter- and intra-annually. The tools are available via a web portal to reach research teams that are not so familiar with satellite image analysis, allowing simplified extraction of multi-temporal data from Sentinel-2 images. Figure 1 shows the interface and Figure 2 the result of extracting a boxplot of vegetation index values over a specific time window. All image data are stored in a user-defined folder on the server, and a script checks weekly (or at other user-defined intervals) for new Sentinel-2 images, automatically downloads them, and stores metadata in an R list structure. The metadata stores image paths, bands, and also histograms of values for each band, used for defining color-stretching parameters during image rendering in the browser. Regarding visualization, users can render real-color and false-color composites defining their own band combinations, and can also create a raster layer with the values of common vegetation indices, or define their own index by providing an equation on the interface (see Figure 1). The images to be rendered in the user's browser are processed on-the-fly from the original JPEG2000 format, including the index raster and the color composites.
Each index raster is calculated every time the user actively redraws the raster, by sampling the original image with points that correspond to the screen pixels, reprojected from screen coordinates to image coordinates. Depending on the screen size and on the area, these are around one million points, which are then converted to an image and rendered on screen, either with a fixed scale that depends on the expected minimum and maximum values of the index (e.g. for the normalized difference vegetation index that would be between -1 and 1), or with a scale that automatically stretches between the 10th and 90th percentiles of the frequency distribution of the real values. The color composites are automatically drawn at any scale using the intrinsic overviews for each Sentinel-2 band that are present in the JPEG2000 format. Regarding multi-temporal analysis, users can define one or more polygons over the area, and for each polygon extract single pixel values (digital numbers, DN) and aggregated zonal statistics for each and all available images in a few seconds, with or without parallel processing. Users can download the multi-temporal data, i.e. the DN values, in table format for further analysis. The table is in long format and has a column with a timestamp, one with the polygon ID, and one column per band with the values. In both visualization and multi-temporal analysis, users can set a threshold for masking according to cloud and snow probability, which are products available from the sen2cor processing of Sentinel-2 to Level-2A. In the near future this solution will be integrated into an R package, allowing users to easily download, install and replicate their own portal locally or on their own server.
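The automatic stretch between the 10th and 90th percentiles described above can be sketched as follows. This is Python purely for illustration — the actual tool is implemented in R — and the function name and sample values are invented:

```python
import statistics

def stretch_10_90(values, out_min=0, out_max=255):
    """Linearly rescale values so the 10th percentile maps to out_min and
    the 90th percentile maps to out_max, clipping values outside that range."""
    # statistics.quantiles with n=10 returns the 9 decile cut points;
    # index 0 is the 10th percentile, index 8 the 90th.
    deciles = statistics.quantiles(values, n=10)
    p10, p90 = deciles[0], deciles[8]
    span = (p90 - p10) or 1  # avoid division by zero on flat rasters
    out = []
    for v in values:
        scaled = (v - p10) / span * (out_max - out_min) + out_min
        out.append(min(out_max, max(out_min, scaled)))  # clip to output range
    return out

# Invented index values sampled at the screen pixels
pixels = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
stretched = stretch_10_90(pixels)
```

Stretching to the 10th–90th percentile rather than the true minimum and maximum keeps a few extreme outlier pixels from washing out the contrast of the rendered raster.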
Transcript: English (auto-generated)
Okay, great. Perfect. So what I'm going to talk about is this tool that we developed in our lab. We are from the University of Padua, me and some colleagues of mine from the Department of Land and Agroforestry Systems. It's called the TESAF Department.
So we work a lot on forestry and agriculture. And also the Interdepartmental Research Center in Geomatics, CIRGEO. We work with spatial data. And in this case, as you probably, if you were following the presentation by Iza just five minutes before,
we work a lot with Sentinel data because it's freely available, of course, and it's very high quality and it's a good spatial, has a good spatial resolution, 10 meters. We all know Sentinel-2. So, okay.
So what's the main idea? What is this InforSat tool that we developed? Now, first of all, we all know that satellite data comes as a stack of images in bands. So it's notably not a small amount of data. So we're talking about usually one gigabyte per data set.
And you want to also work on time series data. So that already brings you the idea, okay, if I have to work in time series data, I have to collect a lot of images, not just one. So usually what you do is you download all the data you need and you put in the GIS or SNAP or other software
and you start doing your procedures on the data. Why do we want to do that? Well, in our lab, but most people, a lot of people use satellite data to analyze the surface, of course. But most importantly, when we speak about time series,
we speak about surface modifications. So we're talking about changes in the Earth's surface. So urbanization or if we monitor vegetation, we talk about vegetation stress, fires. So this fires is a bit, you see the logo. It was actually born because we are involved in a project
that's related to analyzing extreme fire events. So that's, you know, the main driver was that. But then also the growth of also green areas in the urban environment. And I put that in an asterisk because I was thinking about all the surface modifications and all these negative connotations came,
you know, urbanization, deforestation, pollution. So I said, I have to find something that's a bit more positive. So in urban cities, we know how precious it is to have green areas, parks, and because of heat mitigation, heat island mitigations, et cetera.
So that's also something that we can use satellite data for. So we, you know, I work in the city of Padova, and now there's a big project in the city of Padova that they want to plant a lot of trees. So because they want, you know, a greener city. So with satellite imagery, we can monitor and say, okay, let's see how the city will change
before and after the project, the trees that are planted. Now, so open source tools, probably you know better than me that there's a lot, a lot, a lot of open source tools that deal with imagery. We heard about the Orfeo Toolbox.
We can just cite the GDAL library. We know there's a lot of open source tools out there. So since we have to work with imagery, which is not lightweight, and it requires image stacks, our idea was to put everything, start thinking remotely.
Now, this is not a new idea, of course. So just not go to the typical download your image and everybody do their own processing of the image in their own computer, but just put the images remotely and organize them in a way that all users can already find their image available
and do not have to download the same image over and over. And of course, this is very important because as I said, surface modifications are also proxies for climate change and other things. So if we see the dynamics in surface modifications,
we can also predict what will happen in the future, what could happen in the future and also take action. So our idea says, okay, there's already some, a lot of solutions that contemplate the use of remote image, remote servers.
So there's Google Earth Engine, maybe some of you are acquainted with that, but there's also other tools that access the data directly online. But the criteria behind this is also that we wanted to create an architecture because we also have PhD students and we have master students
that start to work with satellite imagery, so they could also get acquainted with the procedure, with the workflow that is typical in satellite image analysis. And so we created with a project, we bought, let's say, this Linux, this architecture,
this hardware, basically. We got it hosted by the university in the university calculation center, where they have all the high performing computers. And it's not particularly fancy.
It's a Linux operating system, headless, so it doesn't have interface. And behind it, we decided to use R as the, how do you say, the daemon, the thing that takes care of all the processing. Why that? Because R has a very nice library called Shiny
that allows you to create web applications. So we have the server side, a lot of tools that we will talk about that take care of the image analysis on the server. We have the images on the server, and the Shiny web application allows the user online
just to access the webpage and tell the server what to do with the images via web. Via the web. So again, this is not anything new, but we wanted to use these tools and see also how much we can go, how far we can go with this. And we used also Map Server
because it has a lot of solutions that allow to interact with images, not so much for the processing of the images, but also for providing web mapping services. So we can actually use the net, the web GIS, the web app from Shiny and see the products of processing the images.
So the first step was to create a very, very easy architecture. So decide to save all the imagery in a single folder. So decide a root folder. The image is downloaded and unzipped.
Your typical Sentinel-2 image is usually compressed. And again, this is done by an R script. So the R script just says, okay, let's see what images are available. Of course, not for the whole Earth, but just for a specific area that we're interested in. But this is agnostic, so it can be actually replicated also in any area.
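The availability check and cloud-cover filter the script performs can be sketched like this — Python for illustration only, since the real script is in R, and the catalogue entries and threshold below are invented:

```python
# Hypothetical catalogue of scenes available for the area of interest:
# (product id, reported cloud cover in percent)
catalogue = [
    ("S2A_MSIL2A_20220601T101031_T32TPR", 3.2),
    ("S2A_MSIL2A_20220611T101031_T32TPR", 78.5),
    ("S2B_MSIL2A_20220616T101029_T32TPR", 12.0),
]

CLOUD_THRESHOLD = 20.0  # percent; user-defined in the real script

def select_downloads(entries, threshold=CLOUD_THRESHOLD):
    """Keep only scenes whose reported cloud cover is below the threshold;
    these are the ones the script would download and unzip."""
    return [product_id for product_id, cloud in entries if cloud < threshold]

to_download = select_downloads(catalogue)
```

In the real workflow the same filter runs on whatever area of interest is configured, which is why the setup is agnostic to location.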
You just have to decide which is your area of interest. And the script automatically downloads the data if it's below a certain threshold of cloud cover and unzips the file. So you have your information in a folder, each image,
and it's already structured. If you've ever worked with Sentinel-2, you know it has this very specific structure, the folder structure. So what happens is that the script also creates a lookup table. Very simply, each file that is downloaded has a date and has, as you can see, it has a tile.
I think I should have a mouse pointer somewhere. No, well, you know, it has, as we will see also in the next slide, it has a tile, so its image is divided into specific areas, which are coded with a specific index.
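The tile code and the date come straight from the Sentinel-2 SAFE product name, so building the lookup table reduces to parsing file names. A Python sketch of that step (the real implementation is an R script; the example product name is invented but follows the real naming convention):

```python
import re
from datetime import datetime

# Sentinel-2 SAFE names embed the sensing datetime and the tile code, e.g.
# S2A_MSIL2A_20220601T101031_N0400_R022_T32TPR_20220601T130455.SAFE
NAME_RE = re.compile(
    r"S2[AB]_MSIL2A_(?P<date>\d{8}T\d{6})_N\d{4}_R\d{3}_(?P<tile>T\d{2}[A-Z]{3})"
)

def parse_safe_name(name):
    """Extract (sensing datetime, tile id) from a Sentinel-2 L2A product name."""
    m = NAME_RE.search(name)
    if m is None:
        raise ValueError(f"not a recognised Sentinel-2 L2A name: {name}")
    return datetime.strptime(m.group("date"), "%Y%m%dT%H%M%S"), m.group("tile")

date, tile = parse_safe_name(
    "S2A_MSIL2A_20220601T101031_N0400_R022_T32TPR_20220601T130455.SAFE"
)
```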
So the script knows exactly the image, where it's georeferenced, and the date, because it's inside the file name itself. So what we have, this is the interface, and we have a very simple interface
where you can choose the tile you want to work on, and you can choose the date. And what actually happens, the visualization part, is that you can visualize, you can visualize an index. Why did we choose that? Because indices are band combinations,
and indices are quite easily created on the fly. So the index is not precalculated, but it's actually calculated on the fly. The user can actually provide his own calculation of the index by using very simple, we could call it very simple syntax, and then it provides the output of the index.
Now that's done quite fast, and it's done quite fast because the image is subsampled, in the sense that the index is not calculated for all the pixels of the bands, but it's calculated only depending on the zoom level and on which area is being visualized.
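That zoom-dependent subsampling can be sketched as follows — Python with plain lists, purely for illustration; the real tool samples the JPEG2000 bands in R, and the band values here are invented:

```python
def ndvi_subsampled(red, nir, step):
    """Compute NDVI only every `step`-th pixel in each direction, mimicking
    the coarser on-the-fly calculation used at far-out zoom levels."""
    out = []
    for i in range(0, len(red), step):
        row = []
        for j in range(0, len(red[i]), step):
            r, n = red[i][j], nir[i][j]
            row.append((n - r) / (n + r) if (n + r) else 0.0)
        out.append(row)
    return out

# Tiny invented 4x4 red and near-infrared bands
red = [[10, 10, 10, 10]] * 4
nir = [[30, 30, 30, 30]] * 4

full = ndvi_subsampled(red, nir, step=1)    # native resolution: 4x4 output
coarse = ndvi_subsampled(red, nir, step=2)  # zoomed out: 2x2 output
```

Zooming in corresponds to lowering `step` until it reaches 1, which is the "all the way up to the original resolution" case mentioned below.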
So you can choose, again, as you can see, you can choose preloaded indices, or you can change, you can play around with your own formula. As I told you, this is also for students, so that's why we did this, not only for academic research,
but also for students to learn how to do this. And for example, this is an index that's calculated. What happens is when you zoom in, you will see that the index has a specific resolution, which is not the resolution of the image, but might be very much a rougher resolution
because if you're looking at the image from a zoom level far away, then it will actually resample the image and only calculate the index at a rougher scale. If you zoom in, you can recalculate the index, so you can actually go all the way up to the original resolution of your image.
In this case, it's 10 meters. You can also decide to, I don't know if you see the difference, there's some different resampling types, like this is bilinear, and this is another resampling type, so you can also teach students what actually happens
when you choose different resampling schemes. Maybe if some of you are familiar with these resampling schemes, you might know they come from GDAL, so what's behind here is the GDAL warp function, executable, which is called by R.
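Conceptually, the difference between nearest-neighbour and bilinear resampling can be shown on a tiny one-dimensional example — Python, purely didactic; the real resampling is delegated to GDAL's warp executable:

```python
def nearest_upsample(values, factor):
    """Nearest-neighbour: each source value is simply repeated,
    giving the blocky look students see on screen."""
    return [v for v in values for _ in range(factor)]

def linear_upsample(values, factor):
    """Linear interpolation between neighbours (the 1-D analogue of
    bilinear resampling), giving a smooth ramp instead of steps."""
    out = []
    for i in range(len(values) - 1):
        a, b = values[i], values[i + 1]
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(values[-1])
    return out

src = [0, 10, 20]          # three invented pixel values along a row
near = nearest_upsample(src, 2)
lin = linear_upsample(src, 2)
```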
So what happens behind the scene is very simple procedures, but the procedures are actually encoded into an interactive web app. For better understanding, actually, of what your index tells you about your Earth surface, you can also use, you can also upload the colored image,
so you can actually do band combinations, and have both the index and your image interpreted as a color combination. And you can move the mouse around and actually see what is going on.
Here, the last thing I want to talk to you about is that all this was born to see, to do multi-temporal analysis. So what the user can do, he can say, okay, I want to see what happens in certain areas over time. So the user can draw an area,
and basically what happens is that R calls all the functions that are necessary to sample those areas on all the images that have been downloaded. And as all zonal statistics do,
it pulls out the median, average, standard deviation, so you know what happens over time over that area. And this is also to show students the problem with atmospheric correction, if there's a cloud, so if you have a high standard deviation, very likely there is a cloud that's right on the area. Because yes, we download,
we try to download images with very low cloud cover, but you might always get your typical Murphy's law that you get the cloud right on your area. If you want, you can of course remove certain dates or you can just decide to not process some dates,
because maybe they are too cloudy for that area at that time. And you can analyze the plots directly online, or you can download your data in a worksheet in Excel,
in a normal spreadsheet, so if you want to do your own analysis with your own tools, you have your data over time over that area. And this is all the information, as you can see here. The table gives you both the index value, the cloud cover probability, the snow cover probability,
which is all information you find in your Sentinel-2 image, which has been processed to level two. Okay, just to get some last points. Again, this for students can be a gentle introduction to several remote sensing concepts.
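The zonal extraction, the high-standard-deviation cloud heuristic, and the long-format download table described above can be sketched together — Python for illustration; the real extraction runs in R, and all DN values and column names below are invented:

```python
import csv
import io
import statistics

# Invented DN samples for one polygon on three dates
samples = {
    "2022-06-01": [0.61, 0.63, 0.62, 0.60, 0.64],  # homogeneous vegetation
    "2022-06-11": [0.61, 0.05, 0.90, 0.12, 0.85],  # mixed cloud and shadow
    "2022-06-21": [0.55, 0.57, 0.56, 0.58, 0.56],
}

STD_THRESHOLD = 0.1  # heuristic: high within-polygon spread suggests a cloud

rows = []
for date, pixels in sorted(samples.items()):
    std = statistics.stdev(pixels)
    rows.append({
        "timestamp": date,
        "polygon_id": 1,
        "median": round(statistics.median(pixels), 3),
        "mean": round(statistics.mean(pixels), 3),
        "std": round(std, 3),
        "probably_cloudy": std > STD_THRESHOLD,
    })

# Long-format table: one row per (date, polygon), ready for download
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
table = buf.getvalue()
```

The cloudy 2022-06-11 date stands out through its high standard deviation, which is exactly the signal the speaker uses to teach students about residual cloud contamination.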
So how remote sensing data, how it comes, how the images are downloaded, how the file is named, and what you actually find inside your package when you download the compressed image. They can appreciate the importance of Earth observation,
so when they see what happens after a fire, how your index changes after a fire, that actually I've seen a lot of feedback from students that are quite impressed. And also they are very interested to see how, for example, the NDVI index has certain seasonalities, so it's not always constant, but changes with seasons,
because the photosynthesis is not always constant, depends also on the temperature. And they can do that with multi-temporal analysis, so that's the first primers to foster also curiosity to students. And the nice thing is that it's all online, so they don't actually have to do,
that the first impact is a bit, as I said, is a bit gentler, and they don't have to install software and again, it's agnostic in the sense you can develop it in any area. It just depends what image you download. If you download an image, maybe in, let's say in Canada,
from Canada, the script automatically figures out from the georeferencing that the image is in Canada, so you will get the tile over Canada and you can analyze the image. Plus, of course, the students learn about band combinations, calculating indices, et cetera.
Okay, so now what's the future developments? Basically, it's in GitHub, it's not a package yet. I don't know if it will ever become actually a package, because it's quite, it's basically a collection of software, so it's not intended to be ever a package,
but it's more of an idea. And, but it can be implemented in your own server, if you want, it's, we're trying to create a manual, so to see step-by-step how to, how to install it in your own server. It's quite straightforward, but it requires a couple of,
a couple of tweaks in some of the files. Basically, it just requires knowing the path to the root folder, and then you have to launch for the first time the script that crawls through all the subfolders and finds the images. The web portal is online.
So far, it's, you know, there's no password, because there's not, it's only students, but if we see too many people accessing, we will maybe provide a password and ask for people to register, just because, of course, it might, we don't want to clutter the server. The other nice thing I want to mention,
I still have one minute, is that you can also do the analysis using parallelized processing. It's very simple, because obviously a lot of those zonal calculations can be parallelized, because they are the same calculations over many images. So we can actually decide if we want to
use more than one cluster. We can make a multi-thread cluster, or not, if we just don't want to use parallel processing. I think I'm done, that's it.
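Because the same zonal calculation is repeated independently on every image, the work parallelises trivially. A thread-pool sketch — Python for illustration; the actual tool uses R's parallel facilities, and the data here are invented:

```python
import statistics
from concurrent.futures import ThreadPoolExecutor

# Invented stand-in for "sample one polygon on one image": here each 'image'
# is just the list of DN values already extracted for the polygon.
images = {
    "2022-06-01": [0.61, 0.63, 0.62],
    "2022-06-11": [0.10, 0.85, 0.40],
    "2022-06-21": [0.58, 0.60, 0.59],
}

def extract(date_pixels):
    """One independent job: summarise one polygon on one date."""
    date, pixels = date_pixels
    return date, statistics.median(pixels)

# The per-image jobs share nothing, so they can run concurrently.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = dict(pool.map(extract, images.items()))
```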
There is a video, but it's five minutes, and there's no time, so maybe I'll just run the video while you ask questions. So I'm open for questions.