SpectralIndices.jl: Streamlining spectral indices access and computation for Earth system research
Formal Metadata
Number of Parts | 156
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/68537 (DOI)
FOSS4G Europe 2024 Tartu, part 116 of 156
Transcript: English (auto-generated)
00:01
Thank you very much for your introduction and also for coming to this presentation about SpectralIndices.jl, or how to access and compute spectral indices in Julia. Before going into all the details about the package itself, I want to give a brief introduction to what spectral indices are. Broadly speaking, we can define spectral indices as mathematical combinations of
00:21
reflectance values from different wavelengths or bands obtained from satellite or aerial imagery. Throughout the presentation I will use wavelengths and bands interchangeably, so they mean more or less the same thing in the scope of this presentation. The value of having these mathematical combinations of different bands is in the
00:43
fact that they can highlight or even hide some specific effects of land properties. We'll go more into the specific definitions. When we talk about reflectance values or
01:02
reflectance factors, we can see here that every surface material has some reflectance properties. This can be observed over different wavelengths of the color spectrum. On the left side you can see what is the observable color spectrum.
01:21
Generally we can see more than just the green, red, blue lines, but also near-infrared and shortwave infrared with certain satellites or other optical sensors. Through these reflectance values we can obtain some mathematical expressions that are as easy as the ones showcased on the slide now.
01:45
It's a very famous one, probably the most famous as far as vegetation properties go. This is the NDVI, or Normalized Difference Vegetation Index. As you can see, it's obtained from the near-infrared and the red band. It's a simple mathematical combination of the bands we explored earlier. There is an explosion in the number of spectral indices available because, as you can see, they are easily obtainable.
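To make the formula on the slide concrete, here is a minimal Python sketch of the NDVI as the normalized difference of the near-infrared and red reflectances. The function name and the reflectance values are illustrative assumptions, not taken from the talk or from any package.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red)."""
    denominator = nir + red
    if denominator == 0:
        raise ValueError("NIR + red must be non-zero")
    return (nir - red) / denominator


# Illustrative reflectance values (assumed, not measured data)
dense_vegetation = ndvi(nir=0.5, red=0.1)  # strong NIR reflectance, low red
bare_soil = ndvi(nir=0.3, red=0.2)         # NIR and red much closer together
```

Because the difference is divided by the sum, the result always falls between -1 and 1 for non-negative reflectances, which is what makes the index comparable across scenes.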
02:09
There are different missions going up with different bands that are being recombined into novel indices. Also, old indices are being restructured into new ones. For example, I just talked about the NDVI.
02:24
There has also been a modification of it called the kNDVI, which kernelizes the NDVI to better capture properties of vegetation. Of course, since these are so easy to obtain through mathematical notation, people are creating new ones with already available bands.
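As a hedged illustration of what "kernelizing" means here, the sketch below follows the published kNDVI definition with an RBF kernel, k(N, R) = exp(-(N - R)^2 / (2 sigma^2)) and kNDVI = (1 - k) / (1 + k). With the common choice sigma = 0.5 (N + R), this reduces to tanh(NDVI^2). The function names are assumptions for illustration and are not the package's API.

```python
import math


def rbf_kernel(a: float, b: float, sigma: float) -> float:
    """Radial basis function kernel between two reflectance values."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def kndvi(nir: float, red: float, sigma=None) -> float:
    """Kernel NDVI with an RBF kernel; sigma defaults to the mean of the bands."""
    if sigma is None:
        sigma = 0.5 * (nir + red)  # common default length scale
    k = rbf_kernel(nir, red, sigma)
    return (1.0 - k) / (1.0 + k)


nir, red = 0.5, 0.1  # illustrative reflectances (assumed)
ndvi_value = (nir - red) / (nir + red)
kndvi_value = kndvi(nir, red)
```

With the default length scale, `kndvi_value` agrees with `math.tanh(ndvi_value ** 2)`, which shows how the kernel version is a nonlinear rescaling of the original index.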
02:43
So, with this explosion in the number of spectral indices, it's necessary, or at least useful, to have collections of all of them to easily access them and investigate which one you should use for your specific application. Some collections do exist in the wild, so to speak. These are just a few of them.
03:05
For example, the Index DataBase, the Esri indices gallery, and the ENVI alphabetical list of spectral indices. These are, of course, very good, but all also have some drawbacks, for example not being open source or not providing machine-readable packages to go with them.
03:23
A solution to this was presented in 2022: Awesome Spectral Indices, a collection that was started by David Montero, who is actually here in the audience and will give a presentation after me. This provides an open-source and machine-readable collection of spectral indices.
03:44
It's also on GitHub, so anyone can collaborate on it. More importantly, we have a JSON file that we can iterate over and read with any programming language of your choice. Just to go quickly through it, I'm using the NDVI as an example again.
04:03
You can see that there is the application domain and which bands you need to use for the computation. Then the formula is expressed as a string that can be parsed and evaluated with any programming language.
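The idea of a machine-readable entry whose formula is a plain string can be sketched as follows. The entry below is hand-written for illustration: its field names are modelled on the description above (application domain, bands, formula) and are an assumption, not a verbatim copy of the collection's file.

```python
import json

# Hypothetical catalogue entry, written by hand for this example
entry_json = """
{
  "short_name": "NDVI",
  "long_name": "Normalized Difference Vegetation Index",
  "application_domain": "vegetation",
  "bands": ["N", "R"],
  "formula": "(N - R)/(N + R)"
}
"""

entry = json.loads(entry_json)


def evaluate_index(entry: dict, **band_values: float) -> float:
    """Evaluate the formula string of an entry for the given band values."""
    missing = [band for band in entry["bands"] if band not in band_values]
    if missing:
        raise KeyError(f"missing bands: {missing}")
    # Builtins are disabled; evaluating strings is only reasonable for a
    # catalogue you trust.
    return eval(entry["formula"], {"__builtins__": {}}, dict(band_values))


value = evaluate_index(entry, N=0.5, R=0.1)
```

Evaluating the string on every call is the simplest approach, and it is the runtime cost that a package can avoid by turning formulas into real functions ahead of time.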
04:23
There is also some other useful information. I will not go through all of it, of course, but the field that was most useful to me is the reference. While writing the paper for this conference, I used the reference section multiple times. It was relatively easy to just open a Jupyter notebook, and having the reference there was really handy.
04:40
It saved quite some time, because googling the name of an index doesn't actually get you the paper most of the time. Especially for the NDVI, the paper is from 1974. Just googling NDVI or Normalized Difference Vegetation Index doesn't bring up the paper that you want to cite, so it's pretty handy to have everything at your fingertips. As I said before, the whole suite of Awesome Spectral Indices encompasses a range of packages that are connected to the collection.
05:09
In this case, there is a Python interface, a Google Earth Engine interface, and the one that I am going to talk about today is the Julia interface. All of them provide more or less the same functionalities, but in different languages,
05:21
and all of them, of course, also have some subtle differences, which I will try to go into in more detail in this presentation. To go into the internals of SpectralIndices.jl, here is a brief overview of what's going on behind the scenes, so to speak, or what's going on offline. When you, for example, want to add a new index or when a new index gets added to Awesome Spectral Indices,
05:45
this is what I do as the maintainer of the package. I get the indices from the Awesome Spectral Indices GitHub repository, so the original collection. I get the indices as a JSON file. They are saved locally into the data folder,
06:01
and from there, I call a function that reads and parses all the formula strings that you saw before and creates native Julia functions. I went through the motivation for this yesterday during the workshop, but to make it shorter here: having native Julia functions avoids the overhead of evaluating the formula strings at runtime.
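The offline step described above, turning formula strings into real functions once instead of evaluating strings on every call, can be sketched in Python as an analogy. This is not the package's Julia implementation; the `formulas` table and `make_index_function` are hypothetical names introduced for the example.

```python
# Hypothetical formula table: index name -> (band arguments, formula string)
formulas = {
    "NDVI": (["N", "R"], "(N - R)/(N + R)"),
    "NDWI": (["G", "N"], "(G - N)/(G + N)"),
}


def make_index_function(name: str, bands: list, formula: str):
    """Generate source for one function, compile it once, and return both."""
    source = f"def {name.lower()}({', '.join(bands)}):\n    return {formula}\n"
    namespace = {}
    exec(compile(source, f"<generated {name}>", "exec"), namespace)
    return namespace[name.lower()], source


generated = {}
sources = {}
for name, (bands, formula) in formulas.items():
    generated[name], sources[name] = make_index_function(name, bands, formula)

# After generation, calls are ordinary function calls with no string parsing
ndvi_value = generated["NDVI"](0.5, 0.1)
```

The generated source in `sources` could equally be written to a file and shipped with the code base, which is closer to the workflow the talk describes.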
06:26
Once all the functions are created, they are saved into the code base and become part of it, so they can be used when calling the library. So, whenever you want to add an index in your local workflow,
06:43
you just have to add the index to the JSON file, and there are functions to do this, and then create the index function to add it to the functions of the library, and then you can use it and leverage all the infrastructure of the software itself.
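A rough Python sketch of that local workflow, under stated assumptions: the catalogue is a small JSON file, `add_index` appends an entry, and `load_functions` rebuilds callable functions from it. All names and the file layout are hypothetical and are not the package's own functions.

```python
import json
import os
import tempfile

# Start from a tiny local catalogue file (hypothetical layout)
catalogue_path = os.path.join(tempfile.mkdtemp(), "indices.json")
with open(catalogue_path, "w") as handle:
    json.dump({"NDVI": {"bands": ["N", "R"], "formula": "(N - R)/(N + R)"}}, handle)


def add_index(path: str, name: str, bands: list, formula: str) -> None:
    """Append a custom index to the local JSON catalogue."""
    with open(path) as handle:
        catalogue = json.load(handle)
    if name in catalogue:
        raise ValueError(f"index {name!r} already exists")
    catalogue[name] = {"bands": bands, "formula": formula}
    with open(path, "w") as handle:
        json.dump(catalogue, handle, indent=2)


def load_functions(path: str) -> dict:
    """Build one callable per catalogue entry (trusted catalogues only)."""
    with open(path) as handle:
        catalogue = json.load(handle)
    functions = {}
    for name, entry in catalogue.items():
        arguments = ", ".join(entry["bands"])
        functions[name] = eval(
            f"lambda {arguments}: {entry['formula']}", {"__builtins__": {}}
        )
    return functions


add_index(catalogue_path, "MYDIFF", ["N", "R"], "N - R")  # made-up custom index
functions = load_functions(catalogue_path)
custom_value = functions["MYDIFF"](0.5, 0.1)
```

Once the custom entry is in the catalogue, it is handled exactly like the built-in ones, which is the point of keeping the collection and the function generation separate.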
07:00
Now, at runtime, what's happening is that when you load the package in your Julia REPL or in an interactive VS Code session, a couple of things happen. The constants, the bands, and the indices are collections of the structures that you want to investigate or use for your explorations.
07:21
They are loaded into the namespace immediately, so you can access them and explore them in an interactive way. In addition to this, something that's not shown here: all the index structures are themselves loaded into the namespace for easier and faster access.
07:42
To go to the last slide on the internals of SpectralIndices.jl, these are the core dependencies and the weak dependencies of the software. Julia allows you to have weak dependencies, which means that the only core dependencies are some light packages that are also present in the base Julia software.
08:05
So, they are well maintained, and when you load SpectralIndices.jl, you just load those by default. Now, of course, if you're using spectral indices, you're probably not just working with arrays or native Julia data structures. You want to use some other data structures to handle your data with,
08:22
and these other data structures come from external packages that are supported in SpectralIndices.jl only when you load them. These are heavy dependencies. For example, YAXArrays.jl: I know probably not many people here are familiar with Julia, but YAXArrays.jl is kind of the equivalent of xarray in Python.
08:41
So, as you can imagine, it's a heavy dependency, and it's not loaded automatically in SpectralIndices.jl unless you want to use it yourself and you load it as well. This helps because Julia is different from Python in that it's just-in-time compiled. So, it has really fast execution on one hand,
09:00
but on the other, you're paying at the start of the run time because there is some precompilation going on. So, when you load your software, there is the precompilation time, famous for bad reasons, in which the package precompiles some functions for you to then use and run faster. So, not having heavy dependencies in the code
09:22
helps you in the fact that it doesn't pre-compile for a long time when you don't want it to. So, now this is just the behind the scenes of the software, but how can you actually use it and how does it help in a general workflow? And here I have a workflow that I use for my own research,
09:43
for example, in machine learning. What I was interested in, just to have a quick experiment here, is predicting the next step of a time series of 16 different vegetation indices. To do this, I'm using echo state network models, which I am going to explain in a second, through the ReservoirComputing.jl package.
10:01
That's another package that I contributed to developing. We took one time series, so just one pixel, and from this pixel we got the bands, and we decided to see if we can predict the next step of this time series using three different approaches.
10:22
But before going to the approaches, I want to give a really quick introduction to what echo state networks are, because probably even to people familiar with machine learning these are a somewhat obscure model. An echo state network can be described in a simple way as an expansion of the input data
10:42
into a higher dimension over which you can train it against your desired output just by linear regression. So, to the people familiar with kernel methods, this is like a kernel method but with a kernel trick implicit. For the people familiar with machine learning, with deep learning, sorry, this is like a recurrent neural network
11:00
but trained without backpropagation, simply saving the hidden states and doing regression on those. And for people who are not familiar with machine learning in general, this is just a kind of black box in which magic happens, and it can predict the next step if you want it to. So, the three approaches are as follows. This is the first approach, the one that actually makes use of SpectralIndices.jl,
11:24
in which we give the necessary bands to the model; in this case the necessary bands were six or seven. We train the model to output the bands and, once we have the bands given by the model, we feed them to SpectralIndices.jl to obtain all 16 indices.
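A scaled-down sketch of this first approach, assuming NumPy is available: a small echo state network (a fixed random reservoir plus a ridge-regression readout) predicts the next step of the bands, and the index is then computed from the predicted bands. The synthetic two-band series, the reservoir size, and every hyperparameter are assumptions for illustration; this is not the ReservoirComputing.jl API and not the experiment from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic seasonal reflectance series for two bands (assumed data)
t = np.arange(300)
nir = 0.40 + 0.10 * np.sin(2 * np.pi * t / 50)
red = 0.10 + 0.03 * np.cos(2 * np.pi * t / 50)
bands = np.column_stack([nir, red])  # shape (300, 2)

# Fixed random reservoir: the input is expanded into a higher dimension
n_reservoir = 50
w_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, 2))
w = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
w *= 0.8 / np.max(np.abs(np.linalg.eigvals(w)))  # rescale spectral radius to 0.8


def run_reservoir(inputs: np.ndarray) -> np.ndarray:
    """Drive the reservoir with the inputs and save every hidden state."""
    states = np.zeros((len(inputs), n_reservoir))
    x = np.zeros(n_reservoir)
    for i, u in enumerate(inputs):
        x = np.tanh(w_in @ u + w @ x)
        states[i] = x
    return states


states = run_reservoir(bands[:-1])
features = np.hstack([states, np.ones((len(states), 1))])  # add a bias column
targets = bands[1:]  # next-step values of both bands

# Only the linear readout is trained, by ridge regression (no backpropagation)
washout, split, ridge = 100, 250, 1e-6
X, Y = features[washout:split], targets[washout:split]
w_out = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)

# Approach 1: predict the bands, then compute the index from the predictions
predicted_bands = features[split:] @ w_out
true_bands = targets[split:]
predicted_ndvi = (predicted_bands[:, 0] - predicted_bands[:, 1]) / predicted_bands.sum(axis=1)
true_ndvi = (true_bands[:, 0] - true_bands[:, 1]) / true_bands.sum(axis=1)
rmse = float(np.sqrt(np.mean((predicted_ndvi - true_ndvi) ** 2)))
```

The design point is that one trained model of the bands serves every index: adding a seventeenth index costs one more formula evaluation, not another training run.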
11:44
The next approach is to give the model one vegetation index at a time, of which we want to predict the next step, and obtain this vegetation index at the end. We do this in series, not in parallel, for all the 16 indices. We are doing this in series and not in parallel because this mimics a larger workflow
12:04
in which you are doing this for every pixel. So you do this in parallel for every pixel so you have to do it in series for the pixel itself. And the third approach is like, well, we have the 16 indices we just feed the 16 indices to the model to obtain the next step prediction. So, we have trained all these three different models
12:21
we've done some hyperparameter optimization to obtain results that were more or less close to one another, and these are the results that we obtained. The figure is kind of loaded, but I'll try to walk through it step by step. In quadrant A we can see the compute time for all three approaches. So, as you can see, approach one,
12:41
the one that uses the bands to train the model and then gets the spectral indices out from the bands, is the fastest by far. The second approach, which trains on only one spectral index, one vegetation index, I should say, at a time in series, is the second fastest,
13:01
and the third one which uses all 16 vegetation indices to obtain the next step of all 16 vegetation indices is actually the slowest. And on column B we can see the results in terms of time series for all the 16 vegetation indices. They are kind of overlapping since next step prediction is a relatively easy task for these models
13:21
so they're kind of overlapping and you don't see all three of the colors but you can see that the predictions are kind of good for all three approaches. And on C you can actually see the prediction accuracy of all the three approaches and there are no discernible differences between them
13:41
so for each spectral index there actually are differences, but there doesn't seem to be a pattern by which one approach is worse or better than the others. So, for more or less comparable prediction accuracies, we get that the first approach, the approach that leverages the SpectralIndices.jl package,
14:00
computes the fastest. And in these applications, in machine learning applications in general, a time saving of almost 4 seconds, when done at large scale, can amount to hours of compute time and money. So, this is just a quick example of how using spectral indices in a general workflow
14:22
can help make it faster or easier. So, to conclude, this package provides an easy and fast way to access and compute spectral indices, and using this package can provide considerable speedups over existing workflows,
14:44
and as for future directions, I showed in the internals slide that I currently support DataFrames.jl, the Julia equivalent of pandas, and YAXArrays.jl, the Julia equivalent of xarray. So the future direction is to increase the data type support,
15:03
since in Julia the array ecosystem is kind of fragmented: we don't have just YAXArrays.jl, we have a bunch of other packages that are just as good but with different focuses. So my goal, for the near future at least, is to improve on this and add all the type support
15:21
that I can add. I believe that's all from me, and thank you for your attention.
Thank you very much, Francesco, that was a really, really interesting presentation.
15:43
You actually have a lot more time. Are there any questions, or is there something you would maybe want to highlight again, like with your computational approach?
Something that I would highlight is simply
16:04
the easy approach that it gives to simply compute the indices so it's a relatively straightforward package in the sense that it provides you with more or less two main functions one computes the indices and the other one computes the kernels for some kernel-based indices but the potential behind the scenes is quite expressive
16:24
and you can build on top of it quite easily we had the workshop yesterday I see some people that were there as well and we showed how many ways you can actually compute the indices that can be built on and actually expanded onto different packages
16:41
so I would say that it's the flexibility that it provides to build on top of.
Cool, thanks. Okay, audience, questions?
17:01
Thank you.
Thank you, Francesco, for this presentation, very interesting. I didn't catch it, so could you expand a little bit on the advantages of Julia over Python in this particular case, and when would you recommend it,
17:22
the type of use cases for which you would recommend using Julia?
Thank you for the question. It's always a topic of contention, right, Python versus Julia. So in this case, as I showed here, there is also a package in Python,
17:41
and the package that I built in Julia takes a lot from it. The question of using Julia instead of Python depends a lot on your workflow. When I came to Julia, it was more because it provided an easy solution for mathematical problems, and that was what I was most looking for.
18:01
Julia and Python differ in terms of implementation philosophy. Whereas Python is object oriented and provides all the object-oriented solutions, Julia has multiple dispatch, so it's a different and, for some things, more flexible approach to computation. And again, as I said before, Python is interpreted
18:21
and Julia is just-in-time compiled. These are things that have pros and cons. The pro, I would say, of Julia being just-in-time compiled is that it is faster. I would have loved to showcase some speed comparisons, but I chose not to do them
18:42
because at some point the speed comparison is just native Julia versus, for example, Numba, so it wasn't a fair comparison of the libraries SpectralIndices.jl and spyndex; it was more a clash of languages. I would say that if you care about speed and you are not afraid of actually doing some of the hard implementations yourself,
19:06
Julia is the place to go. If you want something that works immediately and for which we have a lot more examples, then Python would be your choice.
Thank you. More questions?
19:22
I would like to pick up on it a little bit. When you do really large workflows, are there also packages that help you to scale the computation up,
19:43
so that you use more computers, or more data than you have memory, for example?
Yes. One of the packages I'm building on top of, YAXArrays.jl, for example, allows you to use data larger than memory, and there is also DimensionalData.jl,
20:02
which is the base of YAXArrays.jl and also allows you to do that, and Rasters.jl, just to name a couple off the top of my head. When you want to actually scale computation, there is Distributed.jl, which is also part of the core library of Julia and allows you to do computations in parallel.
20:21
Julia, I think in 2017 or 2018, was the second or third language to achieve petascale computation, with the Celeste project. It's actually one of the few languages that have run the most compute, so to speak, on supercomputers, together with Fortran and C++, if I remember correctly.
20:41
It's up there with the big guns that have been there for quite some time. Of course, it's a new language. It's been around for 11 or 12 years now, something like this. As far as programming languages go, it's quite young, I would say, but the ecosystem is growing and, again, there are equivalents of the Python packages, so all the computations
21:02
you would do in Python, you can do in Julia. So you can do out-of-memory data computation or parallel computation, both on CPUs and GPUs.
Thank you.
Thank you. More questions? So, you do program this yourself?
21:23
Yeah, so...
Let's have a conversation. Do you do that for specific research topics that you are into, or do you feel more like a research software engineer who is more focused on making the software
21:44
and workflows more reproducible or more scalable to support, let's say, soil scientists or agricultural scientists? I mean, those indices are, of course, widely usable. Where would you position yourself?
That is a nice question, actually.
22:01
I wouldn't know. Of course, all the software I'm developing, and I showed here the other package of which I'm the main developer, I develop mainly for research purposes, and then it gets kind of out of hand. My supervisors are not here, so I can say this freely. It gets out of hand in that I enjoy programming
22:20
at the beginning a bit more than the research questions, so I get sidetracked a lot on the programming, and I enjoy having the results of my coding out there, being open source and usable by others. So I would say at the moment it's still half and half: I develop them to answer my research questions, but then I also develop them because I like doing it.
22:44
It's a tough question because I really don't know which side I'm standing on.
Thank you. Who of you in the audience considers themselves more of a scientist or more of a developer? Let's say, who is really doing science, may I ask?
23:03
Which is still good, I mean, after all we're at a software conference. So most of you know that there's probably another bunch of you that are really into developing, really into coding. Oh, it's this small number again.
23:21
Then we probably have, I don't know, high-level people or managers? Desktop software users? That's not a judgement. Arms up. Who are the people who couldn't raise their arms yet? Why don't you feel included?
23:43
This is actually the academic track, so that actually really makes sense. There was just recently in Finland the research software engineering conference. This is a new thing; actually it's not that new, but I think it's still rising: research software engineer. And obviously we have to acknowledge that for many scientific problems
24:03
we need more effective or more scalable computing. So I applaud you that you went into that and still do the science itself. So that's pretty cool. So let's thank Francesco; we made good use of the time. Thank you very much.