
SpectralIndices.jl: Streamlining spectral indices access and computation for Earth system research


Formal Metadata

Title
SpectralIndices.jl: Streamlining spectral indices access and computation for Earth system research
Number of Parts
156
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Remote sensing has evolved into a fundamental tool in environmental science, helping scientists monitor environmental changes, assess vegetation health, and manage natural resources. As Earth observation (EO) data products have become increasingly available, a large number of spectral indices have been developed to highlight specific surface features and phenomena observed across diverse application domains, including vegetation, water, urban areas, and snow cover. Examples of such indices include the normalized difference vegetation index (NDVI) (Rouse et al., 1974), used to assess vegetation states, and the normalized difference water index (NDWI) (McFeeters, 1996), used to delineate and monitor water bodies.

The constantly increasing number of spectral indices, driven by factors such as the enhancement of existing indices, parameter optimization, and the introduction of new satellite missions with novel spectral bands, has necessitated the development of comprehensive catalogs. One such effort is the Awesome Spectral Indices (ASI) suite (Montero et al., 2023), which provides a curated, machine-readable catalog of spectral indices for multiple application domains. Additionally, the ASI suite includes not only a Python library for querying and computing these indices but also an interface for the Google Earth Engine JavaScript application programming interface, thereby accommodating a wide range of users and applications.

Despite these valuable resources, there is an emerging need for a dedicated library tailored to Julia, a programming language renowned for its high-performance computing capabilities (Bezanson et al., 2017). Julia has not only established itself as an effective tool for numerical and computational tasks but also offers the possibility of using Python within its environment through interoperability features. This interoperation adds a layer of flexibility, allowing users to access Python's extensive libraries and frameworks directly from Julia. However, while multiple packages are available in Julia to manipulate high-dimensional EO data, most of them provide different interfaces. Furthermore, leveraging Python's PyCall for interfacing with Zarr files and other high-dimensional data formats is not practical: the inefficiency of cross-language data exchange and the overhead of cross-language calls significantly hinder performance, underlining the need for native Julia solutions optimized for such data tasks.

Recognizing the need for a streamlined approach to using spectral indices, we introduce SpectralIndices.jl, a Julia package developed to simplify the computation of spectral indices in remote sensing applications. SpectralIndices.jl provides a user-friendly, efficient solution for both beginners and researchers in the field of remote sensing, and offers several features supporting remote sensing tasks:
- Easy Access to Spectral Indices: The package provides instant access to a comprehensive range of spectral indices from the ASI catalog, removing the need for manual searches or custom implementations. Users can effortlessly select and compute indices suitable for their specific research needs.
- High-Performance Computing: Built on Julia's strengths in numerical computation, SpectralIndices.jl provides rapid processing even for large datasets (Bouchet-Valat et al., 2023), making it a time-efficient tool for handling extensive remote sensing data.
- Versatile Data Compatibility: SpectralIndices.jl supports a growing list of input data types. Furthermore, thanks to Julia's built-in package extensions, which allow conditional compilation of dependencies, adding new data types to the library does not slow down compilation.
- User-Friendly Interface: Designed with simplicity in mind, the package enables users to compute spectral indices with just a few lines of code (a minimal usage sketch follows below). This ease of use lowers the barrier to entry for those new to programming or remote sensing.
- Customization and Community Contribution: Users can extend the package's capabilities by adding new indices or modifying existing ones. This openness aligns with the FAIR principles, ensuring that data is findable, accessible, interoperable, and reusable.

By providing a straightforward and efficient means to compute spectral indices, the package helps users streamline and accelerate software pipelines in Earth system research. Furthermore, it provides a consistent and unified interface to compute indices, improving the reliability and accuracy of research outcomes. Whether tracking deforestation, studying crop health, or assessing water quality, SpectralIndices.jl equips users with the tools needed for accurate, timely analysis. The introduction of SpectralIndices.jl reflects a broader trend in scientific computing towards adopting high-performance languages like Julia, highlighting the importance of efficient data analysis tools in addressing complex environmental challenges. This development contributes to the democratization of data analysis, making advanced tools more accessible to a diverse range of users.
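As an illustration of the "few lines of code" claim above, here is a minimal usage sketch in Julia; the exact call signature and keyword names (N for near-infrared, R for red, following the ASI band naming) are assumptions and may differ in detail from the released API:

```julia
using SpectralIndices

# Compute NDVI for a single pair of reflectance values
# (keyword names follow the ASI band convention: N = near-infrared, R = red).
ndvi = compute_index("NDVI"; N = 0.643, R = 0.175)

# The same call is assumed to work element-wise on whole bands of a scene.
N = rand(100, 100)   # placeholder near-infrared band
R = rand(100, 100)   # placeholder red band
ndvi_map = compute_index("NDVI"; N = N, R = R)
```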
Transcript: English (auto-generated)
Thank you very much for your introduction and also for coming to this presentation about SpectralIndices.jl, or how to access and compute spectral indices in Julia. Before starting and going into all the details about the package itself, I want to give a brief introduction to what spectral indices are. Broadly speaking, we can define spectral indices as mathematical combinations of
reflectance values from different wavelengths or bands obtained from satellite or aerial imagery. Throughout the presentation I will use wavelengths and bands more or less interchangeably; they mean the same thing within the scope of this presentation. The value of having these mathematical combinations of different bands lies in the
fact that they can highlight or even hide some specific effects of land surface properties. We'll go more into the specific definitions. When we talk about reflectance values or reflectance factors, we can see here that every surface material has some reflectance properties. This can be observed over different wavelengths of the color spectrum. On the left side you can see what the observable color spectrum is.
Generally we can see more than just the green, red, blue lines, but also near-infrared and shortwave infrared with certain satellites or other optical sensors. Through these reflectance values we can obtain some mathematical expressions that are as easy as the ones showcased on the slide now.
It's a very famous one, probably the most famous as far as vegetation properties go. This is the NDVI, or normalized difference vegetation index. As you can see, it's obtained from the near-infrared and the red band. It's a simple mathematical combination of the bands we explored earlier.
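As a small worked example, the index from the slide written out as plain Julia, using the usual NDVI formula over near-infrared and red reflectances:

```julia
# NDVI = (NIR - Red) / (NIR + Red); values typically fall in [-1, 1],
# with higher values indicating denser, healthier vegetation.
ndvi(nir, red) = (nir - red) / (nir + red)

ndvi(0.643, 0.175)   # ≈ 0.57 for a vegetated pixel (illustrative reflectances)
```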
There is a growing explosion in the number of spectral indices available because, as you can see, they are easily obtainable. There are different missions going up with different bands that are being recombined into new, novel indices. Also, old indices are being restructured into novel ones. For example, I just talked about the NDVI.
There has also been a modification of it called the kNDVI, which kernelizes the NDVI to better capture vegetation properties. Of course, since these are so easy to obtain through mathematical notation, people are creating new ones with already available bands.
So, with this explosion in the number of spectral indices, it's necessary, or at least useful, to have some collections of all of them, to easily access and investigate what you should use for your specific application. Some collections do exist in the wild, so to speak. These are just a few of them.
For example, the Index Database, the SRE indices gallery, and the alphabetical list of spectral indices. These are, of course, very good, but all of them also have some drawbacks, for example not being open source or not providing machine-readable packages to go with them.
A solution to this was presented in 2022. It is the Awesome Spectral Indices collection, a collection that was started by David Montero, who is actually here in the audience; he will give a presentation after me. This provides an open source and machine-readable collection of spectral indices.
It's also on GitHub, so anyone can collaborate on it. More importantly, we have a JSON file that we can actually iterate over and read through with any programming language of our choice. Just to go through it quickly, I'm using the NDVI as an example again.
You can see that there is the application domain and which bands you need to use for the computation. Then the formula is expressed as a string that can be parsed and evaluated with any programming language.
There is also some other useful information. I will not go through all of it, of course, but the part that was most useful to me is the reference. While writing the paper for this conference, I used the reference section multiple times. It was relatively easy to just open a Jupyter notebook, and having the reference there was really handy.
It saved quite some time, because googling the name of an index actually doesn't get you the paper most of the time. Especially for the NDVI: the paper is from 1974, and just googling NDVI or Normalized Difference Vegetation Index doesn't bring up the paper that you want to cite, so it's pretty handy to have everything at your fingertips.
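To illustrate the machine-readable side, here is a sketch of reading such a catalog entry in Julia with the JSON3 package; the entry below is trimmed and the exact field names of the ASI schema are assumptions for illustration:

```julia
using JSON3

# A trimmed, illustrative entry in the style of the ASI catalog (not verbatim).
entry_json = """
{
  "short_name": "NDVI",
  "application_domain": "vegetation",
  "bands": ["N", "R"],
  "formula": "(N - R)/(N + R)",
  "reference": "https://doi.org/..."
}
"""

entry = JSON3.read(entry_json)
entry.formula   # the formula as a string that can be parsed and evaluated
entry.bands     # which bands are needed for the computation
```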
As I said before, the whole suite of Awesome Spectral Indices encompasses a broad range of packages that are connected to the collection. In this case, there is a Python interface, a Google Earth Engine interface, and the one that I am going to talk about today is the Julia interface. All of them provide more or less the same functionality, but in different languages,
and all of them, of course, also have some subtle differences, which I will try to go into in more detail in this presentation. To go into the internals of SpectralIndices, here is a brief overview of what's going on behind the scenes, so to speak, or what's going on offline. When you, for example, want to add a new index, or when a new index gets added to Awesome Spectral Indices,
this is what I do as the maintainer of the package. I get the indices from the Awesome Spectral Indices repository on GitHub, so the original collection, as a JSON file. They are saved locally into the data folder,
and from there, I call a function that reads and parses all the formula strings that you saw before and creates native Julia functions. I went through the motivation for this yesterday during the workshop, but to make it short here: having native Julia functions gives zero overhead, instead of evaluating the formula strings at runtime.
Once all the functions are created, they are saved into the code base as part of it, and they can be used when actually calling the library. So, whenever you want to add an index in your local workflow,
you just have to add the index to the JSON file (there are functions to do this), then create the index function to add it to the functions of the library, and then you can use it and leverage all the infrastructure of the software itself.
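A minimal sketch of the idea described here, turning a formula string into a native Julia function ahead of time; this illustrates the approach and is not the package's actual code-generation routine:

```julia
# Build a callable Julia function from an ASI-style formula string.
# Doing this once, offline, means nothing has to be parsed or evaluated at runtime.
function formula_to_function(name::AbstractString, formula::AbstractString, bands::Vector{String})
    args = Symbol.(bands)                       # e.g. [:N, :R]
    body = Meta.parse(formula)                  # expression for "(N - R)/(N + R)"
    fdef = Expr(:function, Expr(:call, Symbol(name), args...), body)
    return eval(fdef)                           # define the function in the module
end

ndvi_fn = formula_to_function("ndvi_fn", "(N - R)/(N + R)", ["N", "R"])
ndvi_fn(0.643, 0.175)
```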
Now, at runtime, what happens when you load the package in your Julia REPL or in an interactive VS Code instance is a couple of things. The constants, the bands, and the indices are collections of the structures that you may want to investigate or use for your explorations.
They are loaded into the namespace immediately, so you can actually access them and explore them in an interactive way. In addition to this, something that's not shown here, the structures of the indices are themselves also loaded into the namespace, for easier and faster access to them.
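A short interactive sketch of that exploration; the exported collection names (`indices`, `bands`, `constants`) and the directly exported index objects are assumptions based on what the talk describes:

```julia
using SpectralIndices

keys(indices)     # all index short names in the catalog (assumed exported collection)
indices["NDVI"]   # the NDVI entry with its formula, required bands, reference, ...
bands["N"]        # metadata for the near-infrared band
NDVI              # per the talk, index structures are also placed in the namespace directly
```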
To go to the last slide of the internals of SpectralIndices, these are the core dependencies and the weak dependencies of the software. Julia allows you to have weak dependencies, which means that the only core dependencies are some light packages that are also present in the base Julia software.
So, they are well maintained, and when you load SpectralIndices.jl, you just load those by default. Now, of course, if you're using SpectralIndices, you're probably not just working with arrays or native Julia data structures. You want to use some other data structures to work with your data,
and these other data structures can be added through external packages that are also loaded into SpectralIndices.jl when you load them. So, these are heavy dependencies. For example, YAXArrays.jl; I know probably not all people here are familiar with Julia, but YAXArrays.jl is kind of the equivalent of xarray in Python.
So, as you can imagine, it's a heavy dependency, and it's not loaded automatically in SpectralIndices unless you want to use it yourself and load it as well. And this helps, because Julia is kind of different from Python in that it's just-in-time compiled. So, it has a really fast execution on one hand,
but on the other, you pay at the start of the runtime because there is some precompilation going on. So, when you load your software, there is the (famous, for bad reasons) precompilation time, in which the package precompiles some functions for you to then use and run in a faster way.
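For readers unfamiliar with Julia's package extensions, here is a sketch of the mechanism being described; the extension and dependency names are hypothetical placeholders, not the actual layout of SpectralIndices.jl:

```julia
# In the package's Project.toml, the heavy package is declared as a weak dependency:
#
#   [weakdeps]
#   YAXArrays = "..."   # UUID elided
#
#   [extensions]
#   SpectralIndicesYAXArraysExt = "YAXArrays"
#
# The extension module below is only compiled and loaded once the user also
# does `using YAXArrays`, so the base package stays light to precompile.
module SpectralIndicesYAXArraysExt

using SpectralIndices, YAXArrays

# Method specializations for YAXArrays data types would be defined here,
# e.g. a compute method that accepts a YAXArray cube.

end
```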
So, not having heavy dependencies in the code helps in that it doesn't precompile for a long time when you don't want it to. Now, this was just the behind-the-scenes of the software, but how can you actually use it, and how does it help in a general workflow? Here I have a workflow that I use for my own research,
for example, in machine learning. And what I was interested in, just to have a quick experiment here, is to predict the next step of a time series of 16 different vegetation indices. And to do this, I'm using some echo state network models (I'm going to explain what they are in a second) through the ReservoirComputing.jl package.
That's another package that I contributed to developing. And so, we took one time series, so we took just one pixel, and from this pixel we got the bands, and we decided to try and see if we could do the prediction of the next step for this time series, using three different approaches.
But before going to the approaches, I want to give a quick, a really quick, introduction to what echo state networks are, because even to people familiar with machine learning these are kind of an unknown, obscure model. They can be described in a simple way as an expansion of the input data
into a higher dimension, over which you can train against your desired output just by linear regression. So, to people familiar with kernel methods, this is like a kernel method, but with the expansion made explicit instead of relying on the implicit kernel trick. For people familiar with deep learning, this is like a recurrent neural network
but trained without backpropagation, so simply saving the hidden states and doing regression on them. And for people that are not familiar with machine learning in general, this is just kind of a black box in which magic happens, and it can predict the next step if you want it to.
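As a conceptual illustration of that description, here is a self-contained toy echo state network in plain Julia: a fixed random reservoir expands the input, and only the linear readout is trained by ridge regression. This is a sketch of the idea, not the ReservoirComputing.jl API, and all sizes and data are made up:

```julia
using LinearAlgebra, Random

# Expand the input into a high-dimensional reservoir state (fixed random weights).
function reservoir_states(u, Win, W)
    n_res, T = size(Win, 1), size(u, 2)
    X, x = zeros(n_res, T), zeros(n_res)
    for t in 1:T
        x = tanh.(Win * u[:, t] .+ W * x)
        X[:, t] = x
    end
    return X
end

Random.seed!(42)
n_in, n_res, T = 3, 100, 500                      # inputs, reservoir size, time steps
u = rand(n_in, T)                                 # toy input time series (e.g. band values)
y = rand(1, T)                                    # toy target (e.g. next-step index values)

Win = 0.1 .* randn(n_res, n_in)                   # fixed, untrained input weights
W   = 0.95 .* randn(n_res, n_res) ./ sqrt(n_res)  # fixed, untrained reservoir weights

X = reservoir_states(u, Win, W)

# Only the linear readout is trained, by ridge regression (no backpropagation).
lambda = 1e-6
Wout = (y * X') / (X * X' + lambda * I)
yhat = Wout * X                                   # fitted outputs
```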
So, the three approaches are as follows. The first approach actually makes use of SpectralIndices.jl: we give the necessary bands to the model, and in this case the necessary bands were six or seven. We train the model to output the bands, and once we have the bands given by the model, we feed them to SpectralIndices.jl to obtain all 16 indices.
The next approach is to give one vegetation index at a time, of which we want to predict the next step and obtain this vegetation index at the end, and we do this in series, not in parallel, for all the 16 indices. And we do this in series and not in parallel because it mimics a larger version of the workflow
in which you are doing this for every pixel. So you do this in parallel for every pixel, so you have to do it in series for the pixel itself. And the third approach is: well, we have the 16 indices, so we just feed the 16 indices to the model to obtain the next-step prediction.
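A sketch of the post-processing step of that first approach: the model predicts the handful of bands, and the indices are then derived from them. The index names are from the ASI catalog; the `compute_index` call signature is the same assumption as in the earlier sketch:

```julia
using SpectralIndices

# Predicted reflectances for one pixel at the next time step (illustrative values).
N, R, G = 0.61, 0.18, 0.11

# Recover vegetation indices from the predicted bands instead of
# predicting all 16 indices directly with the model.
ndvi  = compute_index("NDVI";  N = N, R = R)
gndvi = compute_index("GNDVI"; N = N, G = G)
```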
So, we have trained all these three different models, we've done some hyperparameter optimization to obtain results that were more or less close to one another, and these are the results that we obtained. The figure is kind of loaded, but I'll try to walk through it step by step. In quadrant A we can see the compute time for all three approaches. So, as you can see, approach one,
so the one that actually uses the bands to train the model and then from the bands gets the spectral indices out, is the fastest by far. And the second approach, which trains on only one spectral index, one vegetation index I should say, at a time, in series, is the second fastest,
and the third one, which uses all 16 vegetation indices to obtain the next step of all 16 vegetation indices, is actually the slowest. In column B we can see the results in terms of time series for all the 16 vegetation indices. They are kind of overlapping, since next-step prediction is a relatively easy task for these models,
so they're kind of overlapping and you don't see all three of the colors, but you can see that the predictions are kind of good for all three approaches. And in C you can actually see the prediction accuracy of all three approaches, and there are no discernible differences between them.
So, for each spectral index there actually are differences, but there doesn't seem to be a pattern by which one approach is worse or better than the other. So, for more or less comparable prediction accuracies, we get that the first approach, so the approach that leverages the SpectralIndices.jl package,
computes the fastest. And in these applications, in general machine learning applications, a time saving of almost 4 seconds, when done at large scale, can amount to hours of compute time and to money. So, this is just a quick example of how using SpectralIndices in a general workflow
can help make it faster or easier. So, to get to the conclusions: this package provides an easy and fast way to access and compute spectral indices, and using it not only simplifies existing workflows but also provides considerable speedups over them,
and as for future directions: I showed in the internals slide that I have support at the moment for DataFrames.jl, so the Julia equivalent of pandas, and YAXArrays.jl, so the Julia equivalent of xarray, so the future direction is to increase the data type support,
since in Julia the array ecosystem is kind of fragmented; we don't have just YAXArrays, we have a bunch of other packages that are just as good but with different focuses. So my goal, for the near future at least, is to improve on this and add all the type support
that I can add. And I believe that's all from me, thank you for your attention. Thank you very much, Francesco, that was a really, really interesting presentation;
you actually have lots more time. Is there some question, or is there something you would want to maybe highlight again, like with your computational approach? Something that I would highlight is simply
the easy approach it gives to simply computing the indices. So it's a relatively straightforward package, in the sense that it provides you with more or less two main functions: one computes the indices and the other one computes the kernels for some kernel-based indices. But what's behind the scenes is quite expressive,
and you can build on top of it quite easily. We had the workshop yesterday, I see some people that were there as well, and we showed how many ways you can actually compute the indices, and that it can be built on and actually expanded onto different packages.
So I would say that it's the flexibility that it provides to build on top of. Cool, thanks. Okay, audience, questions?
Thank you. Thank you, Francesco, for this presentation, very interesting. I didn't catch, if you want to expand a little bit on it, the advantages of Julia over Python in this particular case, and when would you recommend it,
the type of use cases for which you would recommend using Julia? Thank you for the question. It's always a topic of contention, right, Python versus Julia. So in this case, as I showed here, there is a package also in Python,
and the package that I built in Julia kind of takes a lot from it. The question about using Julia instead of Python depends a lot on your workflow. When I came to Julia, it was mostly because it provided an easy solution for mathematical problems, and that was what I was most looking for.
Julia and Python differ in terms of philosophy of implementation: whereas Python is object-oriented and provides all the object-oriented solutions, Julia has multiple dispatch, so it's a different, and for some things more flexible, approach to computation. And again, as I said before, Python is interpreted
and Julia is just-in-time compiled, so these are things that have pros and cons. The pro, I would say, of Julia being just-in-time compiled is that it is faster. I would have loved to showcase some speed comparisons; I chose not to do them
because at some point the speed comparison is just native Julia versus, for example, Numba, so it wasn't a fair comparison of the libraries SpectralIndices.jl and spyndex; it was more a clash of languages. I would say that if you care about speed and you are not afraid of actually doing some of the hard implementations yourself,
Julia is the place to go. If you want something that actually works immediately, and for which we have a lot more examples, then Python would be your choice. Thank you. More questions?
I would like to pick up on that a little bit. When you do really large workflows, especially if we think in... Are there also packages that help you to scale the computation up,
so that you use more computers, or more data than you have memory, for example? Yes, so one of the packages that I'm building on top of, that's YAXArrays.jl, for example, allows you to use data larger than memory, and there is also DimensionalData.jl,
which is the base of YAXArrays.jl, that also allows you to do that, and Rasters.jl, just to name a couple of them off the top of my head. When you want to actually scale computation, there is Distributed.jl, which is also part of the Julia standard library and allows you to do computations in parallel.
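A minimal sketch of that kind of built-in parallelism with the Distributed standard library; the per-pixel function and the `compute_index` call are illustrative assumptions:

```julia
using Distributed
addprocs(4)                      # start 4 local worker processes

@everywhere using SpectralIndices

# Hypothetical per-pixel computation, distributed across the workers.
@everywhere ndvi_for_pixel(px) = compute_index("NDVI"; N = px.N, R = px.R)

pixels  = [(N = rand(), R = rand()) for _ in 1:10_000]   # toy pixel reflectances
results = pmap(ndvi_for_pixel, pixels)
```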
Julia, I think in 2017 or 2018, was the second or third language to achieve petascale computation, with the Celeste project. It's actually one of the three languages that have run the most compute, so to speak, on supercomputers, together with Fortran and C++, if I remember correctly.
It's up there with the big guns that have been there for quite some time. Of course, it's a new language; it's been around for like 11 or 12 years now, something like that. As far as programming languages go, it's quite young, I would say, but the ecosystems are growing, and again, there are Python equivalents that allow you to do all the computations
you would do in Python; you can do them in Julia, so you can do out-of-memory data computation or parallel computation, both on CPUs and GPUs. Thank you. Thank you. More questions? So, you do program this yourself?
Yeah, so... Let's have a conversation. Do you do that for specific research topics that you are into, or do you feel you are more of a research software engineer who is more focused on making the software
and workflows more reproducible or more scalable, to support, let's say, soil scientists or agricultural scientists? I mean, those indices are, of course, widely usable. Where would you position yourself? That is a nice question, actually.
I wouldn't know. Of course, all the software I'm developing (I showed here the other package of which I'm the main developer), all the software I develop, I do mainly for research purposes, and then it gets kind of out of hand. My supervisors are not here, so I can say this freely: it gets out of hand in that I enjoy the programming,
at the beginning, a bit more than the research questions, so I get sidetracked a lot by the programming, and I enjoy actually having the results of my coding out there, being open source and usable by others. So I would say at the moment it's still half and half: I develop them to answer my research questions, but then I also develop them because I like doing it.
It's a tough question, because I really don't know which side I'm standing on. Thank you. Who of you in the audience considers themselves more of a scientist or more of a developer? Let's say, who is really doing science, may I ask?
Which is still good; I mean, after all, we're at a software conference. So, most of you. Now, there's probably another bunch of you that are really into developing, really into coding? Oh, it's this small number again.
Then we probably have, I don't know, high-level people or managers? Desktop software users? That's not a judgement. Arms up: who are the people who couldn't raise their arms yet? Why don't you feel included?
This is actually the academic track, so that really makes sense. There was just recently, in Finland, the research software engineering conference. This is a new thing; actually, it's not that new, but I think it's still rising: research software engineer. And obviously we have to acknowledge that for many scientific problems
we need more effective or more scalable computing. So I applaud you that you went into that and still do the science itself; that's pretty cool. So let's thank Francesco, we made good use of the time. Thank you very much.