Forecasting the future with EarthPT

Abstract
We introduce EarthPT -- an open source Earth Observation (EO) pretrained transformer written in Python and PyTorch. EarthPT is a 700 million parameter decoding transformer foundation model trained in an autoregressive self-supervised manner and developed specifically with EO use-cases in mind. EarthPT is trained on time series derived from satellite imagery, and can accurately predict future pixel-level surface reflectances across the 400-2300 nm range well into the future. For example, forecasts of the evolution of the Normalised Difference Vegetation Index (NDVI) have a typical error of approximately 0.05 (over a natural range of -1 to 1) at the pixel level over a five month test set horizon, outperforming simple phase-folded models based on historical averaging. We also demonstrate that embeddings learnt by EarthPT hold semantically meaningful information and could be exploited for downstream tasks such as highly granular, dynamic land use classification, crop yield, and drought prediction. Excitingly, we note that the abundance of EO data provides us with -- in theory -- quadrillions of training tokens. Therefore, if we assume that EarthPT follows neural scaling laws akin to those derived for Large Language Models (LLMs), there is currently no data-imposed limit to scaling EarthPT and other similar 'Large Observation Models.' EarthPT is released under the MIT licence on GitHub: aspiaspace/EarthPT.
Transcript (English, auto-generated)
Hi, everyone. So yeah, I'm Mike. I'm an astronomer by training, but I'm also doing a lot of remote sensing now in a startup company called Aspia Space. And I'm going to be talking about how we can use large observation models, not large language models, but large observation models to predict future satellite
observations. So just a meme to begin; I'm sure you all feel like this. There's a lot of talk about LLMs going around, but this isn't going to be an LLM talk. It's going to be a talk adjacent to LLMs, about large observation models and how we can use alternative data sources
to train these large models. So we trained a model, a GPT model, which is an autoregressive transformer model. And all this is is a big neural network, and it's trying to predict the next item in a sequence. So this is what a GPT-1, -2, -3, -4, or ChatGPT
is under the hood. It's a lot of decoding transformer layers, which is a neural network layer. It's a big matrix multiplier. And it takes in a sequence of tokens or words or time series or anything else, and it tries to predict the next item in the sequence.
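As an illustration of what "a lot of decoding transformer layers" means in practice, here is a minimal PyTorch sketch of a decoder-only next-token predictor. It is a toy illustration of the idea, not EarthPT's actual architecture; all sizes are arbitrary.

```python
import torch
import torch.nn as nn

# A toy decoder-only (GPT-style) model: embed tokens, apply causally-masked
# transformer layers, and output logits for the next token at each position.
class TinyDecoder(nn.Module):
    def __init__(self, vocab=256, d_model=64, heads=4, layers=2, max_len=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        block = nn.TransformerEncoderLayer(d_model, heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens):  # tokens: (batch, seq)
        seq = tokens.shape[1]
        x = self.embed(tokens) + self.pos(torch.arange(seq, device=tokens.device))
        # Causal mask: each position can only attend to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(seq).to(tokens.device)
        return self.head(self.blocks(x, mask=mask))  # (batch, seq, vocab)

model = TinyDecoder()
tokens = torch.randint(0, 256, (1, 16))
logits = model(tokens)
# Training objective: predict token t+1 from tokens up to t.
loss = nn.functional.cross_entropy(logits[:, :-1].reshape(-1, 256),
                                   tokens[:, 1:].reshape(-1))
```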
So you can see here we have a sentence about the galaxy with five spiral arms. And the model is just trying to predict the next word in a sentence by training on a huge amount of data, loads and loads of text. And if you do this, it turns out you get some very cool emergent properties in the model as well.
So if you train on enough text and enough imagery, the model learns, just by learning the next word in a sentence, stuff about zoology. So you can see at the top, this is from the Flamingo model, which is a model that came out of Google DeepMind a couple of years ago. And it's just trained autoregressively to predict the next word in a sentence
on loads of textual data. But you can see that it actually learned what a flamingo is, and where they're found. They're found in the Caribbean and South America. It learns stuff about art. So I don't know where that painting was painted, but the model knows, just from learning the next word in a sentence on a large corpus of data. It can read text.
So it can take that image, read that, yeah, this word says saloons. It can do maths as well, because it needs to learn all of these abilities to be able to predict the next word in a sentence better and better, which is what its objective is. So you can get some very, very cool emergent properties
from these models just by training on lots of data. That's the take-home message here. So I'm going to be talking about two models in particular here. And I'm going somewhere with this, so bear with me for a second. So on the left-hand side, there's
a model called Chinchilla, which is a smaller model with 70 billion parameters in a neural network. And on the right-hand side, there's a model called Gopher. And this is a much larger model than Chinchilla, hundreds of billions of parameters. And there will be a little thing in a second saying
how big they are. So Gopher has 280 billion parameters. This is around the size of a GPT-3 model. And it's been trained on 300 billion words or tokens of data. And Chinchilla has been trained on 1,400 billion tokens
of data, or words of data. And it's only 70 billion parameters big. So it's much smaller than Gopher. And before Chinchilla came out, the thinking was the bigger the model, the better it is. It doesn't really matter about the data set size.
And when Chinchilla came out, they proved this wrong. So they actually proved that you need around 20 tokens per parameter in your neural network to be able to train it optimally. So Gopher's bigger than Chinchilla. But Chinchilla's better than Gopher
because it's trained on more data, more diverse data. So it works better. And here's just a plot from the paper. You don't need to read all that. But blue is good. Blue means Chinchilla beat Gopher. And on the x-axis, you have diverse topics
like astronomy, or conceptual physics right at the end there, or high school mathematics. It beats Gopher on questions from all of these topics just because it's been trained on more data, and more of it is compressed into the model. So this is great, right? Oh, hang on. Not yet. It's not yet great. I'm going to explain what a neural scaling law is first.
So a neural scaling law is something that relates how big the model is (the number of parameters), the number of data tokens it's been trained on, and the model performance, which is the loss L there. So the parameter term is the size of the model. The data term is the amount of data it's been trained on. And the data entropy is some constant that can't be reduced.
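In symbols, the law being described is L(N, D) = E + A/N^alpha + B/D^beta, where N is the number of parameters and D is the number of training tokens. A minimal sketch, using the approximate fitted constants reported by Hoffmann et al. (treat the exact values as indicative):

```python
# The scaling law L(N, D) = E + A / N**alpha + B / D**beta.
# Constants are the approximate Chinchilla fits (Hoffmann et al. 2022).
def scaling_loss(n_params: float, n_tokens: float) -> float:
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Gopher-like budget: 280B parameters, 300B tokens.
print(scaling_loss(280e9, 300e9))    # ~1.99
# Chinchilla-like budget: 70B parameters, 1.4T tokens -> lower loss.
print(scaling_loss(70e9, 1400e9))    # ~1.94
```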
And Hoffmann et al., the Chinchilla paper, took this equation and fitted it. And they got those parameters in there. And this doesn't mean much in itself. I'm going to show you a graph in the next slide that shows you something I found quite cool. And if you look at this and you step back for a second: on the x-axis, that's the model size. On the y-axis, you have the data set size. And you can see here, if you add more data along the data axis, you will move towards the bottom left corner, which is where you want to be. This is the minimum loss, or the maximum performance, of the model. And on the x-axis, you're not going to really get much improvement by making these neural network models bigger at this point. So you can see here are plotted Chinchilla, PaLM, GPT-3, and other large language models.
But if you really want to improve these models at this point, you want to add more data. So you want to go and find more data on the internet or other corpuses and put them into these models and reduce the y-axis of this graph so that they perform better. So why don't we just go do this?
Why don't we just go and get loads more data and train these models so they get better? It turns out there's not enough data on the internet to do this. So yeah, so where else can we go? We can use synthetic data, or we can keep scraping the surface web, which
tends to be of low quality: conversations between people, not like arXiv papers or GitHub code. So Villalobos et al. predicted in 2022 that we're going to run out of high quality internet data by 2028, so in a couple of years' time. So there's got to be other places
to get this data from, right? And it turns out, if you look at the observational sciences, there's a lot of data there that isn't being used to train these large models. So ClearSky is an algorithm that removes cloud cover from satellite observation imagery in the visible to near infrared bands, and that's something we've been working on at Aspia Space.
And if you take that Copernicus Sentinel data, you have 140 trillion high quality Earth observation tokens you can then use to train these models. And in astronomy, too. You've got the LSST, which is a large telescope that's going to be observing the night sky very soon. In astronomy, it's always "very soon" when a telescope is going to see first light, but very soon we're going to have the LSST running. And it's going to generate 12 billion ViT tokens, where each token is a 16 by 16 pixel patch of the night sky, per night. That's a lot of data, around 4.4 trillion tokens per year, which, when I wrote this a year or so ago, was rivalling the largest textual data sets (some quick arithmetic on these figures is sketched below). If you add it all together, it will dwarf them. So this is a lot of data that's not being used. So why don't we just use it? Why don't we put it in an autoregressive model and see what happens? And this is what we did.
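A quick back-of-the-envelope check of the figures just quoted (a sketch; the quoted numbers are themselves approximate):

```python
# Back-of-the-envelope arithmetic for the quoted data volumes.
lsst_tokens_per_night = 12e9                  # ~12 billion ViT tokens/night
lsst_tokens_per_year = lsst_tokens_per_night * 365
print(f"{lsst_tokens_per_year:.2e}")          # ~4.4e12 tokens/year, as quoted
sentinel_tokens = 140e12                      # ~140 trillion EO tokens
print(sentinel_tokens / lsst_tokens_per_year) # Sentinel archive ~ 30 LSST-years
```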
So you can actually use any kind of data in a large autoregressive model. So this is the Gato paper. And what they showed here is that you can take Atari games, you can take tokens of text, of chats,
of a robotic arm operating and picking up cubes and placing them down in different places, time series. You can just put it all into a single model. And the model doesn't care where the tokens came from. It just works. You can just put in a load of data. It's got to be structured, but you put the data in, and it
will predict the next token in a sequence, and it does it well. And at some point, the models trained on the multimodal data sets outperformed those trained on the unimodal data sets, which I found very, very cool. So it's not just for funsies. It's actually improving the models if you take all of this data and plug it in.
So I hope I convinced you that we can take Earth observation and astronomy data and put it in and see what happens. So I'm going to talk about two models: EarthPT, which is why you're all here, and also AstroPT in a second, which is a similar technique we applied to astronomy data.
EarthPT is a little bit easier to visualize, so I'll start with this. So we use the ClearSky algorithm to remove cloud cover from Sentinel-2 data. Sentinel-2 is just a visible to near-infrared (RGB plus near-infrared) observation of the Earth from a satellite.
And so as the satellite passes over, you get an observation of the same patch of ground every few days or so. You can then stack these up into a time series of the same patch of ground and pass this into the model. So we have an uninterrupted, cloud-free time series of data we can just put into these models.
So, a little diagram to get everyone's brain visualizing this: you can have an observation on November 2, 3, 4, 5, and try and predict the observation on November 6. So that's what this model is. It's the same as a GPT, but instead of text going in, it's time series going in.
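A minimal sketch of how one might frame this as a next-step prediction problem: slide a context window over a cloud-free pixel time series and use the following observation as the target. The array shapes and names here are hypothetical, not EarthPT's actual data pipeline.

```python
import numpy as np

def make_windows(series: np.ndarray, context: int):
    """series: (T, bands) reflectances for one pixel; returns inputs/targets."""
    inputs, targets = [], []
    for t in range(len(series) - context):
        inputs.append(series[t:t + context])  # observations t .. t+context-1
        targets.append(series[t + context])   # the next observation
    return np.stack(inputs), np.stack(targets)

series = np.random.rand(100, 10)        # 100 cloud-free dates, 10 bands (toy)
X, y = make_windows(series, context=32) # X: (68, 32, 10), y: (68, 10)
```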
So we found that it could forecast the future, which I found very, very cool. So these are four different plots of different indices that farmers are interested in. NDVI has to do with vegetation, how verdant a patch of ground is. WI is the amount of water on a patch of ground. BSI is the bare soil index, so how unverdant a patch of ground is. And GCVI is also another vegetation index. And you can see across all of these, it predicts. We tested it up to five months into the future, but we don't know exactly how far it can go yet. This is for further testing, but I was very happy with this. And just to reiterate, it doesn't explicitly learn any of this stuff. We're just trying to predict the next item in a sequence, the next item in a time series, just like a GPT model is trying to predict the next word in a sentence.
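For reference, the standard definitions of these indices, computed from surface reflectance bands (a sketch; "WI" is taken here to be the normalised difference water index, NDWI, which is an assumption):

```python
# Standard spectral index definitions from surface reflectance bands.
def ndvi(nir, red):                     # vegetation, range -1 to 1
    return (nir - red) / (nir + red)

def ndwi(green, nir):                   # surface water ("WI" here, assumed)
    return (green - nir) / (green + nir)

def bsi(swir, red, nir, blue):          # bare soil index
    return ((swir + red) - (nir + blue)) / ((swir + red) + (nir + blue))

def gcvi(nir, green):                   # green chlorophyll vegetation index
    return nir / green - 1

print(ndvi(0.45, 0.10))                 # healthy vegetation -> ~0.64
```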
The embeddings are meaningful. So if you take the outputs of the penultimate layer of the neural network and project it onto a 2D space and color them by some emergent properties of the time series, like how much vegetation is there,
how much bare soil is there, or the RGB color at the height of summer, you can see there's some structure that the neural network's learned just from predicting the next item in a sequence. And if you average these over 2023,
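A minimal sketch of that visualisation: take penultimate-layer activations and project them to 2D. PCA is used here for simplicity; the projection method actually used may differ, and all array names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

embeddings = np.random.rand(5000, 512)  # hypothetical (pixels, hidden_dim)
mean_ndvi = np.random.rand(5000)        # hypothetical per-pixel property

xy = PCA(n_components=2).fit_transform(embeddings)
plt.scatter(xy[:, 0], xy[:, 1], c=mean_ndvi, s=2, cmap="viridis")
plt.colorbar(label="mean NDVI")         # clusters appear if embeddings are meaningful
plt.show()
```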
And if you average these over 2023, this shows that it's not just memorizing stuff. It's actually learning something important and relevant about the data. So we trained several EarthPT models, from 10 million parameters to 700 million parameters, just to see if it gets better as the model gets bigger.
It turns out it does, just like a natural language neural network. And the very interesting thing for me here is that the models still look like they're improving at the end of the training run. So there's still much, much more to squeeze from these models. We're actually scaling it up right now with loads more data covering the whole of the UK,
just to see how far we can push this, how much we can throw at these models, and how good they can get. So this is the money plot for EarthPT. So we trained a model up until July of 2022. And we then took the trained model and said: predict the next six months in advance (a sketch of this kind of autoregressive rollout follows below). And the black line is the ground truth. And the green dashed line is the prediction. And it turns out it predicted the 2022 UK drought, just from seeing the historic time series and learning the patterns in it, without us telling it what a drought is, of course. It's just learning, again, a very simple metric: the next item in a sequence. So if it can do this, what else can it do? Can it predict flooding events and other things?
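A sketch of the autoregressive rollout implied here: predict one step, append the prediction to the context, and repeat. This is the generic recipe, not EarthPT's exact inference code; `model` is a stand-in for any next-step predictor.

```python
import numpy as np

def rollout(model, context: np.ndarray, n_steps: int) -> np.ndarray:
    """context: (T, bands) observed history; returns (n_steps, bands)."""
    history = context.copy()
    forecasts = []
    for _ in range(n_steps):
        next_obs = model(history)                  # predict one step ahead
        forecasts.append(next_obs)
        history = np.vstack([history, next_obs])   # feed the prediction back in
    return np.stack(forecasts)

persistence = lambda h: h[-1]                      # dummy stand-in predictor
print(rollout(persistence, np.random.rand(30, 10), n_steps=5).shape)  # (5, 10)
```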
This is something we're working on now. We know it can predict drought. But what else is hidden beneath the surface? It's all open source, so you can check it out. There's a GitHub repository. There's an arXiv paper as well describing the model, if you guys are interested. It's all written in Python and PyTorch.
Contributions are welcome if anyone fancies it. I'll share the slides as well, just in case someone doesn't manage to get a picture of this. I'll put it on the Discord. And finally, I'm going to be talking a little bit about AstroPT, which is a passion project of mine. So EarthPT is my day job, and AstroPT is just something I find very cool, because I used to be an astronomer.
So yeah, LSST, lots of data in astronomy. AstroPT is a very similar concept. But instead of predicting the next item in a time series, we're predicting the next patch of a galaxy image. We took 9 million galaxy images from DESI DR8,
which is just an astronomy survey. And we wanted to see if the model gets better as it gets bigger and sees more data. And again, this is just predicting: from patches 0 and 1, you predict patch 2, in this spiral sequence of patches. It's not doing anything fancy. It's trying to predict the next patch in a sequence here.
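A sketch of one way to turn a galaxy image into a spiral-ordered sequence of 16 by 16 patches, starting from the centre and working outwards. The exact ordering AstroPT uses may differ in detail.

```python
import numpy as np

def spiral_patches(img: np.ndarray, p: int = 16):
    """img: (H, W); returns 16x16 patches ordered outwards from the centre."""
    h, w = img.shape[0] // p, img.shape[1] // p
    cy, cx = (h - 1) / 2, (w - 1) / 2
    coords = [(i, j) for i in range(h) for j in range(w)]
    # Sort grid cells by radius then angle: an outward, spiral-like ordering.
    coords.sort(key=lambda ij: (np.hypot(ij[0] - cy, ij[1] - cx),
                                np.arctan2(ij[0] - cy, ij[1] - cx)))
    return [img[i * p:(i + 1) * p, j * p:(j + 1) * p] for i, j in coords]

patches = spiral_patches(np.random.rand(64, 64))  # 16 patches, centre first
```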
And I'm going to throw some graphs at you right now, just to show that it works. Turns out it does. So the x-axis is the model size. The y-axis is how good it does at predicting the next patch.
Yep. So it sees around 10 billion tokens. It improves up to around 90 million parameters, which I found interesting. I wonder why it stops there. I think it's something to do with the information density of the data. This is something I'm testing now: seeing if you take some other observation of the galaxy, say spectral lines or something like this,
and see if the loss continues to go down if that's a more information-dense data set. Again, this is the emergent property thing we saw for EarthPT, but this time it's for galaxy astronomy properties. So mag z is the brightness of the galaxy. g minus r and r minus z are the colors in different bands.
Redshift is the distance to the galaxy. sSFR is how many stars are born per year in this galaxy. M star is the mass of the galaxy. And then the rest of them are just parameters of how the galaxy looks: whether it's smooth, whether it's discy, whether there are some weird artifacts going on there.
And it turns out the model learns this just by predicting the next item in the spiral observations of the galaxy. It gets better as it gets bigger, too, in the emergent properties. So this is very neat.
So yeah, it's not just an academic exercise. You can actually take this and use it for real astro things: to predict the brightness or the color of the galaxy, or even the amount of stars born per year in this galaxy. And it learns all this just from predicting the next item in the sequence. We don't have to do anything fancy. We just throw the data at it, and it learns these things as emergent properties, just like the language models do. This is also all open source. And this time there's a Discord where people can get involved, if you guys are interested. We're trying to build it out to be super multimodal now. So this is something that's in the works.
So we're going to not just have galaxies. We're going to have stars, time series, and all of this stuff going into these models. So this is my final slide. I think I've left enough time for some questions, so if anyone has any questions, please let me know. Just to summarize: the current foundation models are limited
by data set size and not by model size. So we need more data. There's not enough textual data. So where do we get it from? We can get it from observational modalities, like astronomy, like Earth observation, anything like this. So thanks, everyone.
Perfect. So you know the drill. There's a microphone. Just step up and ask questions. In the meantime, let me ask a naive, maybe dumb, question. I'm not from the field, so you said it gets better with more data.
Is there some correlation? I don't know, does more data just mean uniformly better? So there was a paper that came out just last month, I think, and it used GZIP as a compression metric to see how compressible the data is. And it turns out if it's less compressible, so more
information dense, you need less data to get the same performance, which makes sense, right? So it depends on the data set, which is why I don't think the galaxy model scaled as far as we thought it would when we started the experiments. Nice. Thank you. Then please go ahead.
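The GZIP idea is simple to sketch (the specific paper isn't named in the talk, so treat this as the general recipe rather than its exact method):

```python
import gzip
import os

def compress_ratio(data: bytes) -> float:
    return len(gzip.compress(data)) / len(data)  # lower = more compressible

print(compress_ratio(b"abc" * 10_000))     # repetitive data -> tiny ratio
print(compress_ratio(os.urandom(30_000)))  # random noise -> ratio near 1.0
```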
I have two small questions. The first one is: you mentioned it was a hobby project. Is it then not expensive to run this? I can imagine, with that much data... So we needed four big GPUs to run this.
But you can take the model and fine-tune it on a local computer. We knew some people at universities that would help us out and give us some GPUs. But yeah, once it's trained, it's very cheap to deploy. So you can use all of the standard LLaMA-style libraries for this, because it's just a GPT model under the hood.
And the second one: if you go back to the slide of the AstroPT, yeah, the things it had learned, like the patches. Yeah, yeah. No, one back. OK. Just like two classes here. Ah, yes. I was wondering, are there like two types of galaxies then?
So this was a mystery when we were training this. And we were pulling our hair out, like, what's going on here? Why are there two islands? And I should have mentioned this, I forgot: it turns out one of them is the Northern Hemisphere and one of them is the Southern Hemisphere. And the telescopes in each hemisphere have a slightly different noise profile.
So it separated them out in the latent space, which is what you're seeing here. So yeah, I was like, why is this? And it turns out it's just that. OK, thanks. Hi, first of all, thank you for your talk. That was super interesting. For the Earth observation ones, one of the trends that at least
I think we've been seeing over the past few years is that people's models of how Earth develops with climate change are generally like under-representing the changes. Do you think that this is something that your model is likely to also suffer?
Or is there some reason that it is likely to be able to deal with that better? It depends how prominent these changes are in the data set. Because at this point, we're not pushing any human assumptions into the model. So if it's in the data set, it should pick up on it.
I'm not sure without doing some testing exactly how it would affect it. So yeah, this is something to keep in mind. Fair. And so I guess kind of a follow-on question after that, then, is: one of the things that we're kind of worried about is this domino effect, where one big thing happens that then knocks on
a bunch of other big things. Presumably at that point, we get to the edges of this model, if we're reaching those kinds of unforeseen knock-ons. Cool. Thank you very much. Thanks. Hey, nice talk. I thought the use of GPTs for time-series forecasting was really interesting.
I presume you compared it against more traditional time-series forecasting models and found that to be best. Is that right? We compared it against an LSTM and also a simple phase-folded model per year (essentially a per-date historical average; a sketch follows below), and it outperformed them. I'll need to dig in. It's all in the paper, but I'll need to dig in to find the actual numbers for you. But it did do better, yeah.
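A sketch of such a phase-folded baseline: fold the history on a one-year period and forecast each future date as the mean of the same date in previous years. This assumes regular sampling; names and cadence are illustrative.

```python
import numpy as np

def phase_folded_forecast(history: np.ndarray, period: int, n_steps: int):
    """history: (T,) values; period: samples per year; returns (n_steps,)."""
    # Per-phase climatology: mean of every sample at the same point in the year.
    climatology = np.array([history[phase::period].mean()
                            for phase in range(period)])
    start = len(history) % period          # phase of the first forecast step
    return climatology[(start + np.arange(n_steps)) % period]

ndvi_history = np.sin(np.linspace(0, 8 * np.pi, 292))  # 4 toy "years" of data
print(phase_folded_forecast(ndvi_history, period=73, n_steps=20).shape)  # (20,)
```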
Cool, yeah. I'll check out the paper. Thanks. Second question: some of the charts had a really cool style, like a hand-drawn kind of thing. I was wondering how you did that. Yeah, I think that's pretty nifty. You just call plt.xkcd() in matplotlib, and it does it all, and it looks like an xkcd graph.
Oh, yeah, I'll look into that. Thanks so much. Yeah. I have a question on the weather data when you showed the drought. The weather data? Weather data, yeah. This one, yeah. This one? The one with the drought, when you do the predictions six months afterwards.
This one, yeah. Yeah. So I'm trying to understand this. So essentially, you look at the land, at the earth, and you predict what happens in the sky, because the drought is, in a sense, the result of the rain. Yes. I wanted to ask if you have any kind of KPIs,
if you've tried this on predicting the past, and established, let's say, more solid evidence than one case. Because I don't really understand how one can connect the two things, let's say.
So I think, well, this is my headcanon. So I think the model, in viewing the ground, has to learn something about the climate surrounding it, just to be able to know what the ground's going to look like at a certain point in time. So it must learn something about the historic weather
patterns of this area. And this is where I think this comes from, so it can recognize, oh, this has happened earlier in the year, this normally happens in the drought year, all of this stuff. But yeah, you're right, it does need more evidence than a single case study. This is something that we're working on now at Aspia Space. So hopefully, very soon, we'll have more data
to throw at this. Hi. Thanks for the very interesting talk. I'm just interested in how well the models work in some extreme circumstances. As I would guess, if we put in enough data, it would learn from year to year how things look in the summer
and how they look in winter. But I don't know, in some extreme situations, like extreme rain storms in the summer, does it work? Does it not? Intuitively, if it's not in the data set, I don't think it would work. But if these extreme events are historical,
I imagine it would work. But I haven't tested it properly myself. It has the same drawbacks that GPT has with text. So in text, if you ask it something that's completely out of distribution, it won't know what's going on. Similarly here, but if it's seen it before, I would imagine it would be able to predict what's going on.
And does the predictive power... I don't know, if we would just predict for next week, maybe it would work better. Did you maybe test that? Yeah. So, well, I guess you can see on this plot, it does much better closer to the dashed line, which is
where the divergence date was. So we trained it up until this dashed line and we said: predict what's next. And then it starts to diverge later on. It starts to do the average per year. So yeah, closer to the divergence date, it does better. And then further away, it starts to wobble around. Thanks. Yeah.
Following on from those questions: how useful do you think it's going to be with climate change coming into effect? Do you think it'd be more or less useful, because I think it's a bit more unpredictable? I'm hoping it will be more useful. This is part of the reason we're investigating this: to create something that can predict
events before they happen, for farmers to be able to, I don't know, put more nitrogen fertilizer on their soil, or to anticipate a drought or something like this, and then do something to prevent it. Again, if it's some black swan climate event,
it's probably not going to be able to predict it, but if it's historical and it's shown in the data, for example, if it's getting slightly more wild each year, it should be able to predict it. I guess the proof is in the pudding. So if it's deployed wide scale, we'll see exactly how well it does and where it falls down.
So I think it needs to be deployed to be able to find out exactly where the drawbacks are and where the positives are. Thank you. Hey, me again. Different question. Is the data that you get from this indicative of the weather? Could you basically use this
as a type of weather prediction, as in, it's going to rain on Tuesday? Maybe not. But the model is super flexible. So if you have a bunch of weather data from some, I don't know, IoT sensors on the ground measuring precipitation, it should be able to do this. And we actually tried a moisture predictor
with a GPT model, and it works fine. And I think the more modern weather prediction algorithms are moving in this direction now. So they're moving into GPTs to predict what's going on a little bit down the line. I think they're mostly using masked autoencoders, but yeah, it should work. You'd need the right data to throw at it.
What we're doing here is secondary weather prediction, because it's looking at the ground, and the weather predicts the ground, so there are several layers in between. If you just predict the weather directly, if that's what you're interested in, it'll be much easier for the model to learn. Cool, that's fair. Cheers.
Hello, I also have a question. Do you also add other data, like topographic maps of the Earth, or is it only time series? At this point, it's any data we can throw at it. So we're working on a new model that can take in data from different satellites. It takes in elevation as well,
IoT data, and all of this stuff. So like I was saying at the start, more data is good data. The model doesn't care where it comes from. It just does its thing, it chugs along, and you can put it all into the same model, like they did with Gato. And it just seems to work. Thanks.
Yeah, thank you for all the great questions. I think you'll be walking around here, so if you have any more, just grab him. Then please give it up again for Mike.