Forecasting the future with EarthPT

Abstract
We introduce EarthPT -- an open source Earth Observation (EO) pretrained transformer written in Python and PyTorch. EarthPT is a 700 million parameter decoding transformer foundation model trained in an autoregressive self-supervised manner and developed specifically with EO use-cases in mind. EarthPT is trained on time series derived from satellite imagery, and can accurately predict future pixel-level surface reflectances across the 400-2300 nm range well into the future. For example, forecasts of the evolution of the Normalised Difference Vegetation Index (NDVI) have a typical error of approximately 0.05 (over a natural range of -1 to 1) at the pixel level over a five month test set horizon, outperforming simple phase-folded models based on historical averaging. We also demonstrate that embeddings learnt by EarthPT hold semantically meaningful information and could be exploited for downstream tasks such as highly granular, dynamic land use classification, crop yield, and drought prediction. Excitingly, we note that the abundance of EO data provides us with -- in theory -- quadrillions of training tokens. Therefore, if we assume that EarthPT follows neural scaling laws akin to those derived for Large Language Models (LLMs), there is currently no data-imposed limit to scaling EarthPT and other similar 'Large Observation Models.' EarthPT is released under the MIT licence on GitHub: aspiaspace/EarthPT.
Transcript (English, auto-generated)
Hi, everyone. So yeah, I'm Mike. I'm an astronomer by training, but I'm also doing a lot of remote sensing now in a startup company called Aspia Space. And I'm going to be talking about how we can use large observation models, not large language models, but large observation models to predict future satellite
observations. So just a meme to begin; I'm sure you all feel like this. There's a lot of talk about LLMs going around, but this isn't going to be an LLM talk. It's going to be a talk adjacent to LLMs, about large observation models and how we can use alternative data sources
to train these large models. So we trained a model, a GPT model, which is an autoregressive transformer model. And all this is is a big neural network, and it's trying to predict the next item in a sequence. So this is what a GPT-1, -2, -3, -4, or ChatGPT
is under the hood. It's a lot of decoding transformer layers, which is a neural network layer. It's a big matrix multiplier. And it takes in a sequence of tokens or words or time series or anything else, and it tries to predict the next item in the sequence.
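As an illustration of what "a lot of decoding transformer layers" means in practice, here is a minimal PyTorch sketch of a decoder-only next-token predictor. It is a toy illustration of the idea, not EarthPT's actual architecture; all sizes are arbitrary.

```python
import torch
import torch.nn as nn

# A toy decoder-only (GPT-style) model: embed tokens, apply causally-masked
# transformer layers, and output logits for the next token at each position.
class TinyDecoder(nn.Module):
    def __init__(self, vocab=256, d_model=64, heads=4, layers=2, max_len=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        block = nn.TransformerEncoderLayer(d_model, heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens):  # tokens: (batch, seq)
        seq = tokens.shape[1]
        x = self.embed(tokens) + self.pos(torch.arange(seq, device=tokens.device))
        # Causal mask: each position can only attend to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(seq).to(tokens.device)
        return self.head(self.blocks(x, mask=mask))  # (batch, seq, vocab)

model = TinyDecoder()
tokens = torch.randint(0, 256, (1, 16))
logits = model(tokens)
# Training objective: predict token t+1 from tokens up to t.
loss = nn.functional.cross_entropy(logits[:, :-1].reshape(-1, 256),
                                   tokens[:, 1:].reshape(-1))
```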
So you can see here we have a sentence about the galaxy with five spiral arms. And the model is just trying to predict the next word in a sentence by training on a huge amount of data, loads and loads of text. And if you do this, it turns out you get some very cool emergent properties in the model as well.
So if you train on enough text and enough imagery, the model learns, just by learning the next word in a sentence, stuff about zoology. So you can see at the top, this is from the Flamingo model, which is a model that came out of Google DeepMind a couple of years ago. And it's just trained autoregressively to predict the next word in a sentence
on loads of textual data. But you can see that it actually learned what a flamingo is, and where they're found. They're found in the Caribbean and South America. It learns stuff about art. So I don't know where that painting was painted, but the model knows, just from learning the next word in a sentence on a large corpus of data. It can read text.
So it can take that image, read that, yeah, this word says saloons. It can do maths as well, because it needs to learn all of these abilities to be able to predict the next word in a sentence better and better, which is what its objective is. So you can get some very, very cool emergent properties
from these models just by training on lots of data. That's the take-home message here. So I'm going to be talking about two models in particular here. And I'm going somewhere with this, so bear with me for a second. So on the left-hand side, there's
a model called Chinchilla, which is a smaller model with 70 billion parameters in a neural network. And on the right-hand side, there's a model called Gopher. And this is a much larger model than Chinchilla, hundreds of billions of parameters. And there will be a little thing in a second saying
how big they are. So Gopher has 280 billion parameters. This is around the size of a GPT-3 model. And it's been trained on 300 billion words or tokens of data. And Chinchilla has been trained on 1,400 billion tokens
of data, or words of data. And it's only 70 billion parameters big. So it's much smaller than Gopher. And before Chinchilla came out, the thinking was the bigger the model, the better it is. It doesn't really matter about the data set size.
And when Chinchilla came out, they proved this wrong. So they actually proved that you need around 20 tokens per parameter in your neural network to be able to train it optimally. So Gopher's bigger than Chinchilla. But Chinchilla's better than Gopher
because it's trained on more data, more diverse data. So it works better. And here's just a plot from the paper. You don't need to read all that. But blue is good. Blue means Chinchilla beat Gopher. And on the x-axis, you have diverse topics
like astronomy, or conceptual physics right at the end there, or high school mathematics. It beats Gopher on questions from all of these topics just because it's been trained on more data, and more of it is compressed into the model. So this is great, right? Oh, hang on. Not yet. It's not yet great. I'm going to explain what a neural scaling law is first.
So a neural scaling law is something that relates how big the model is (the number of parameters), the number of data tokens it's been trained on, and the model performance, which is the loss L there. So the parameter term is the size of the model. The data term is the amount of data it's been trained on. And the data entropy is some constant that can't be reduced.
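In symbols, the law being described is L(N, D) = E + A/N^alpha + B/D^beta, where N is the number of parameters and D is the number of training tokens. A minimal sketch, using the approximate fitted constants reported by Hoffmann et al. (treat the exact values as indicative):

```python
# The scaling law L(N, D) = E + A / N**alpha + B / D**beta.
# Constants are the approximate Chinchilla fits (Hoffmann et al. 2022).
def scaling_loss(n_params: float, n_tokens: float) -> float:
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Gopher-like budget: 280B parameters, 300B tokens.
print(scaling_loss(280e9, 300e9))    # ~1.99
# Chinchilla-like budget: 70B parameters, 1.4T tokens -> lower loss.
print(scaling_loss(70e9, 1400e9))    # ~1.94
```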
And Hoffmann et al., the Chinchilla paper, took this equation and fitted it. And they got those parameters in there. And this doesn't mean much in itself. I'm going to show you a graph in the next slide that shows you something I found quite cool. And if you look at this and you step back for a second: on the x-axis, that's the model size. On the y-axis, you have the data set size. And you can see here, if you add more data along the data axis, you will move towards the bottom left corner, which is where you want to be. This is the minimum loss, or the maximum performance, of the model. And on the x-axis, you're not going to really get much improvement by making these neural network models bigger at this point. So you can see here are plotted Chinchilla, PaLM, GPT-3, and other large language models.
But if you really want to improve these models at this point, you want to add more data. So you want to go and find more data on the internet or other corpuses and put them into these models and reduce the y-axis of this graph so that they perform better. So why don't we just go do this?
Why don't we just go and get loads more data and train these models so they get better? It turns out there's not enough data on the internet to do this. So yeah, so where else can we go? We can use synthetic data, or we can keep scraping the surface web, which
tends to be of low quality: conversations between people, not like arXiv papers or GitHub code. So Villalobos et al. predicted in 2022 that we're going to run out of high quality internet data by 2028, so in a couple of years' time. So there's got to be other places
to get this data from, right? And it turns out, if you look at the observational sciences, there's a lot of data there that isn't being used to train these large models. So ClearSky is an algorithm that removes cloud cover from satellite observation imagery in the visible to near infrared bands, and that's something we've been working on at Aspia Space.
And if you take that Copernicus Sentinel data, you have 140 trillion high quality Earth observation tokens you can then use to train these models. And in astronomy, too. You've got the LSST, which is a large telescope that's going to be observing the night sky very soon. In astronomy, it's always "very soon" when a telescope is going to see first light, but very soon we're going to have the LSST running. And it's going to generate 12 billion ViT tokens, where each token is a 16 by 16 pixel patch of the night sky, per night. That's a lot of data, around 4.4 trillion tokens per year, which, when I wrote this a year or so ago, was rivalling the largest textual data sets (some quick arithmetic on these figures is sketched below). If you add it all together, it will dwarf them. So this is a lot of data that's not being used. So why don't we just use it? Why don't we put it in an autoregressive model and see what happens? And this is what we did.
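A quick back-of-the-envelope check of the figures just quoted (a sketch; the quoted numbers are themselves approximate):

```python
# Back-of-the-envelope arithmetic for the quoted data volumes.
lsst_tokens_per_night = 12e9                  # ~12 billion ViT tokens/night
lsst_tokens_per_year = lsst_tokens_per_night * 365
print(f"{lsst_tokens_per_year:.2e}")          # ~4.4e12 tokens/year, as quoted
sentinel_tokens = 140e12                      # ~140 trillion EO tokens
print(sentinel_tokens / lsst_tokens_per_year) # Sentinel archive ~ 30 LSST-years
```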
So you can actually use any kind of data in a large autoregressive model. So this is the Gato paper. And what they showed here is that you can take Atari games, you can take tokens of text, of chats,
of a robotic arm operating and picking up cubes and placing them down in different places, time series. You can just put it all into a single model. And the model doesn't care where the tokens came from. It just works. You can just put in a load of data. It's got to be structured, but you put the data in, and it
will predict the next token in a sequence, and it does it well. And at some point, the models trained on the multimodal data sets outperformed those trained on the unimodal data sets, which I found very, very cool. So it's not just for funsies. It's actually improving the models if you take all of this data and plug it in.
So I hope I convinced you that we can take Earth observation and astronomy data and put it in and see what happens. So I'm going to talk about two models: EarthPT, which is why you're all here, and also AstroPT in a second, which is a similar technique we applied to astronomy data.
EarthPT is a little bit easier to visualize, so I'll start with this. So we use the ClearSky algorithm to remove cloud cover from Sentinel-2 data. Sentinel-2 is just a visible to near-infrared (RGB plus near-infrared) observation of the Earth from a satellite.
And so as the satellite passes over, you get an observation of the same patch of ground every few days or so. You can then stack these up into a time series of the same patch of ground and pass this into the model. So we have an uninterrupted, cloud-free time series of data we can just put into these models.
So, a little diagram to get everyone's brain visualizing this: you can have an observation on November 2, 3, 4, 5, and try and predict the observation on November 6. So that's what this model is. It's the same as a GPT, but instead of text going in, it's time series going in.
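A minimal sketch of how one might frame this as a next-step prediction problem: slide a context window over a cloud-free pixel time series and use the following observation as the target. The array shapes and names here are hypothetical, not EarthPT's actual data pipeline.

```python
import numpy as np

def make_windows(series: np.ndarray, context: int):
    """series: (T, bands) reflectances for one pixel; returns inputs/targets."""
    inputs, targets = [], []
    for t in range(len(series) - context):
        inputs.append(series[t:t + context])  # observations t .. t+context-1
        targets.append(series[t + context])   # the next observation
    return np.stack(inputs), np.stack(targets)

series = np.random.rand(100, 10)        # 100 cloud-free dates, 10 bands (toy)
X, y = make_windows(series, context=32) # X: (68, 32, 10), y: (68, 10)
```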
So we found that it could forecast the future, which I found very, very cool. So these are four different plots of different indices that farmers are interested in. NDVI has to do with vegetation, how verdant a patch of ground is. WI is the amount of water on a patch of ground. BSI is the bare soil index, so how unverdant a patch of ground is. And GCVI is also another vegetation index. And you can see across all of these, it predicts. We tested it up to five months into the future, but we don't know exactly how far it can go yet. This is for further testing, but I was very happy with this. And just to reiterate, it doesn't explicitly learn any of this stuff. We're just trying to predict the next item in a sequence, the next item in a time series, just like a GPT model is trying to predict the next word in a sentence.
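For reference, the standard definitions of these indices, computed from surface reflectance bands (a sketch; "WI" is taken here to be the normalised difference water index, NDWI, which is an assumption):

```python
# Standard spectral index definitions from surface reflectance bands.
def ndvi(nir, red):                     # vegetation, range -1 to 1
    return (nir - red) / (nir + red)

def ndwi(green, nir):                   # surface water ("WI" here, assumed)
    return (green - nir) / (green + nir)

def bsi(swir, red, nir, blue):          # bare soil index
    return ((swir + red) - (nir + blue)) / ((swir + red) + (nir + blue))

def gcvi(nir, green):                   # green chlorophyll vegetation index
    return nir / green - 1

print(ndvi(0.45, 0.10))                 # healthy vegetation -> ~0.64
```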
The embeddings are meaningful. So if you take the outputs of the penultimate layer of the neural network and project it onto a 2D space and color them by some emergent properties of the time series, like how much vegetation is there,
how much bare soil is there, or the RGB color at the height of summer, you can see there's some structure that the neural network's learned just from predicting the next item in a sequence. And if you average these over 2023,
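A minimal sketch of that visualisation: take penultimate-layer activations and project them to 2D. PCA is used here for simplicity; the projection method actually used may differ, and all array names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

embeddings = np.random.rand(5000, 512)  # hypothetical (pixels, hidden_dim)
mean_ndvi = np.random.rand(5000)        # hypothetical per-pixel property

xy = PCA(n_components=2).fit_transform(embeddings)
plt.scatter(xy[:, 0], xy[:, 1], c=mean_ndvi, s=2, cmap="viridis")
plt.colorbar(label="mean NDVI")         # clusters appear if embeddings are meaningful
plt.show()
```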
And if you average these over 2023, this shows that it's not just memorizing stuff. It's actually learning something important and relevant about the data. So we trained several EarthPT models, from 10 million parameters to 700 million parameters, just to see if it gets better as the model gets bigger.
It turns out it does, just like a natural language neural network. And the very interesting thing for me here is that the models still look like they're improving at the end of the training run. So there's still much, much more to squeeze from these models. We're actually scaling it up right now with loads more data covering the whole of the UK,
just to see how far we can push this, how much we can throw at these models, and how good they can get. So this is the money plot for EarthPT. So we trained a model up until July of 2022. And we then took the trained model and said: predict the next six months in advance (a sketch of this kind of autoregressive rollout follows below). And the black line is the ground truth. And the green dashed line is the prediction. And it turns out it predicted the 2022 UK drought, just from seeing the historic time series and learning the patterns in it, without us telling it what a drought is, of course. It's just learning, again, a very simple metric: the next item in a sequence. So if it can do this, what else can it do? Can it predict flooding events and other things?
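A sketch of the autoregressive rollout implied here: predict one step, append the prediction to the context, and repeat. This is the generic recipe, not EarthPT's exact inference code; `model` is a stand-in for any next-step predictor.

```python
import numpy as np

def rollout(model, context: np.ndarray, n_steps: int) -> np.ndarray:
    """context: (T, bands) observed history; returns (n_steps, bands)."""
    history = context.copy()
    forecasts = []
    for _ in range(n_steps):
        next_obs = model(history)                  # predict one step ahead
        forecasts.append(next_obs)
        history = np.vstack([history, next_obs])   # feed the prediction back in
    return np.stack(forecasts)

persistence = lambda h: h[-1]                      # dummy stand-in predictor
print(rollout(persistence, np.random.rand(30, 10), n_steps=5).shape)  # (5, 10)
```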
This is something we're working on now. We know it can predict drought. But what else is hidden beneath the surface? It's all open source, so you can check it out. There's a GitHub repository. There's an arXiv paper as well describing the model, if you guys are interested. It's all written in Python and PyTorch.
Contributions are welcome if anyone fancies it. I'll share the slides as well, just in case someone doesn't manage to get a picture of this. I'll put it on the Discord. And finally, I'm going to be talking a little bit about AstroPT, which is a passion project of mine. So EarthPT is my day job, and AstroPT is just something I find very cool, because I used to be an astronomer.
So yeah, LSST, lots of data in astronomy. AstroPT is a very similar concept. But instead of predicting the next item in a time series, we're predicting the next patch of a galaxy image. We took 9 million galaxy images from DESI DR8,
which is just an astronomy survey. And we wanted to see if the model gets better as it gets bigger and sees more data. And again, this is just predicting: from patches 0 and 1, you predict patch 2, in this spiral sequence of patches. It's not doing anything fancy. It's trying to predict the next patch in a sequence here.
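A sketch of one way to turn a galaxy image into a spiral-ordered sequence of 16 by 16 patches, starting from the centre and working outwards. The exact ordering AstroPT uses may differ in detail.

```python
import numpy as np

def spiral_patches(img: np.ndarray, p: int = 16):
    """img: (H, W); returns 16x16 patches ordered outwards from the centre."""
    h, w = img.shape[0] // p, img.shape[1] // p
    cy, cx = (h - 1) / 2, (w - 1) / 2
    coords = [(i, j) for i in range(h) for j in range(w)]
    # Sort grid cells by radius then angle: an outward, spiral-like ordering.
    coords.sort(key=lambda ij: (np.hypot(ij[0] - cy, ij[1] - cx),
                                np.arctan2(ij[0] - cy, ij[1] - cx)))
    return [img[i * p:(i + 1) * p, j * p:(j + 1) * p] for i, j in coords]

patches = spiral_patches(np.random.rand(64, 64))  # 16 patches, centre first
```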
And I'm going to throw some graphs at you right now, just to show that it works. Turns out it does. So the x-axis is the model size. The y-axis is how good it does at predicting the next patch.
Yep. So it sees around 10 billion tokens. It improves up to around 90 million parameters, which I found interesting. I wonder why it stops there. I think it's something to do with the information density of the data. This is something I'm testing now: seeing if you take some other observation of the galaxy, say spectral lines or something like this,
and see if the loss continues to go down if that's a more information-dense data set. Again, this is the emergent property thing we saw for EarthPT, but this time it's for galaxy astronomy properties. So mag z is the brightness of the galaxy. g minus r and r minus z are the colors in different bands.
Redshift is the distance to the galaxy. sSFR is how many stars are born per year in this galaxy. M star is the mass of the galaxy. And then the rest of them are just parameters of how the galaxy looks: whether it's smooth, whether it's discy, whether there are some weird artifacts going on there.
And it turns out the model learns this just by predicting the next item in the spiral observations of the galaxy. It gets better as it gets bigger, too, in the emergent properties. So this is very neat.
So yeah, it's not just an academic exercise. You can actually take this and use it for real astro things: to predict the brightness or the color of the galaxy, or even the amount of stars born per year in this galaxy. And it learns all this just from predicting the next item in the sequence. We don't have to do anything fancy. We just throw the data at it, and it learns these things as emergent properties, just like the language models do. This is also all open source. And this time there's a Discord where people can get involved, if you guys are interested. We're trying to build it out to be super multimodal now. So this is something that's in the works.
So we're going to not just have galaxies. We're going to have stars, time series, and all of this stuff going into these models. So this is my final slide. I think I've left enough time for some questions, so if anyone has any questions, please let me know. Just to summarize: the current foundation models are limited
by data set size and not by model size. So we need more data. There's not enough textual data. So where do we get it from? We can get it from observational modalities, like astronomy, like Earth observation, anything like this. So thanks, everyone.
Perfect. So you know the drill. There's a microphone. Just step up and ask questions. In the meantime, let me ask a naive, maybe dumb, question. I'm not from the field, so you said it gets better with more data.
Is there some correlation? I don't know, does more data just mean uniformly better? So there was a paper that came out just last month, I think, and it used GZIP as a compression metric to see how compressible the data is. And it turns out if it's less compressible, so more
information dense, you need less data to get the same performance, which makes sense, right? So it depends on the data set, which is why I don't think the galaxy model scaled as far as we thought it would when we started the experiments. Nice. Thank you. Then please go ahead.
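The GZIP idea is simple to sketch (the specific paper isn't named in the talk, so treat this as the general recipe rather than its exact method):

```python
import gzip
import os

def compress_ratio(data: bytes) -> float:
    return len(gzip.compress(data)) / len(data)  # lower = more compressible

print(compress_ratio(b"abc" * 10_000))     # repetitive data -> tiny ratio
print(compress_ratio(os.urandom(30_000)))  # random noise -> ratio near 1.0
```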
I have two small questions. The first one is: you mentioned it was a hobby project. Is it then not expensive to run this? I can imagine, with that much data... So we needed four big GPUs to run this.
But you can take the model and fine-tune it on a local computer. We knew some people at universities that would help us out and give us some GPUs. But yeah, once it's trained, it's very cheap to deploy. So you can use all of the standard LLaMA-style libraries for this, because it's just a GPT model under the hood.
And the second one: if you go back to the slide of the AstroPT, yeah, the things it had learned, like the patches. Yeah, yeah. No, one back. OK. Just like two classes here. Ah, yes. I was wondering, are there like two types of galaxies then?
So this was a mystery when we were training this. And we were pulling our hair out, like, what's going on here? Why are there two islands? And I should have mentioned this, I forgot: it turns out one of them is the Northern Hemisphere and one of them is the Southern Hemisphere. And the telescopes in each hemisphere have a slightly different noise profile.
So it separated them out in the latent space, which is what you're seeing here. So yeah, I was like, why is this? And it turns out it's just that. OK, thanks. Hi, first of all, thank you for your talk. That was super interesting. For the Earth observation ones, one of the trends that at least
I think we've been seeing over the past few years is that people's models of how Earth develops with climate change are generally like under-representing the changes. Do you think that this is something that your model is likely to also suffer?
Or is there some reason that it is likely to be able to deal with that better? It depends how prominent these changes are in the data set. Because at this point, we're not pushing any human assumptions into the model. So if it's in the data set, it should pick up on it.
I'm not sure without doing some testing exactly how it would affect it. So yeah, this is something to keep in mind. Fair. And so I guess kind of a follow-on question after that, then, is: one of the things that we're kind of worried about is this domino effect, where one big thing happens that then knocks on
a bunch of other big things. Presumably at that point, we get to the edges of this model, if we're reaching those kinds of unforeseen knock-ons. Cool. Thank you very much. Thanks. Hey, nice talk. I thought the use of GPTs for time-series forecasting was really interesting.
I presume you compared it against more traditional time-series forecasting models and found that to be best. Is that right? We compared it against an LSTM and also a simple phase-folded model per year (essentially a per-date historical average; a sketch follows below), and it outperformed them. I'll need to dig in. It's all in the paper, but I'll need to dig in to find the actual numbers for you. But it did do better, yeah.
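A sketch of such a phase-folded baseline: fold the history on a one-year period and forecast each future date as the mean of the same date in previous years. This assumes regular sampling; names and cadence are illustrative.

```python
import numpy as np

def phase_folded_forecast(history: np.ndarray, period: int, n_steps: int):
    """history: (T,) values; period: samples per year; returns (n_steps,)."""
    # Per-phase climatology: mean of every sample at the same point in the year.
    climatology = np.array([history[phase::period].mean()
                            for phase in range(period)])
    start = len(history) % period          # phase of the first forecast step
    return climatology[(start + np.arange(n_steps)) % period]

ndvi_history = np.sin(np.linspace(0, 8 * np.pi, 292))  # 4 toy "years" of data
print(phase_folded_forecast(ndvi_history, period=73, n_steps=20).shape)  # (20,)
```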
Cool, yeah. I'll check out the paper. Thanks. Second question: some of the charts had a really cool style, like a hand-drawn kind of thing. I was wondering how you did that. Yeah, I think that's pretty nifty. You just call plt.xkcd() in matplotlib, and it does it all, and it looks like an xkcd graph.
Oh, yeah, I'll look into that. Thanks so much. Yeah. I have a question on the weather data when you showed the drought. The weather data? Weather data, yeah. This one, yeah. This one? The one with the drought, when you do the predictions six months afterwards.
This one, yeah. Yeah. So I'm trying to understand this. So essentially, you look at the land, at the earth, and you predict what happens in the sky, because the drought is, in a sense, the result of the rain. Yes. I wanted to ask if you have any kind of KPIs,
if you've tried this on predicting the past, and established, let's say, more solid evidence than one case. Because I don't really understand how one can connect the two things, let's say.
So I think, well, this is my headcanon. So I think the model, in viewing the ground, has to learn something about the climate surrounding it, just to be able to know what the ground's going to look like at a certain point in time. So it must learn something about the historic weather
patterns of this area. And this is where I think this comes from, so it can recognize, oh, this has happened earlier in the year, this normally happens in the drought year, all of this stuff. But yeah, you're right, it does need more evidence than a single case study. This is something that we're working on now at Aspia Space. So hopefully, very soon, we'll have more data
to throw at this. Hi. Thanks for the very interesting talk. I'm just interested in how well the models work in some extreme circumstances. As I would guess, if we put in enough data, it would learn from year to year how things look in the summer
and how they look in winter. But I don't know, in some extreme situations, like extreme rain storms in the summer, does it work? Does it not? Intuitively, if it's not in the data set, I don't think it would work. But if these extreme events are historical,
I imagine it would work. But I haven't tested it properly myself. It has the same drawbacks that GPT has with text. So in text, if you ask it something that's completely out of distribution, it won't know what's going on. Similarly here, but if it's seen it before, I would imagine it would be able to predict what's going on.
And does the predictive power... I don't know, if we would just predict for next week, maybe it would work better. Did you maybe test that? Yeah. So, well, I guess you can see on this plot, it does much better closer to the dashed line, which is
where the divergence date was. So we trained it up until this dashed line and we said: predict what's next. And then it starts to diverge later on. It starts to do the average per year. So yeah, closer to the divergence date, it does better. And then further away, it starts to wobble around. Thanks. Yeah.
Following on from those questions: how useful do you think it's going to be with climate change coming into effect? Do you think it'd be more or less useful, because I think it's a bit more unpredictable? I'm hoping it will be more useful. This is part of the reason we're investigating this: to create something that can predict
events before they happen, for farmers to be able to, I don't know, put more nitrogen fertilizer on their soil, or to anticipate a drought or something like this, and then do something to prevent it. Again, if it's some black swan climate event,
it's probably not going to be able to predict it, but if it's historical and it's shown in the data, for example, if it's getting slightly more wild each year, it should be able to predict it. I guess the proof is in the pudding. So if it's deployed wide scale, we'll see exactly how well it does and where it falls down.
So I think it needs to be deployed to be able to find out exactly where the drawbacks are and where the positives are. Thank you. Hey, me again. Different question. Is the data that you get from this indicative of the weather? Could you basically use this
as a type of weather prediction, as in, it's going to rain on Tuesday? Maybe not. But the model is super flexible. So if you have a bunch of weather data from some, I don't know, IoT sensors on the ground measuring precipitation, it should be able to do this. And we actually tried a moisture predictor
with a GPT model, and it works fine. And I think the more modern weather prediction algorithms are moving in this direction now. So they're moving into GPTs to predict what's going on a little bit down the line. I think they're mostly using masked autoencoders, but yeah, it should work. You'd need the right data to throw at it.
What we're doing here is secondary weather prediction, because it's looking at the ground, and the weather predicts the ground, so there are several layers in between. If you just predict the weather directly, if that's what you're interested in, it'll be much easier for the model to learn. Cool, that's fair. Cheers.
Hello, I also have a question. Do you also add other data, like topographic maps of the Earth, or is it only time series? At this point, it's any data we can throw at it. So we're working on a new model that can take in data from different satellites. It takes in elevation as well,
IoT data, and all of this stuff. So like I was saying at the start, more data is good data. The model doesn't care where it comes from. It just does its thing, it chugs along, and you can put it all into the same model, like they did with Gato. And it just seems to work. Thanks.
Yeah, thank you for all the great questions. I think you'll be walking around here, so if you have any more, just grab him. Then please give it up again for Mike.