Machine learning for soil health - Is the horizon the limit? Flaws, potentials and future challenges

OpenGeoHub Foundation

Nussbaum, Madlene

Formal Metadata

Title

Machine learning for soil health - Is the horizon the limit? Flaws, potentials and future challenges

Title of Series

SoilHealthNow! - 2025

Number of Parts

Author

Nussbaum, Madlene

License

CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/70508 (DOI)

Publisher

OpenGeoHub Foundation

Release Date

2025

Language

English

Producer

Coppi, Viola

Production Year

2025

Production Place

Doorwerth

Content Metadata

Subject Area

Computer Science; Environmental Sciences / Ecology; Information Science

Genre

Conference/Talk

Abstract

Machine learning (ML) is promoted as a game-changer for soil health assessment, offering new ways to model complex relationships and generate high-resolution soil property maps. However, while ML has shown promise, its application in soil science is often met with overstated expectations and underappreciated limitations. This keynote critically examines the role of ML in space-time soil mapping for soil health, highlighting both its strengths and pitfalls. ML has certainly advanced soil mapping in an unprecedented way, enabling continental and even global maps at high resolution for numerous soil properties. Entangled soil processes and the variability of locations, nearly each with its own set of soil-forming factors, result in complex space-time soil patterns. In the commonly used mapping approach, ML has to learn all this complexity in a fully data-driven way from the surveyed soil samples and environmental predictors such as remote sensing data or elevation models. Where no environmental predictor dataset can differentiate the observed soil property patterns, ML predictions will play it safe and predict the average observed value for similar locations. From a soil process knowledge perspective, the mean may often not be the best prediction. For example, a forest topsoil may be buffered by carbonates and have a pH around 8, or its pH might already have dropped to the aluminum buffer range of around 4. A mean pH of 5-6, which ML is likely to predict, is rarely observed in unfertilized forests and is hence rather unlikely. Similar limitations appear when quantifying prediction uncertainty at each location. ML-based prediction intervals often contain value ranges that, from a soil process viewpoint, we already know are very unlikely.
While more field surveying is certainly needed to support unbiased mapping, it does not resolve the challenge. The marginal benefit of more data points for fully data-driven ML often decreases rapidly, more so in the presence of measurement errors. Soil sampling will always provide only a tiny fraction of the total 3D soil continuum we are interested in. Data-hungry ML techniques such as deep learning are therefore unlikely to excel in space-time soil mapping of soil health indicators.
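
The forest pH argument in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions and is not taken from the talk: the 50/50 split between buffer regimes, the spread of 0.3 pH units, the sample size, and the function name `simulate_forest_ph` are all hypothetical choices made for illustration. It shows that when no predictor separates the two regimes, the squared-error-optimal prediction (the mean) falls in a pH range where almost no simulated site is actually observed.

```python
import random
import statistics


def simulate_forest_ph(n, seed=0):
    """Draw n hypothetical forest topsoil pH values from two buffer regimes."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        # Assumption: half the sites are carbonate-buffered (pH ~8),
        # half have dropped to the aluminum buffer range (pH ~4).
        centre = 8.0 if rng.random() < 0.5 else 4.0
        values.append(rng.gauss(centre, 0.3))
    return values


ph = simulate_forest_ph(2000)

# Without a predictor that separates the regimes, the constant that
# minimizes squared error is the sample mean.
mean_prediction = statistics.fmean(ph)

# Share of simulated sites whose pH lies within 0.5 units of that prediction.
share_near_mean = sum(abs(v - mean_prediction) <= 0.5 for v in ph) / len(ph)

print(f"mean prediction: {mean_prediction:.2f}")
print(f"share of sites within 0.5 pH units of it: {share_near_mean:.3f}")
```

Under these assumptions the predicted value sits between the two regimes, which mirrors the abstract's point that a data-driven average can be an implausible value from a soil process viewpoint.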