Interpretable spectroscopic modelling of soil with machine learning

Published:2023-05 Issue:3 Volume:74
ISSN:1351-0754
Container-title:European Journal of Soil Science
language:en
Short-container-title:European J Soil Science

Author:

Wadoux Alexandre M. J.‐C.¹

Affiliation:

1. Sydney Institute of Agriculture & School of Life and Environmental Sciences The University of Sydney Sydney NSW Australia

Abstract

Spectroscopic modelling of soil has advanced greatly with the development of large spectral libraries, computational resources and statistical modelling. The use of complex statistical and algorithmic tools from the field of machine learning has become popular for predicting soil properties from visible, near‐ and mid‐infrared spectra. Many users, however, find it difficult to trust the predictions made with machine learning. We lack interpretation and understanding of how the predictions were made, so these models are often referred to as black boxes. In this study, I report on the development and application of a model‐independent method for interpreting complex machine learning spectroscopic models. The method relies on Shapley values, a statistical approach originally developed in coalitional game theory. In a case study on predicting total organic carbon from a large European mid‐infrared spectroscopic database, I fitted a random forest machine learning model and showed how Shapley values can help us understand (i) the average contribution of individual wavenumbers, (ii) the contribution of the spectrum‐specific wavenumbers, and (iii) the average contribution of groups of spectra with similar characteristics taken together. The results show that Shapley values revealed more insights than commonly used interpretation methods based on variable importance. The most striking spectral regions identified as important contributors to the prediction corresponded to the molecular vibrations of organic and inorganic compounds that are known to relate to organic carbon. Shapley values are a useful methodological development that will yield better understanding of, and trust in, complex machine learning and algorithmic tools in soil spectroscopy research.
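The three uses of Shapley values listed in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the data are synthetic, `toy_model` is a hypothetical stand-in for a fitted model's prediction function (a random forest's `predict` could be passed instead), and `shapley_values` is an illustrative name for a common permutation-sampling approximation that works with any model.

```python
import numpy as np

def shapley_values(predict, x, background, n_perm=100, seed=0):
    """Permutation-sampling estimate of Shapley values for one spectrum x.

    predict    : callable, maps an (n, p) array to (n,) predictions
    x          : (p,) spectrum to explain
    background : (m, p) reference spectra used to "switch off" wavenumbers
    """
    rng = np.random.default_rng(seed)
    p = x.shape[0]
    phi = np.zeros(p)
    for _ in range(n_perm):
        # Start from a random reference spectrum ...
        z = background[rng.integers(len(background))].copy()
        prev = predict(z[None, :])[0]
        # ... and reveal the wavenumbers of x one at a time in random order.
        for j in rng.permutation(p):
            z[j] = x[j]
            cur = predict(z[None, :])[0]
            phi[j] += cur - prev  # marginal contribution of wavenumber j
            prev = cur
    return phi / n_perm

# Synthetic data (assumption): 30 "spectra" with 6 "wavenumbers".
rng = np.random.default_rng(42)
X = rng.normal(size=(30, 6))
w = np.array([2.0, 0.0, -1.0, 0.0, 0.5, 0.0])

def toy_model(A):
    # Hypothetical stand-in for a fitted model; columns 1, 3 and 5 are unused.
    return A @ w + 0.5 * A[:, 0] * A[:, 2]

phi = np.array([shapley_values(toy_model, x, X, n_perm=50, seed=i)
                for i, x in enumerate(X)])

global_importance = np.abs(phi).mean(axis=0)  # (i) average contribution per wavenumber
local = phi[0]                                # (ii) contributions for a single spectrum
group = phi[X[:, 0] > 0].mean(axis=0)         # (iii) average over a group of spectra
```

For each sampled permutation the contributions sum to the difference between the prediction for the spectrum and the prediction for the reference spectrum, which is what makes the values readable as an additive decomposition of a single prediction.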

Publisher

Wiley

Subject

Soil Science

Cited by 4 articles.

1. Rapid assessment of vanilla (Vanilla planifolia) quality parameters using portable near-infrared spectroscopy combined with random forest;Journal of Food Composition and Analysis;2024-09

2. Multivariate regional deep learning prediction of soil properties from near-infrared, mid-infrared and their combined spectra;Geoderma Regional;2024-06

3. Improving the generalization error and transparency of regression models to estimate soil organic carbon using soil reflectance data;Ecological Informatics;2023-11

4. Incorporating soil knowledge into machine‐learning prediction of soil properties from soil spectra;European Journal of Soil Science;2023-11