Authors:
Zhang Mo, Shi Wenjiao, Xu Ziwei
Abstract
Soil texture and soil particle size fractions (PSFs) play
an increasingly important role in physical, chemical, and hydrological processes. Many
previous studies have used machine-learning and log-ratio transformation
methods for soil texture classification and soil PSF interpolation to
improve prediction accuracy. However, few studies have systematically
compared the performance of these methods for both classification and interpolation. Here,
five machine-learning models – K-nearest neighbour (KNN), multilayer
perceptron neural network (MLP), random forest (RF), support vector machines
(SVM), and extreme gradient boosting (XGB) – were applied to the original data and to data transformed with three log-ratio methods, namely additive log ratio (ALR), centred log ratio (CLR), and
isometric log ratio (ILR), to predict soil texture and
PSFs for 640 soil samples from the Heihe River basin
(HRB) in China. The results demonstrated that the log-ratio transformations
decreased the skewness of soil PSF data. For soil texture
classification, RF and XGB outperformed the other models, with higher overall
accuracy and kappa coefficients. Based on the area under the
precision–recall curve (AUPRC), they were also recommended for classifying imbalanced data. For soil PSF interpolation, RF
delivered the best performance among the five machine-learning models, with the
lowest root-mean-square error (RMSE: 15.09 % for sand, 13.86 % for silt, and
6.31 % for clay), mean absolute error (MAE: 10.65 % for sand, 9.99 % for silt, and 5.00 % for clay), Aitchison distance (AD: 0.84), and standardized
residual sum of squares (STRESS: 0.61), and the highest Spearman rank
correlation coefficient (RCC: 0.69 for sand, 0.67 for silt, and 0.69 for clay). The log-ratio
transformations, especially CLR and ILR, improved STRESS. Prediction
maps from both direct and indirect classification were similar in the middle and
upper reaches of the HRB. However, indirect classification maps using log-ratio-transformed data provided more detailed information in the lower reaches of
the HRB. Indirect methods improved the kappa coefficient of soil texture
classification by 21.3 % compared with direct methods. Based on the accuracy
of both soil PSF interpolation and soil texture classification, RF was
recommended as the best of the five machine-learning models. Given the
constrained nature of compositional data, ILR was recommended for
component-wise machine-learning models that lack a multivariate treatment. In addition, XGB
was preferred over the other models when the trade-off between accuracy and runtime was
considered. Our findings provide a reference for future studies on the
spatial prediction of soil PSFs and texture with machine-learning models
when soil PSF data are skewed and cover a large area.
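The three log-ratio transformations named above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' code. It makes several assumptions. Each sample must be a strictly positive sand/silt/clay vector, so zeros have to be replaced beforehand. The function names are hypothetical. ALR uses the last part (clay) as its reference denominator. ILR uses pivot (balance) coordinates, which is only one of many valid orthonormal bases and not necessarily the one used in this study. The Aitchison distance is the Euclidean distance between CLR vectors.

```python
import numpy as np

def closure(x):
    """Rescale each row so its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def alr(x):
    """Additive log ratio: log of each part over the last part (D-1 coordinates).
    Assumption: the last part (e.g. clay) is the reference denominator."""
    x = closure(x)
    return np.log(x[..., :-1] / x[..., -1:])

def alr_inv(z):
    """Inverse ALR: append a zero coordinate for the reference part, exponentiate, re-close."""
    z = np.asarray(z, dtype=float)
    zeros = np.zeros(z.shape[:-1] + (1,))
    return closure(np.exp(np.concatenate([z, zeros], axis=-1)))

def clr(x):
    """Centred log ratio: log of each part minus the mean log (D coordinates summing to 0)."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)

def clr_inv(c):
    """Inverse CLR: exponentiate and re-close."""
    return closure(np.exp(c))

def ilr_basis(D):
    """Orthonormal D x (D-1) basis of the CLR plane giving pivot coordinates.
    Assumption: pivot coordinates are one choice among many orthonormal bases."""
    V = np.zeros((D, D - 1))
    for i in range(D - 1):
        n = D - i - 1  # number of parts after pivot part i
        V[i, i] = np.sqrt(n / (n + 1))
        V[i + 1:, i] = -1.0 / np.sqrt(n * (n + 1))
    return V

def ilr(x):
    """Isometric log ratio: project the CLR vector onto the orthonormal basis."""
    c = clr(x)
    return c @ ilr_basis(c.shape[-1])

def ilr_inv(z):
    """Inverse ILR: map back to CLR space, then invert the CLR."""
    z = np.asarray(z, dtype=float)
    return clr_inv(z @ ilr_basis(z.shape[-1] + 1).T)

def aitchison_distance(x, y):
    """Aitchison distance: Euclidean distance between CLR vectors."""
    return np.linalg.norm(clr(x) - clr(y), axis=-1)

# Hypothetical samples given as sand, silt, clay percentages.
psf = np.array([[40.0, 40.0, 20.0],
                [65.0, 20.0, 15.0]])
print(ilr(psf).shape)                        # D-1 = 2 ILR coordinates per sample
print(np.round(ilr_inv(ilr(psf)) * 100, 6))  # back-transformed percentages
```

Because the ILR basis is orthonormal, Euclidean distances between ILR vectors equal Aitchison distances. This property makes ILR coordinates convenient targets when each coordinate is modelled separately and the predictions are back-transformed to PSFs.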
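Several of the accuracy measures named above can be sketched with NumPy. This is a minimal, hypothetical implementation, not the evaluation code of this study, and the function names and example values are illustrative only. RMSE and MAE are computed for one fraction at a time from observed and predicted percentages. The Spearman coefficient is the Pearson correlation of average ranks, so ties share a rank. Overall accuracy and Cohen's kappa compare observed and predicted texture classes. STRESS and AUPRC are not sketched here.

```python
import numpy as np

def rmse(obs, pred):
    """Root-mean-square error, in the units of the inputs (e.g. % for a PSF)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def mae(obs, pred):
    """Mean absolute error."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.abs(pred - obs)))

def average_ranks(a):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    a = np.asarray(a, float)
    ranks = np.empty(len(a))
    ranks[np.argsort(a, kind="mergesort")] = np.arange(1, len(a) + 1)
    for v in np.unique(a):
        tied = a == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(obs, pred):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(obs), average_ranks(pred))[0, 1])

def overall_accuracy(y_true, y_pred):
    """Share of samples whose predicted texture class matches the observed class."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    p_o = np.mean(y_true == y_pred)
    p_e = sum(np.mean(y_true == c) * np.mean(y_pred == c)
              for c in np.union1d(y_true, y_pred))
    return float((p_o - p_e) / (1.0 - p_e))

# Hypothetical example values, not data from this study.
sand_obs, sand_pred = [30.0, 45.0, 60.0, 20.0], [35.0, 40.0, 55.0, 25.0]
print(rmse(sand_obs, sand_pred), mae(sand_obs, sand_pred), spearman(sand_obs, sand_pred))
classes_obs = ["sandy loam", "loam", "loam", "silt loam"]
classes_pred = ["sandy loam", "loam", "silt loam", "silt loam"]
print(overall_accuracy(classes_obs, classes_pred), cohen_kappa(classes_obs, classes_pred))
```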
Funder
National Natural Science Foundation of China
Chinese Academy of Sciences
Subject
General Earth and Planetary Sciences, General Engineering, General Environmental Science