Affiliation:
1. School of Environmental Science and Engineering, Xiamen University of Technology, Xiamen 361024, China
2. School of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China
3. Department of Physical Geography and Ecosystem Science, Lund University, 22228 Lund, Sweden
4. School of Geographical Science, Fujian Normal University, Fuzhou 350007, China
Abstract
Selecting optimal characteristic bands and retrieval models for the hyperspectral retrieval of soil heavy metal concentrations remains a significant challenge. Satellite-based hyperspectral retrieval also faces several issues, including atmospheric effects, limited temporal and radiometric resolution, and data acquisition constraints. As a result, the retrieval performance for soil arsenic (As) concentration on Pingtan Island, the largest island in Fujian Province and the fifth largest in China, is currently unclear. This study aimed to address this gap by identifying optimal characteristic bands from the full spectrum from both statistical and physical perspectives. We tested three linear models, namely Multiple Linear Regression (MLR), Partial Least Squares Regression (PLSR) and Geographically Weighted Regression (GWR), and three nonlinear machine learning models, namely Back Propagation Neural Network (BP), Support Vector Machine Regression (SVR) and Random Forest Regression (RFR). We then retrieved soil arsenic content using ground-based full-spectrum soil data from Pingtan Island. Our results indicate that the RFR model consistently outperformed all others, whether using the original or the optimal characteristic bands. This superior performance suggests a complex, nonlinear relationship between soil arsenic concentration and spectral variables, influenced by diverse landscape factors. The GWR model, which accounts for spatial non-stationarity and heterogeneity, outperformed traditional models such as BP and SVR. This finding underscores the potential of incorporating spatial characteristics to enhance traditional machine learning models in geospatial studies. When retrieval accuracy was evaluated using the optimal characteristic bands, the RFR model maintained its top performance, and the linear models (MLR, PLSR and GWR) showed notable improvement. 
Specifically, the GWR model achieved the highest r value for the validation data, indicating that selecting optimal characteristic bands based on both high Pearson’s correlation coefficients (e.g., absolute Pearson’s correlation coefficient ≥ 0.45) and high sensitivity to soil active materials mitigates the uncertainties that arise when characteristic bands are selected solely on the basis of Pearson’s correlation coefficients. Consequently, two effective retrieval models were obtained: the best-performing RFR model and the improved GWR model. Our study on Pingtan Island provides theoretical and technical support for monitoring and evaluating soil arsenic concentrations using satellite-based spectroscopy in densely populated, relatively independent island towns in China and worldwide.
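The statistical part of the band selection described above can be illustrated with a minimal sketch. It assumes a spectra matrix of shape (samples, bands) and a vector of measured arsenic concentrations, and it keeps the bands whose absolute Pearson's correlation coefficient meets the 0.45 threshold mentioned in the abstract. The function name `select_bands_by_pearson`, the synthetic data and the band indices are hypothetical and are not taken from the study; the physical criterion (sensitivity to soil active materials) is not modelled here.

```python
import numpy as np


def select_bands_by_pearson(spectra, target, threshold=0.45):
    """Return (indices, r): bands with |Pearson r| >= threshold, and r for all bands.

    spectra: array of shape (n_samples, n_bands)
    target:  array of shape (n_samples,)
    """
    spectra = np.asarray(spectra, dtype=float)
    target = np.asarray(target, dtype=float)

    # Centre each band and the target, then compute Pearson's r per band.
    xc = spectra - spectra.mean(axis=0)
    yc = target - target.mean()
    denom = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (xc * yc[:, None]).sum(axis=0) / denom
    r = np.nan_to_num(r)  # constant bands have undefined r; treat as 0

    return np.flatnonzero(np.abs(r) >= threshold), r


# Synthetic demonstration only (not the Pingtan Island data):
# two hypothetical bands (10 and 30) drive the target, the rest are noise.
rng = np.random.default_rng(0)
n_samples, n_bands = 200, 50
spectra = rng.normal(size=(n_samples, n_bands))
arsenic = (2.0 * spectra[:, 10] - 2.0 * spectra[:, 30]
           + rng.normal(scale=0.5, size=n_samples))

selected, r = select_bands_by_pearson(spectra, arsenic, threshold=0.45)
print("selected band indices:", selected)
```

In practice the selected indices would then be intersected with, or compared against, bands known to respond to soil active materials before fitting models such as RFR or GWR.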
Funder
Natural Science Foundation of Fujian Province, China
Xiamen University of Technology
Subject
General Earth and Planetary Sciences