Predicting wetland soil properties using machine learning, geophysics, and soil measurement data

Published: 2024-04-25
Volume: 24
Issue: 6
Pages: 2398-2415
ISSN: 1439-0108
Container-title: Journal of Soils and Sediments
Language: en
Short-container-title: J Soils Sediments

Author:

Driba Dejene L., Emmanuel Efemena D., Doro Kennedy O.

Abstract

Purpose: Machine learning models can improve the prediction of spatial variation of wetland soil properties, such as soil moisture content (SMC) and soil organic matter (SOM). Their performance, however, relies on the quantity of data used to train the model, limiting their use with insufficient data. In this study, we assessed the use of synthetic data constrained by limited field data for training an eXtreme Gradient Boosting (XGBoost) algorithm used to predict the distribution of soil properties based on geophysical measurements constrained by soil samples.

Materials and methods: A spatial distribution of soil apparent electrical conductivity (ECa) and laboratory measurements of SOM and SMC from twenty-two core samples were acquired at the St. Michael restored wetland near Defiance, Ohio. The correlations between ECa, SOM, and SMC were explored for predicting the spatial distribution of SOM and SMC. We used a Beta Variational AutoEncoder (β-VAE) approach to synthetically generate over 70,000 training samples from the original twenty-two soil core samples. The training data samples were taken from the latent space. The XGBoost algorithm was then trained on the β-VAE generated data and used to predict the spatial distribution of SOM and SMC at the site. We also validated the accuracy of the XGBoost predictions using an original holdout model validation technique.

Results and discussions: The synthetic data generated using the β-VAE include both soil attributes and ECa and are larger and more diverse than the original training set, with a mean absolute reconstruction error for SMC and SOM ranging from 0.018 to 0.022 and 0.026 to 0.041, respectively. This indicates that the β-VAE successfully generated a realistic synthetic dataset and overcame the technical barrier of using limited datasets. In addition, using generated data to expand the original training data helps the XGBoost model make more accurate predictions compared to training on the original data. The XGBoost prediction performance yielded average Lin's concordance correlation coefficient (LCCC) values of 0.82 and 0.85 for SOM and SMC and ratio of performance to deviation (RPD) values of 1.92 and 2.22, respectively, indicating a good performance.

Conclusions: This study validated the use of β-VAE to successfully generate synthetic wetland soil datasets with attributes of the original field data that can be effectively used to train the machine learning XGBoost model. The proposed framework offers an efficient solution for mapping the spatial variability of soil properties in data-scarce wetland soil environments.
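
The two validation metrics named in the abstract, LCCC and RPD, can be illustrated with a minimal sketch. It is not the authors' code. It assumes the standard definitions: Lin's coefficient computed with population (1/n) moments, and RPD taken as the sample standard deviation of the observed values divided by the RMSE of the predictions. The function names `lin_ccc` and `rpd` and the sample values are hypothetical and are not data from the study.

```python
import numpy as np


def lin_ccc(observed, predicted):
    """Lin's concordance correlation coefficient (population moments assumed)."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    mean_obs, mean_pred = obs.mean(), pred.mean()
    # Population covariance between observed and predicted values
    covariance = np.mean((obs - mean_obs) * (pred - mean_pred))
    # The denominator penalizes both scatter and a shift in the means
    return 2.0 * covariance / (obs.var() + pred.var() + (mean_obs - mean_pred) ** 2)


def rpd(observed, predicted):
    """Ratio of performance to deviation: SD of observations over RMSE (assumed form)."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((obs - pred) ** 2))
    return obs.std(ddof=1) / rmse


# Hypothetical holdout values, for illustration only
observed = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 12.0, 13.0, 17.0, 18.0]

print(f"LCCC = {lin_ccc(observed, predicted):.3f}")
print(f"RPD  = {rpd(observed, predicted):.3f}")
```

LCCC reaches 1 only when predictions fall on the 1:1 line, so it is stricter than Pearson correlation, while RPD compares the prediction error with the natural spread of the observed property.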

Funder

Ohio Lake Erie Commission

Ohio Sea Grant College, Ohio State University

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s11368-024-03801-1.pdf

References (70 articles)

1. Akrami H, Joshi AA, Li J, Aydöre S, Leahy RM (2022) A robust variational autoencoder using beta divergence. Knowl Based Systems 238:107886. https://doi.org/10.1016/j.knosys.2021.107886

2. Allred BJ, Ehsani MR, Daniels JJ (2008) General considerations for geophysical methods applied to agriculture. In: Allred BJ, Daniels JJ, Ehsani MR (eds) Handbook of Agricultural Geophysics. CRC Press, Taylor and Francis Group, Boca Raton, Florida, pp 3–16

3. Arvanitis TN, White S, Harrison S, Chaplin R, Despotou G (2022) A method for machine learning generation of realistic synthetic datasets for validating healthcare applications. Health Info J 28(2):14604582221077000. https://doi.org/10.1177/14604582221077000

4. Becker AM, Becker RH, Doro KO (2021) Locating drainage tiles at a wetland restoration site within the Oak Openings region of Ohio, United States using UAV and land based geophysical techniques. Wetlands 41:116. https://doi.org/10.1007/s13157-021-01495-6

5. Binley A, Slater L (2020) Resistivity and induced polarization: Theory and applications to the near-surface earth. Cambridge University Press, Cambridge, United Kingdom