Affiliations
1. Geomatics Research Institute, Pukyong National University, Busan 48513, Republic of Korea
2. Satellite Planning Division, National Meteorological Satellite Center, Jincheon 27803, Republic of Korea
3. Climate Service and Research Division, APEC Climate Center, Busan 48058, Republic of Korea
4. Department of Spatial Information Engineering, Pukyong National University, Busan 48513, Republic of Korea
Abstract
Soil moisture (SM) is an indicator of the moisture status of the land surface and is useful for monitoring extreme weather events. Representative global SM datasets include the National Aeronautics and Space Administration (NASA) Soil Moisture Active Passive (SMAP), the Global Land Data Assimilation System (GLDAS), and the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis 5 (ERA5). Because of their low spatial resolutions, however, none of these datasets describes local SM changes well, and their accuracy tends to be low. Machine learning (ML)-based SM predictions have demonstrated high accuracy, but obtaining semi-real-time SM information remains challenging, and the dependence of the validation accuracy on the data sampling method used, such as random or yearly sampling, has led to uncertainties. In this study, we aimed to develop an ML-based model for real-time SM estimation that captures local-scale SM variability and maintains reliable accuracy regardless of the sampling method. The study was conducted in South Korea, and satellite image data, numerical weather prediction (NWP) data, and topographic data available within one day were used as the input data. For SM modeling, 13 input variables affecting the surface SM status were selected: 10- and 20-day cumulative standardized precipitation indexes (SPI10 and SPI20), normalized difference vegetation index (NDVI), downward shortwave radiation (DSR), air temperature (Tair), land surface temperature (LST), soil temperature (Tsoil), relative humidity (RH), latent heat flux (LE), slope, elevation, topographic ruggedness index (TRI), and aspect. SM models based on random forest (RF) and automated machine learning (AutoML) were then constructed, trained, and validated using both random sampling and leave-one-year-out (LOYO) cross-validation.
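The LOYO scheme mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `leave_one_year_out` and the sample years are hypothetical, and the sketch only shows how each year is held out in turn while all remaining years form the training set. Unlike random sampling, no year contributes samples to both sides of a fold.

```python
from collections import defaultdict


def leave_one_year_out(years):
    """Yield (held_out_year, train_indices, test_indices) for each distinct year.

    `years` holds one entry per sample, giving the year of that observation.
    """
    # Group sample indices by the year they belong to.
    by_year = defaultdict(list)
    for index, year in enumerate(years):
        by_year[year].append(index)

    # Hold out one year at a time; train on every other year.
    for held_out in sorted(by_year):
        test_indices = by_year[held_out]
        train_indices = [
            index
            for year, indices in sorted(by_year.items())
            if year != held_out
            for index in indices
        ]
        yield held_out, train_indices, test_indices


# Hypothetical observation years, for illustration only.
sample_years = [2018, 2018, 2019, 2019, 2020, 2020, 2020]
folds = list(leave_one_year_out(sample_years))

for held_out, train_indices, test_indices in folds:
    print(held_out, len(train_indices), len(test_indices))
```

Each fold would then be used to fit a model on the training indices and score it on the held-out year, so the reported accuracy reflects years the model has not seen.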
The RF- and AutoML-based SM models were highly accurate when compared with in situ SM (mean absolute error (MAE) = 2.212–4.132%; mean bias error (MBE) = −0.110–0.136%; root mean square error (RMSE) = 3.186–5.384%; correlation coefficient (CC) = 0.732–0.913), and the AutoML-based SM model tended to be more accurate than the RF-based SM model regardless of the data sampling method used. In addition, when evaluated against in situ SM data, the SM models were more accurate than both the GLDAS and ERA5 SM data, and they represented changes in the dryness/wetness of the land surface in response to meteorological events (heatwaves, droughts, and rainfall) well. The SM models proposed in this study can thus offer semi-real-time SM data, aiding in the monitoring of moisture changes in the land surface as well as short-term meteorological disasters such as flash droughts or floods.
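The four validation statistics reported above can be computed as in the following minimal sketch. Two points are assumptions rather than statements from the abstract: the bias is taken as predicted minus observed, and CC is taken to be the Pearson correlation coefficient. The function name `error_metrics` and the sample values are hypothetical.

```python
import math


def error_metrics(predicted, observed):
    """Return MAE, MBE, RMSE, and Pearson CC for paired SM values (in %)."""
    n = len(observed)
    # Assumed sign convention: error = predicted - observed.
    errors = [p - o for p, o in zip(predicted, observed)]

    mae = sum(abs(e) for e in errors) / n
    mbe = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Pearson correlation coefficient (assumed meaning of CC).
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    covariance = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    spread_p = sum((p - mean_p) ** 2 for p in predicted)
    spread_o = sum((o - mean_o) ** 2 for o in observed)
    cc = covariance / math.sqrt(spread_p * spread_o)

    return {"MAE": mae, "MBE": mbe, "RMSE": rmse, "CC": cc}


# Hypothetical model estimates and in situ measurements, for illustration only.
predicted_sm = [20.0, 25.0, 30.0, 35.0]
observed_sm = [22.0, 24.0, 31.0, 33.0]
metrics = error_metrics(predicted_sm, observed_sm)
print(metrics)
```

An MBE near zero alongside a nonzero MAE, as in the reported ranges, indicates that over- and underestimates largely cancel on average even though individual errors remain.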
Funder
National Research Foundation of Korea
R&D Program of Korea Meteorological Administration
Cooperative Research Program for Agriculture Science & Technology Development
Subject
General Earth and Planetary Sciences