Abstract
The universal soil loss equation (USLE) is a widely used empirical model for estimating soil loss. Among the USLE factors, the cover management factor (C-factor) is critical because it substantially affects the estimate. The traditional approach assigns C-factor values according to a land-use/land-cover (LULC) map derived from field surveys. However, field surveys are difficult and costly to conduct, and the LULC map is hard to update regularly, which significantly limits the feasibility of multi-temporal analysis of soil erosion. To address this issue, this study uses data mining to build a random forest (RF) model relating eight geospatial factors to the C-factor for the Shihmen Reservoir watershed in northern Taiwan, enabling multi-temporal estimation of soil loss. The eight geospatial factors were collected or derived from remotely sensed images taken in 2004, a digital elevation model, and related digital maps. Because of the memory limitation of the R software, stratified random sampling was used to select only 4% of the total data points (population dataset) in each C-factor class as the sample dataset (input dataset) for analysis. Seventy percent of the input dataset was used to train the RF model, and the remaining 30% was used to test it. The results show that, in multi-temporal analysis, the RF model could capture the trend of vegetation recovery and soil loss reduction after the destructive Typhoon Aere in 2004. Although the RF model was biased by the large sample size of the majority class (C = 0.01 class), the estimated soil erosion rate was close to the measurement obtained by the erosion pins installed in the watershed (90.6 t/ha-year). After completing the model, we addressed the imbalanced data problem in the input dataset to improve the model’s classification performance.
An ad hoc down-sampling technique was applied to the majority class, reducing its sampling rate to 2%, 1%, and 0.5% while keeping the minority classes at a 4% sampling rate. The results show that the Kappa coefficient improved from 0.574 to 0.732, the AUC from 0.780 to 0.891, and the true positive rate of all minority classes combined from 0.43 to 0.70. However, the overall accuracy decreased from 0.952 to 0.846, and the true positive rate of the majority class declined from 0.99 to 0.94. The best average C-factor was achieved when the sampling rate of the majority class was 1%, whereas the best soil erosion estimate was obtained when the sampling rate was 2%.
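The sampling scheme described above can be illustrated with a small sketch. It is not the authors' R code: the class names, population sizes, and helper functions (`stratified_sample`, `split_train_test`) are hypothetical, and the 70/30 split is assumed here to be a simple random split of the sampled points. Each class is sampled at 4% except the majority class, which is down-sampled to 1%.

```python
import random
from collections import defaultdict


def stratified_sample(labels, rates, default_rate=0.04, seed=0):
    """Draw indices class by class, each class at its own sampling rate."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for label, indices in by_class.items():
        rate = rates.get(label, default_rate)
        k = max(1, round(len(indices) * rate))  # keep at least one point per class
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


def split_train_test(indices, train_fraction=0.7, seed=0):
    """Shuffle the sampled indices and split them into train and test parts."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical population: one large majority class and two small minority classes.
labels = ["C=0.01"] * 10000 + ["minority_a"] * 1000 + ["minority_b"] * 1000

# Down-sample only the majority class to 1%; the others keep the 4% default.
sample = stratified_sample(labels, rates={"C=0.01": 0.01})
train, test = split_train_test(sample)

majority_in_sample = sum(1 for i in sample if labels[i] == "C=0.01")
print(len(sample), majority_in_sample, len(train), len(test))
```

In this toy population the majority class is ten times larger than each minority class. At a uniform 4% rate it would contribute 400 of 480 sampled points; at 1% it contributes 100 of 180, which is the kind of rebalancing the down-sampling step aims for.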
Funder
Ministry of Science and Technology, Taiwan
Subject
Earth and Planetary Sciences (miscellaneous); Computers in Earth Sciences; Geography, Planning and Development
Cited by
12 articles.