Authors:
Zhang Jianbin, Duan Zexia, Zhou Shaohui, Li Yubin, Gao Zhiqiu
Abstract
This study investigated the accuracy of the random forest (RF)
model in gap filling the sensible (H) and latent (LE) heat fluxes, using
observations collected at a site over rice–wheat rotation croplands
in Shouxian County, eastern China, from 15 July 2015 to 24 April 2019.
Firstly, the variable significance of the five input variables of the
machine learning (ML) model, namely net radiation (Rn), wind speed (WS),
temperature (T), relative humidity (RH), and air pressure (P), was
examined. Rn accounted for 78 % and 76 % of the total variable
significance for H and LE, respectively, making it the most important
input variable. Secondly, the RF model's
accuracy with the five-variable (Rn, WS, T, RH, P) input combination was
evaluated, and the results showed that the RF model could reliably gap fill
the H and LE with mean absolute errors (MAEs) of 5.88 and 20.97 W m⁻² and
root mean square errors (RMSEs) of 10.67 and 29.46 W m⁻², respectively.
Thirdly, four-variable input combinations were tested,
and it was found that the best input combination was (Rn, WS, T, P),
obtained by removing RH from the input list; its MAE values for H and LE
were reduced by 12.65 % and 7.12 %, respectively. Finally, through the Taylor
diagram, the H and LE gap-filling accuracies of the RF model, the support
vector machine (SVM) model, the k-nearest neighbor (KNN) model, and the
gradient boosting decision tree (GBDT) model were intercompared, and the
statistical metrics showed that RF was the most accurate for both H and LE
gap filling, while the LR and KNN models performed the worst for H and LE
gap filling, respectively.
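
The workflow summarized above (fit an RF regressor on the five inputs, read off variable importance, score with MAE and RMSE, then retest with one input removed) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn, uses synthetic data with a toy target in place of the Shouxian observations, and the function names, value ranges, and hyperparameters are hypothetical choices made only for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Input variable names taken from the abstract.
FEATURES = ["Rn", "WS", "T", "RH", "P"]


def make_synthetic_data(n=1000, seed=0):
    """Generate toy inputs and a toy flux target (assumption: not real data)."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.uniform(-100, 700, n),   # Rn, W m-2 (illustrative range)
        rng.uniform(0, 10, n),       # WS, m s-1
        rng.uniform(-5, 35, n),      # T, deg C
        rng.uniform(20, 100, n),     # RH, %
        rng.uniform(990, 1030, n),   # P, hPa
    ])
    # Toy target dominated by Rn; this is not a physical flux model.
    y = 0.3 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0, 5, n)
    return X, y


def fit_and_score(X, y, names, seed=0):
    """Fit an RF on a train split; return test MAE, RMSE, and importances."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mae = float(mean_absolute_error(y_test, pred))
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    # Impurity-based importances; they sum to 1 across the inputs.
    importance = dict(zip(names, model.feature_importances_))
    return mae, rmse, importance


def drop_one_scores(X, y, names, seed=0):
    """Refit with each input removed in turn (the four-variable combinations)."""
    scores = {}
    for i, dropped in enumerate(names):
        keep = [j for j in range(len(names)) if j != i]
        kept_names = [names[j] for j in keep]
        mae, rmse, _ = fit_and_score(X[:, keep], y, kept_names, seed)
        scores[dropped] = (mae, rmse)
    return scores


if __name__ == "__main__":
    X, y = make_synthetic_data()
    mae, rmse, importance = fit_and_score(X, y, FEATURES)
    print("five inputs: MAE", round(mae, 2), "RMSE", round(rmse, 2))
    print("importance:", {k: round(float(v), 3) for k, v in importance.items()})
    for dropped, (m, r) in drop_one_scores(X, y, FEATURES).items():
        print("without", dropped, "MAE", round(m, 2), "RMSE", round(r, 2))
```

With real tower data, the synthetic generator would be replaced by the observed half-hourly inputs and fluxes, and the gaps to be filled would be the records where H or LE is missing. The numbers printed here reflect only the toy target and are not comparable to the errors reported in the abstract.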