Abstract
The selection of unburned labels is a crucial step in machine learning modelling of wildfire occurrence probability. However, the effect of different sampling strategies on the performance of machine learning methods has not yet been thoroughly investigated. Additionally, whether the ratio of burned to unburned labels should be balanced or imbalanced remains controversial. To address these gaps in the literature, we examined the effects of four widely used sampling strategies for unburned label selection: (1) random selection in the unburned areas, (2) selection of areas with only one fire event, (3) selection of barren areas, and (4) selection of areas determined by the semi-variogram geostatistical technique. We also investigated the effect of balanced versus imbalanced ratios of burned to unburned labels. We used the random forest (RF) method to model the relationships between historical wildfires that occurred from 2001 to 2020 in Yunnan Province, China, and climate, topography, fuel and anthropogenic variables. Multiple metrics demonstrated that randomly selecting unburned labels from the unburned areas, combined with an imbalanced dataset, outperformed the other three sampling strategies. Thus, we recommend this strategy for producing the datasets required for machine learning modelling of wildfire occurrence probability.
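As an illustration of strategy (1), the following is a minimal sketch of drawing unburned labels at random from cells that never burned, with an adjustable unburned-to-burned ratio. The grid, the burned cells, the function name `sample_unburned`, and the ratio of 3 are all hypothetical choices made for this example; the abstract does not state the ratio or the implementation used in the study.

```python
import random


def sample_unburned(burned, all_cells, ratio=1.0, seed=0):
    """Randomly draw unburned cells from the cells that never burned.

    ratio is the number of unburned labels per burned label:
    1.0 gives a balanced dataset, other values an imbalanced one.
    """
    burned_set = set(burned)
    # Candidate pool: every cell with no recorded fire.
    candidates = sorted(c for c in all_cells if c not in burned_set)
    n = min(len(candidates), round(ratio * len(burned_set)))
    rng = random.Random(seed)  # fixed seed for reproducibility
    return rng.sample(candidates, n)


# Hypothetical 10 x 10 grid with five burned cells (illustrative only).
grid = [(r, c) for r in range(10) for c in range(10)]
burned = [(0, 0), (1, 3), (4, 4), (7, 2), (9, 9)]

# Assumed imbalanced ratio of three unburned labels per burned label.
unburned = sample_unburned(burned, grid, ratio=3.0)

# Label 1 = burned, 0 = unburned; this table would feed a classifier.
labels = {cell: 1 for cell in burned}
labels.update({cell: 0 for cell in unburned})
```

The resulting label table could then be joined with predictor variables and passed to a classifier such as a random forest; setting `ratio=1.0` would give the balanced counterpart for comparison.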
Funder
National Natural Science Foundation of China
Sichuan Science and Technology Program
Cited by
10 articles.