Affiliation:
1. School of Geography and Planning, Sun Yat-sen University, Guangzhou 510006, China
2. China Water Resources Pearl River Planning, Surveying & Designing Co., Ltd., Guangzhou 510611, China
Abstract
Data-driven flood susceptibility modeling is an efficient way to map the spatial distribution of flood likelihood. The quality of a flood susceptibility model depends on the learning technique and on the data used for learning. The performance of learning techniques has been examined extensively, but the impact of data sampling strategies has so far received limited attention. Random sampling is widely favored because it is easy to use, but it treats flood-related data as tabular and ignores their spatial dimension. Because flood occurrence is typically uneven over space, non-flood sampling should not be completely random. To account for the spatial dimension, this study proposed a new sampling approach based on spatial dependence, called inverse-occurrence sampling, which selects more non-flood data in low-risk areas than in high-risk areas. The new sampling approach was compared with random and stratified sampling using six machine learning techniques in two urban areas of Guangzhou, China, with distinct flood mechanisms: Tianhe (flood density 1.5/km², clustered distribution, average slope 9.02°, downtown district) and Panyu (flood density 0.15/km², random distribution, average slope 4.55°, suburban district). The learning techniques were support vector machine (SVM), random forest (RF), artificial neural network (ANN), convolutional neural network (CNN), CNN-SVM, and CNN-RF. The main findings were as follows: (1) Sampling approaches had a greater impact on model performance than learning techniques in terms of the area under the receiver operating characteristic curve (AUC). The AUC variations caused by learning techniques ranged from 0.04 to 0.09, whereas those caused by sampling approaches ranged from 0.15 to 0.22, all larger than 0.1. (2) The new sampling approach outperformed the other two sampling approaches, with higher average AUC values and smaller AUC variations.
This advantage was robust across multiple learning techniques and different flooding mechanisms. AUCs in the inverse group had a narrower range (0.14–0.18 in Tianhe and 0.35–0.39 in Panyu) than in the random group (0.24–0.28 in Tianhe and 0.43–0.53 in Panyu) and the stratified group (0.23–0.30 in Tianhe and 0.42–0.48 in Panyu). (3) In terms of AUC, the most accurate learning technique was CNN-RF, followed by SVM, CNN-SVM, RF, CNN, and ANN. (4) ANN- and CNN-based models tended to produce polarized patterns in flood susceptibility maps, contradicting the expected increase in flood density with increasing susceptibility level. Flood density outliers tended to appear in the models derived using RF and CNN-RF. Finally, the newly proposed sampling approach is recommended for flood susceptibility mapping to reflect the impact of spatial dependence.
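The abstract states only that inverse-occurrence sampling draws more non-flood samples from low-risk areas than from high-risk areas; it does not give the weighting rule. The sketch below is one possible reading, not the authors' implementation. It assumes a gridded flood density surface is available and uses a linear inverse weight; the function name `inverse_occurrence_sample`, its parameters, and the weighting formula are all hypothetical.

```python
import numpy as np


def inverse_occurrence_sample(flood_density, flood_mask, n_samples, seed=0):
    """Draw non-flood cells with probability that decreases with flood density.

    flood_density: 2-D array of local flood occurrence (assumed input,
        for example a kernel density of recorded flood points).
    flood_mask: 2-D boolean array, True where a flood was recorded.
    Returns an (n_samples, 2) array of (row, col) indices.
    """
    rng = np.random.default_rng(seed)
    density = np.asarray(flood_density, dtype=float)
    mask = np.asarray(flood_mask, dtype=bool)

    # Assumed weighting: rescale density to [0, 1] and invert it, so the
    # lowest-density cells get weight 1 and the highest-density cells get 0.
    span = density.max() - density.min()
    if span > 0:
        scaled = (density - density.min()) / span
    else:
        scaled = np.zeros_like(density)
    weights = 1.0 - scaled

    # Recorded flood cells must never be drawn as non-flood samples.
    weights[mask] = 0.0

    flat = weights.ravel()
    if n_samples > np.count_nonzero(flat):
        raise ValueError("not enough eligible non-flood cells")
    probs = flat / flat.sum()

    # Sample without replacement so each cell is used at most once.
    chosen = rng.choice(flat.size, size=n_samples, replace=False, p=probs)
    rows, cols = np.unravel_index(chosen, density.shape)
    return np.column_stack([rows, cols])


if __name__ == "__main__":
    # Toy surface: flood density falls from 1 (left edge) to 0 (right edge).
    density = np.tile(np.linspace(1.0, 0.0, 20), (20, 1))
    floods = np.zeros((20, 20), dtype=bool)
    floods[5, 15] = True
    samples = inverse_occurrence_sample(density, floods, n_samples=100)
    print("samples in low-density half:", int((samples[:, 1] >= 10).sum()))
```

Random sampling corresponds to giving every non-flood cell the same weight. Other inverse weightings (for example reciprocal or rank-based) would fit the abstract's description equally well.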
Funder
Guangdong Basic and Applied Basic Research Foundation
Science and Technology Program of Guangzhou
Subject
General Earth and Planetary Sciences