Machine Learning Assessment of Damage Grade for Post-Earthquake Buildings: A Three-Stage Approach Directly Handling Categorical Features
Published: 2023-09-18
Issue: 18
Volume: 15
Page: 13847
ISSN: 2071-1050
Container-title: Sustainability
Language: en
Short-container-title: Sustainability
Author:
Li Yutao (1), Jia Chuanguo (1, 2), Chen Hong (3, 4), Su Hongchen (1), Chen Jiahao (1), Wang Duoduo (1)
Affiliation:
1. School of Civil Engineering, Chongqing University, Chongqing 400045, China
2. Key Laboratory of New Technology for Construction of Cities in Mountain Area (Chongqing University), Ministry of Education, Chongqing 400045, China
3. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
4. State Key Lab of Software Development Environment, Beihang University, Beijing 100191, China
Abstract
The rapid assessment of post-earthquake building damage for rescue and reconstruction is a crucial strategy to reduce the enormous number of human casualties and economic losses caused by earthquakes. Conventional machine learning (ML) approaches for this problem usually employ one-hot encoding to cope with categorical features, and their overall procedure is neither sufficient nor comprehensive. Therefore, this study proposed a three-stage approach, which can directly handle categorical features and enhance the entire methodology of ML applications. In stage I, an integrated data preprocessing framework involving subjective–objective feature selection was proposed and performed on a dataset of buildings after the 2015 Gorkha earthquake. In stage II, four machine learning models, KNN, XGBoost, CatBoost, and LightGBM, were trained and tested on the dataset. The best model was judged by comprehensive metrics, including the proposed risk coefficient. In stage III, the feature importance, the relationships between the features and the model’s output, and the feature interaction effects were investigated by Shapley additive explanations. The results indicate that the LightGBM model has the best overall performance with the highest accuracy of 0.897, the lowest risk coefficient of 0.042, and the shortest training time of 12.68 s due to its relevant algorithms for directly tackling categorical features. As for its interpretability, the most important features are determined, and information on these features’ impacts and interactions is obtained to improve the reliability of and promote practical engineering applications for the ML models. The proposed three-stage approach can provide a reference for the overall ML implementation process on raw datasets for similar problems.
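The contrast the abstract draws between one-hot encoding and directly handling categorical features can be illustrated with a minimal sketch. The column name and values below are hypothetical and are not taken from the Gorkha dataset; the helper functions `one_hot` and `category_codes` are illustrative only and do not reproduce the authors' preprocessing.

```python
# Hypothetical categorical column; values are illustrative, not from the paper's dataset.
roof_types = ["bamboo", "rc", "timber", "bamboo", "rc"]


def one_hot(values):
    """Expand one categorical column into one 0/1 column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def category_codes(values):
    """Map each category to an integer code, keeping a single column."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return categories, [index[v] for v in values]


cats, encoded = one_hot(roof_types)
_, codes = category_codes(roof_types)

# One-hot needs one column per category; coding keeps a single column.
print(len(encoded[0]), "one-hot columns vs 1 coded column")
print(codes)
```

One-hot encoding widens the feature matrix by one column per category, whereas integer codes keep a single column. Libraries such as LightGBM can be told to treat such a coded column as categorical (for example through the `categorical_feature` parameter) instead of as an ordered number, which is presumably the kind of direct handling the abstract refers to.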
Funder
National Natural Science Foundation of China; Graduate Scientific Research and Innovation Foundation of Chongqing, China
Subject
Management, Monitoring, Policy and Law; Renewable Energy, Sustainability and the Environment; Geography, Planning and Development; Building and Construction