Analysis of the Performance of Machine Learning Models in Predicting the Severity Level of Large-Truck Crashes

Published: 2022-11-16 Issue: 4 Volume: 2 Page: 939-955
ISSN: 2673-7590
Container-title: Future Transportation
Language: en
Short-container-title: Future Transportation

Author:

Liu Jinli, Qi Yi, Tao Jueqiang, Tao Tao

Abstract

Large-truck crashes often result in substantial economic and social costs. Accurate prediction of the severity level of a reported truck crash can help rescue teams and emergency medical services take the right actions and provide proper medical care, thereby reducing these costs. This study investigates the modeling issues that arise when machine learning methods are used to predict the severity level of large-truck crashes. To this end, six representative machine learning (ML) methods were selected: four classification tree-based models, namely the Extreme Gradient Boosting tree (XGBoost), the Adaptive Boosting tree (AdaBoost), Random Forest (RF), and the Gradient Boost Decision Tree (GBDT), and two non-tree-based models, namely Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN). The accuracy levels of these six methods were compared, and the effects of data-balancing techniques on prediction performance were tested using three resampling techniques: undersampling, oversampling, and mix sampling. The results indicated that better prediction performance was obtained with a dataset whose distribution was similar to that of the original sample population than with datasets that had a balanced sample population. In terms of prediction performance, the tree-based ML models outperformed the non-tree-based ML models, and the GBDT model performed best among all six models.
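
The three resampling strategies named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the function name `resample`, the toy severity labels, and the reading of "mix sampling" as moving every class to the mean class size are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def resample(records, labels, strategy, seed=0):
    """Rebalance (records, labels) by random resampling.

    strategy (hypothetical names, chosen for this sketch):
      "under" -> shrink every class to the smallest class size
      "over"  -> grow every class to the largest class size
      "mix"   -> move every class to the mean class size
                 (an assumed definition of "mix sampling")
    """
    rng = random.Random(seed)

    # Group record indices by their severity label.
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)

    sizes = [len(idxs) for idxs in groups.values()]
    if strategy == "under":
        target = min(sizes)
    elif strategy == "over":
        target = max(sizes)
    elif strategy == "mix":
        target = round(sum(sizes) / len(sizes))
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    chosen = []
    for idxs in groups.values():
        if len(idxs) >= target:
            # Undersample: draw without replacement.
            chosen.extend(rng.sample(idxs, target))
        else:
            # Oversample: keep all originals, then add random duplicates.
            chosen.extend(idxs + rng.choices(idxs, k=target - len(idxs)))

    rng.shuffle(chosen)
    return [records[i] for i in chosen], [labels[i] for i in chosen]


# Toy, made-up data: 100 placeholder crash records with an imbalanced
# severity distribution (these labels and counts are not from the paper).
labels = ["minor"] * 80 + ["serious"] * 15 + ["fatal"] * 5
records = list(range(len(labels)))

for strategy in ("under", "over", "mix"):
    _, new_labels = resample(records, labels, strategy)
    print(strategy, dict(Counter(new_labels)))
```

With these toy counts, undersampling leaves 5 records per class, oversampling yields 80 per class, and the assumed mix strategy yields 33 per class. The abstract reports that models trained on data close to the original distribution performed better than those trained on balanced data, so a sketch like this would serve only to build the comparison datasets.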

Funder

U.S. Department of Transportation

Texas Southern University

Publisher

MDPI AG

Subject

General Medicine

Link

https://www.mdpi.com/2673-7590/2/4/52/pdf

References: 32 articles.

1. Fiorentini, N., and Losa, M. (2020). Handling imbalanced data in road crash severity prediction by machine learning algorithms. Infrastructures, 5.

2. Interaction trees with censored survival data; Int. J. Biostat., 2008

3. Handling imbalanced datasets: A review; GESTS Int. Trans. Comput. Sci. Eng., 2006

4. Sampling bias and class imbalance in maximum-likelihood logistic regression; Math. Geosci., 2011

5. Wei, F., Cai, Z., Wang, Z., Guo, Y., Li, X., and Wu, X. (2021). Investigating Rural Single-Vehicle Crash Severity by Vehicle Types Using Full Bayesian Spatial Random Parameters Logit Model. Appl. Sci., 11.

Cited by 2 articles.

1. Predicting Pedestrian Involvement in Fatal Crashes Using a TabNet Deep Learning Model; Proceedings of the 16th ACM SIGSPATIAL International Workshop on Computational Transportation Science; 2023-11-13

2. Geospatial Modeling Based-Multi-Criteria Decision-Making for Flash Flood Susceptibility Zonation in an Arid Area; Remote Sensing; 2023-05-14