Abstract
Large-truck crashes often result in substantial economic and social costs. Accurate prediction of the severity level of a reported truck crash can help rescue teams and emergency medical services take the right actions and provide proper medical care, thereby reducing its economic and social costs. This study aims to investigate the modeling issues in using machine learning methods for predicting the severity level of large-truck crashes. To this end, six representative machine learning (ML) methods, including four classification tree-based ML models, specifically the Extreme Gradient Boosting tree (XGBoost), the Adaptive Boosting tree (AdaBoost), Random Forest (RF), and the Gradient Boost Decision Tree (GBDT), and two non-tree-based ML models, specifically Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN), were selected for predicting the severity level of large-truck crashes. The accuracy levels of these six methods were compared and the effects of data-balancing techniques in model prediction performance were also tested using three different resampling techniques: Undersampling, oversampling, and mix sampling. The results indicated that better prediction performances were obtained using the dataset with a similar distribution to the original sample population instead of using the datasets with a balanced sample population. Regarding the prediction performance, the tree-based ML models outperform the non-tree-based ML models and the GBDT model performed best among all of the six models.
Funder
U.S. Department of Transportation
Texas Southern University
Reference32 articles.
1. Fiorentini, N., and Losa, M. (2020). Handling imbalanced data in road crash severity prediction by machine learning algorithms. Infrastructures, 5.
2. Interaction trees with censored survival data;Int. J. Biostat.,2008
3. Handling imbalanced datasets: A review;GESTS Int. Trans. Comput. Sci. Eng.,2006
4. Sampling bias and class imbalance in maximum-likelihood logistic regression;Math. Geosci.,2011
5. Wei, F., Cai, Z., Wang, Z., Guo, Y., Li, X., and Wu, X. (2021). Investigating Rural Single-Vehicle Crash Severity by Vehicle Types Using Full Bayesian Spatial Random Parameters Logit Model. Appl. Sci., 11.
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献