Affiliation:
1. Department of Civil Environmental and Architectural Engineering, University of Kansas, KS
2. Department of Civil Engineering, National Technical University of Athens, Athens, Greece
Abstract
Road crashes are a common occurrence in many parts of the world, causing significant loss of life, injury, and economic damage. Crashes can be broadly classified into single-vehicle (SV) crashes and multi-vehicle (MV) crashes. Various statistical approaches have been implemented to identify the key factors behind these two types of crashes and it has been concluded that these factors need to be analyzed separately. The dataset for this research included various types of roadway design parameters and traffic conditions. Combinations of three feature-selection techniques, namely ANOVA, correlation matrix, and ExtraTreesClassifier algorithm, were utilized to separately select the appropriate variables for SV and MV crash analysis. Various machine learning (ML) models (e.g., LightGBM, XGBoost, etc.) along with a statistical method (binary logistic regression) have been adopted to predict SV and MV crash occurrences. The results show that gradient boosting-type ML algorithms outperform the remaining prediction models, and the LightGBM was found to be the most powerful in prediction. The LightGBM classifier produced accuracy, ROC_AUC, and avg. F-1 score of 0.75, 0.83, and 0.76, respectively, for MV crashes and 0.76, 0.82, and 0.76, respectively, for SV crashes. The SHapley Additive exPlanations (SHAP) analysis was used to explain how each variable affected the models’ output. The results confirmed that the crash factors associated with SV and MV crashes are different and that some variables have inverse impact. Artificial intelligence and ML can assist transportation professionals in better understanding the causes of SV and MV crashes and advance the process toward Vision Zero.