Feature Engineering and Decision Trees for Predicting High Crash-Risk Locations Using Roadway Indicators

Author:

Sarigiannis Dimitrios¹, Atzemi Maria¹, Oke Jimi², Christofa Eleni², Gerasimidis Simos²

Affiliation:

1. Egnatia Odos S.A., Public Company of Maintenance & Construction, Thessaloniki, Greece

2. Department of Civil and Environmental Engineering, University of Massachusetts, Amherst, MA, USA

Abstract

Road crashes are a prevalent public health issue across the globe. The objective of this research was to develop a methodology for accurately classifying high crash-risk locations. The hypothesis was that readily obtained roadway indicators can be used with machine learning techniques to categorize locations as high crash-risk. A database of 5,383 locations, created from 2012 to 2015 as part of the Hellenic National Road Safety Project, was used to develop three binary machine learning models that classify high crash-risk locations based on roadway indicators: random forest, gradient boosting, and extra trees. This research used feature engineering to reduce the number of indicators in the model, and the synthetic minority oversampling technique to address the imbalance in the dataset between the minority class (high crash-risk locations identified using crash reports) and the majority class (medium to low crash-risk locations identified based on local police testimonies, site inspections, and geometry analysis). Although all three models performed similarly, the extra trees model outperformed the other two on a range of performance metrics, including the area under the precision–recall curve and the F1-score. The findings revealed that design speeds, pavement markings, signage presence, and pavement condition were the most influential factors affecting roadway safety. The contribution of this research is a transferable methodology for classifying high crash-risk locations, together with the identification of key indicators of crash-risk potential, which in turn can inform cost-effective data collection and maintenance activities.
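
The abstract names the synthetic minority oversampling technique and the F1-score without showing either. The sketch below illustrates the general idea of both using only the Python standard library. It is not the authors' implementation: the function names `smote` and `f1_score`, the two-feature toy points, and the parameter defaults are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (the basic SMOTE idea; hypothetical helper, not the paper's code)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy minority class with two made-up, already-scaled roadway indicators.
    high_risk = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    extra = smote(high_risk, n_new=6, k=2, seed=1)
    print(len(extra), "synthetic high-risk points generated")
    print("F1:", f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

In practice, oversampling of this kind is applied to the training split only, so that the precision–recall and F1 figures are computed on untouched, still-imbalanced data.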

Publisher

SAGE Publications
