Abstract
This study explores the potential of machine learning to predict the risk of accidents in construction projects. Data were gathered from a Norwegian construction company over a period of nearly seven years and cover 156 projects. Forty-six features are constructed, primarily from observations and incidents related to health, safety, and environment, as well as quality deviations. Using mutual information, the 20 most important features are identified. These are then used to train six classification models, which are evaluated using 10-fold cross-validation. The target variable of the classification problem is the level of risk, which describes the probability of accidents for a project in four classes: low risk, risk of less severe accidents, risk of serious accidents, and risk of critical accidents. The models perform poorly compared with those in previous studies. This is likely a consequence of the limited number of projects and the number of distinct features used to train the models. Despite the limited data, the results indicate that some of the data, especially observations and incidents, have predictive potential. It is suggested that incorporating data related to project workers and more project information could improve the accuracy of predictions.
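As an illustration of the feature-selection step, the following minimal sketch ranks features by their mutual information with the risk level and keeps the top k. It is not the study's implementation: it assumes features that have already been discretized, uses a simple plug-in estimate from counts, and the feature names and toy values are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), expressed with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def top_k_features(columns, target, k):
    """Return the names of the k features with the highest MI to the target."""
    scores = {name: mutual_information(vals, target) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: six projects, two discretized features, two risk levels.
risk = ["low", "low", "serious", "serious", "low", "serious"]
features = {
    "incident_count_binned": ["few", "few", "many", "many", "few", "many"],
    "weekday_started": ["mon", "tue", "mon", "tue", "mon", "tue"],
}

selected = top_k_features(features, risk, k=1)
print(selected)
```

In this toy example the binned incident count fully determines the risk level, so it ranks above the uninformative start weekday. With continuous features, a different estimator (for example a nearest-neighbour-based one) would be needed.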