Author:
Mostofi Fatemeh, Tokdemir Onur Behzat, Toğan Vedat
Abstract
Imbalanced construction datasets reduce the accuracy of machine learning (ML) models. Recent construction management research has addressed this issue through various sampling approaches. Despite their advantages, these sampling approaches reduce the reliability of the prediction model and pose a risk of artificial bias. The objective of this study is to address the challenge of imbalanced datasets in construction progress prediction models using a novel variational autoencoder (VAE) that generates synthetic data for underrepresented classes. The VAE's encoder-decoder architecture, along with its latent space components, is optimized for this task. A comparative analysis using decision tree-based ML models, including grid search optimization, substantiated the effectiveness of the VAE approach. The results indicate that the ML models benefited from adding the synthesized data to form a hybrid dataset, showing 2% improvements in performance metrics across most models. The synthetic data generated by VAEs helps build more balanced datasets, which, in turn, can lead to more reliable and accurate predictive models. The enhanced accuracy of the VAE-ML model addresses the class imbalance problem and improves the reliability of construction productivity predictions and related resource allocation plans.
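The hybrid-dataset idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the class labels ("on_time", "delayed"), and the toy feature rows are all hypothetical, and the sampler below perturbs real minority rows with Gaussian noise purely as a stand-in for decoding latent draws from a trained VAE.

```python
import random
from collections import Counter


def synthetic_counts(labels):
    """Return how many synthetic rows each class needs to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items() if n < target}


def build_hybrid(rows, labels, sampler, seed=0):
    """Append sampler-generated rows so every class reaches the majority count.

    `sampler(cls, k, rng)` is a hypothetical stand-in for a class-specific
    VAE decoder; it must return k synthetic feature rows for class `cls`.
    """
    rng = random.Random(seed)
    hybrid_rows, hybrid_labels = list(rows), list(labels)
    for cls, k in synthetic_counts(labels).items():
        hybrid_rows.extend(sampler(cls, k, rng))
        hybrid_labels.extend([cls] * k)
    return hybrid_rows, hybrid_labels


def jitter_sampler_factory(rows, labels, scale=0.05):
    """Toy sampler (NOT a VAE): adds Gaussian noise to randomly chosen real rows."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    def sampler(cls, k, rng):
        return [
            [value + rng.gauss(0.0, scale) for value in rng.choice(by_class[cls])]
            for _ in range(k)
        ]

    return sampler


# Hypothetical toy data: three majority rows, one minority row.
rows = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25], [0.90, 0.80]]
labels = ["on_time", "on_time", "on_time", "delayed"]

sampler = jitter_sampler_factory(rows, labels)
hybrid_rows, hybrid_labels = build_hybrid(rows, labels, sampler)
class_sizes = Counter(hybrid_labels)  # each class now has 3 rows
```

In a setup closer to the one the abstract describes, the sampler would be replaced by a trained generative model, and the hybrid rows would then be passed to the decision tree-based models for comparison against the original data.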
Publisher
Springer Nature Singapore