Prediction of Undergraduate Student’s Study Completion Status Using MissForest Imputation in Random Forest and XGBoost Models

Published: 2022-02-03 Issue: 1 Volume: 13 Pages: 53-62
ISSN: 2476-907X
Container-title: ComTech: Computer, Mathematics and Engineering Applications
Short-container-title: ComTech

Authors:

Nirmala Intan, Wijayanto Hari, Notodiputro Khairil Anwar

Abstract

The number of higher education graduates in Indonesia is calculated from students' completion status. However, many undergraduate students have reached the maximum length of study while their completion status remains unknown. This makes it difficult to calculate the actual number of graduates, a figure used as an indicator in higher education evaluation and as a reference for other policies. Therefore, the unknown completion status of students who have reached the maximum length of study must be predicted. This research compared the performance of Random Forest and Extreme Gradient Boosting (XGBoost) classification models in predicting the unknown completion status. The dataset contained the profiles of 13,377 undergraduate students from the Higher Education Database (PDDikti) of the Ministry of Education, Culture, Research, and Technology. The dataset was incomplete: missing values made up 20.9% of the total data. Because missing data can bias predictions, the research used MissForest imputation to handle the missing values in the classification modelling and compared it with Mean/Mode and Median/Mode imputation. The results show that MissForest outperforms the other two imputation methods with both classifiers, although it requires the longest computation time. Furthermore, the XGBoost model with MissForest is significantly superior to the Random Forest model with MissForest. Hence, XGBoost with MissForest imputation was chosen as the best model for predicting completion status.
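The two baseline imputations that the abstract compares against MissForest can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each student profile is a dict in which `None` marks a missing value, the helper names `missing_proportion` and `impute_baseline` are hypothetical, and "proportion of missing data" is assumed to mean missing cells divided by all cells. MissForest itself, which iteratively re-predicts each variable's missing entries with a random forest trained on the other variables, is not reproduced here.

```python
from statistics import mean, median, mode
from typing import Any, Dict, List, Optional

# Hypothetical row type: one student profile, with None marking a missing value.
Row = Dict[str, Optional[Any]]


def missing_proportion(rows: List[Row], columns: List[str]) -> float:
    """Share of missing cells (None) among all cells in the given columns."""
    total = len(rows) * len(columns)
    if total == 0:
        return 0.0
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)
    return missing / total


def impute_baseline(
    rows: List[Row],
    numeric: List[str],
    categorical: List[str],
    center: str = "mean",
) -> List[Row]:
    """Mean/Mode (center="mean") or Median/Mode (center="median") imputation.

    Numeric columns are filled with the mean or median of their observed values,
    categorical columns with their most frequent observed value. The input rows
    are not modified; imputed copies are returned.
    """
    if center not in ("mean", "median"):
        raise ValueError("center must be 'mean' or 'median'")
    stat = mean if center == "mean" else median

    # Compute one fill value per column from the observed (non-None) entries only.
    fills: Dict[str, Any] = {}
    for col in numeric + categorical:
        observed = [row[col] for row in rows if row.get(col) is not None]
        if not observed:
            continue  # nothing observed: leave this column's gaps unfilled
        fills[col] = stat(observed) if col in numeric else mode(observed)

    # Copy each row and fill only the cells that are missing.
    return [
        {**row, **{col: val for col, val in fills.items() if row.get(col) is None}}
        for row in rows
    ]
```

With hypothetical columns such as `gpa` and `credits` (numeric) and `program` (categorical), `impute_baseline(rows, ["gpa", "credits"], ["program"], center="median")` gives the Median/Mode variant. Both baselines replace every gap in a column with the same constant, which ignores relationships between variables; that limitation is what a model-based method such as MissForest is designed to address.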

Publisher

Universitas Bina Nusantara

Subject

General Medicine

Cited by 2 articles.

1. Air Quality Prediction Based on Air Pollution Emissions in the City Environment Using XGBoost with SMOTE. 2022 IEEE International Conference of Computer Science and Information Technology (ICOSNIKOM), 2022-10-19.

2. Prediction of Academic Performance of Engineering Students by Using Data Mining Techniques. International Journal of Information and Education Technology, 2022.