Author:
Nirmala Intan,Wijayanto Hari,Notodiputro Khairil Anwar
Abstract
The number of higher education graduates in Indonesia is calculated based on their completion status. However, many undergraduate students have reached the maximum length of study, but their completion status is unknown. This condition becomes a problem in calculating the actual number of graduates as it is used as an indicator of higher education evaluation and other policy references. Therefore, the unknown completion status of the students who have reached the maximum length of study must be predicted. The research compared the performance of Random Forest and Extreme Gradient Boosting (XGBoost) classification models in predicting the unknown completion status. The research used a dataset containing 13.377 undergraduate students’ profiles from the Higher Education Database (PDDikti), Ministry of Education, Culture, Research, and Technology. The dataset was incomplete, and the proportion of missing data was 20,9% of the total data. Because missing data might lead to prediction bias, the research also used MissForest imputation to overcome the missing data in the classification modelling and compared it to Mean/Mode and Median/Mode imputation. The results show that MissForest outperforms the other two imputations in both classifiers but requires the longest computation time. Furthermore, the XGBoost model with MissForest is significantly superior to the Random Forest model with MissForest. Hence, the best model chosen to predict the completion status is XGBoost with MissForest imputation.
Publisher
Universitas Bina Nusantara
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献