Learning from Imbalanced Educational Data Using Ensemble Machine Learning Algorithms

Published:2021-04-29 Issue:Special Issue 01 Volume:18 Page:183-195
ISSN:1735-188X
Container-title:Webology
Short-container-title:WEB

Author:

Lenin Thingbaijam,Chandrasekaran N.

Abstract

Students' academic performance is one of the most important parameters for evaluating the standard of any institute. It has become of paramount importance for any institute to identify students at risk of underperforming, failing, or even dropping out of a course. Machine learning techniques may be used to develop a model that predicts a student's performance as early as the time of admission. The task is challenging, however, because the educational data available for modelling are usually imbalanced. We explore ensemble machine learning techniques, namely bagging algorithms such as random forest (rf) and boosting algorithms such as adaptive boosting (adaboost), stochastic gradient boosting (gbm), and extreme gradient boosting (xgbTree), in an attempt to develop a model for predicting the performance of students at a private university in Meghalaya using three categories of data: demographic, prior academic record, and personality. The collected data are found to be highly imbalanced and also contain missing values. We employ the k-nearest neighbor (knn) data imputation technique to handle the missing values. The models are developed on the imputed data with 10-fold cross-validation and are evaluated using precision, specificity, recall, and kappa. As the data are imbalanced, we avoid using accuracy as the metric for evaluating the models and instead use balanced accuracy and F-score. We compare the ensemble techniques with the single classifier C4.5. The best results are obtained with random forest and adaboost, with an F-score of 66.67%, a balanced accuracy of 75%, and an accuracy of 96.94%.
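To illustrate why accuracy can mislead on imbalanced data, the following is a minimal Python sketch that computes accuracy, balanced accuracy, and F-score from a binary confusion matrix. The function name `binary_metrics` and the counts used are assumptions made for illustration: the abstract does not report a confusion matrix, and the counts below are simply one hypothetical set that happens to be consistent with the reported figures. Which class is treated as positive is also an assumption.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute common metrics from binary confusion-matrix counts.

    tp/fn: positive (minority) cases predicted correctly/incorrectly.
    tn/fp: negative (majority) cases predicted correctly/incorrectly.
    """
    total = tp + fn + fp + tn
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F-score (F1): harmonic mean of precision and recall.
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        # Balanced accuracy: mean of recall and specificity.
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_score": f_score,
    }


# Hypothetical counts (NOT taken from the paper): 98 test cases,
# 6 in the minority class, of which only 3 are detected.
metrics = binary_metrics(tp=3, fn=3, fp=0, tn=92)

for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}%")
```

With these hypothetical counts the sketch gives an accuracy of about 96.94%, a balanced accuracy of 75%, and an F-score of about 66.67%. Accuracy stays high even though half of the minority-class cases are missed, which is the reason for preferring balanced accuracy and F-score here.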

Publisher

NeuroQuantology Journal

Subject

Information Systems and Management,Library and Information Sciences,Human-Computer Interaction,Software

Cited by 4 articles.

1. Comparative Analysis of Nonlinear Models Developed using Machine Learning Algorithms;WSEAS TRANSACTIONS ON INFORMATION SCIENCE AND APPLICATIONS;2024-06-20

2. Enhancing personalized learning with explainable AI: A chaotic particle swarm optimization based decision support system;Applied Soft Computing;2024-05

3. Predicting academic achievement from the collaborative influences of executive function, physical fitness, and demographic factors among primary school students in China: ensemble learning methods;BMC Public Health;2024-01-23

4. Predicting Student’s Academic Performance Using Data Mining Methods: Review Paper;2023 International Conference On Cyber Management And Engineering (CyMaEn);2023-01-26