Abstract
The objective of this research is to develop a machine learning (ML)-based system that evaluates the performance of high school students during the semester and identifies the most significant factors affecting that performance. It also examines how model performance changes when the models are run on data that include only the most important features. The classifiers employed in the system are random forest (RF), support vector machine (SVM), logistic regression (LR) and artificial neural network (ANN) models. The Boruta algorithm was used to calculate the importance of features. The dataset includes behavioral information, individual information and student scores, collected from teachers and through an individual survey administered as an online questionnaire. The effective features in the dataset were identified, and the least important features were eliminated. The ANN, which achieved the best accuracy on the original dataset, was less accurate on the reduced dataset. In contrast, SVM performance improved, reaching the highest accuracy of all models at 0.78. The LR and RF models maintained the same performance on the reduced dataset. The results showed that ML models are effective for evaluating students, and stakeholders can use the identified factors to improve education.
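The feature-elimination step can be illustrated with a minimal sketch of the shadow-feature comparison that underlies Boruta. This is not the study's implementation: Boruta proper ranks features by random forest importance and applies a statistical test over many iterations, whereas this sketch swaps in absolute Pearson correlation as a simple importance measure so that it runs with the standard library alone. The feature names (`study_hours`, `absences`, `seat_row`), the toy data, and the `n_rounds` and `min_hit_rate` parameters are all hypothetical.

```python
import random


def abs_corr(xs, ys):
    """Absolute Pearson correlation; returns 0.0 when a column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return abs(sxy) / (sxx * syy) ** 0.5


def shadow_select(columns, labels, n_rounds=50, min_hit_rate=0.9, seed=0):
    """Keep features that beat the best shuffled (shadow) copy in most rounds."""
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_rounds):
        # Shadow features: each real column shuffled, which destroys
        # any relationship it had with the label.
        shadow_best = 0.0
        for values in columns.values():
            shuffled = list(values)
            rng.shuffle(shuffled)
            shadow_best = max(shadow_best, abs_corr(shuffled, labels))
        # A real feature scores a hit when it outperforms every shadow.
        for name, values in columns.items():
            if abs_corr(values, labels) > shadow_best:
                hits[name] += 1
    return [name for name in columns if hits[name] / n_rounds >= min_hit_rate]


# Hypothetical toy data: 40 students, label 1 = pass, 0 = fail.
n = 40
labels = [i % 2 for i in range(n)]
columns = {
    # Strongly related to the label (assumed for illustration).
    "study_hours": [2 + 3 * labels[i] + 0.2 * (i % 5) for i in range(n)],
    "absences": [6 - 4 * labels[i] + (i % 3) for i in range(n)],
    # Constructed to be unrelated to the label.
    "seat_row": [(i // 2) % 4 for i in range(n)],
}

selected = shadow_select(columns, labels)
print(selected)
```

Features that survive this filter would form the reduced dataset on which the classifiers are retrained and compared against their results on the full dataset.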
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by
18 articles.