Exploration of the Stacking Ensemble Machine Learning Algorithm for Cheating Detection in Large-Scale Assessment

Published: 2022-08-13 Page: 001316442211171
ISSN: 0013-1644
Container-title: Educational and Psychological Measurement
Language: en
Short-container-title: Educational and Psychological Measurement

Author:

Zhou Todd¹, Jiao Hong²

Affiliation:

1. Winston Churchill High School, Potomac, MD, USA

2. University of Maryland, College Park, USA

Abstract

Cheating detection in large-scale assessment has received considerable attention in the extant literature. However, none of the previous studies in this line of research investigated the stacking ensemble machine learning algorithm for cheating detection. Furthermore, no study addressed the issue of class imbalance using resampling. This study explored the application of the stacking ensemble machine learning algorithm to analyze the item response, response time, and augmented data of test-takers to detect cheating behaviors. The performance of the stacking method was compared with that of two other ensemble methods (bagging and boosting) as well as six base non-ensemble machine learning algorithms. Issues related to class imbalance and input features were addressed. The study results indicated that stacking, resampling, and feature sets including augmented summary data generally performed better than their counterparts in cheating detection. Compared with the other competing machine learning algorithms investigated in this study, the meta-model from stacking using discriminant analysis based on the top two base models (Gradient Boosting and Random Forest) generally performed the best among all the study conditions when item responses and the augmented summary statistics were used as the input features with an under-sampling ratio of 10:1.
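The resampling step mentioned in the abstract can be illustrated with a minimal sketch of random under-sampling. This is not the authors' implementation: the function name `undersample`, the toy data, and the reading of "10:1" as ten non-cheaters kept per flagged cheater are all assumptions made for illustration.

```python
import random
from collections import Counter


def undersample(rows, labels, ratio=10, minority_label=1, seed=0):
    """Randomly drop majority-class rows so that at most `ratio`
    majority rows remain per minority row (hypothetical helper)."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    majority_idx = [i for i, y in enumerate(labels) if y != minority_label]
    # Keep every minority row; sample the majority class without replacement.
    n_keep = min(len(majority_idx), ratio * len(minority_idx))
    kept = sorted(minority_idx + rng.sample(majority_idx, n_keep))
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data (assumed): 1,000 test-takers, 20 of them labeled as cheaters (1).
labels = [1] * 20 + [0] * 980
rows = [[i] for i in range(1000)]

X_res, y_res = undersample(rows, labels, ratio=10)
counts = Counter(y_res)
print(counts[0], counts[1])  # 200 20
```

In a workflow like the one the abstract describes, a step of this kind would typically be applied to the training split only, so that the evaluation data keep the original class proportions.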

Publisher

SAGE Publications

Subject

Applied Mathematics,Applied Psychology,Developmental and Educational Psychology,Education

Link

http://journals.sagepub.com/doi/pdf/10.1177/00131644221117193

References: 28 articles.

1. Anguita D., Ghelardoni L., Ghio A., Oneto L., Ridella S. (2012). The “K” in K-fold cross validation. In ESANN 2012 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (pp. 25–27). https://www.esann.org/sites/default/files/proceedings/legacy/es2012-62.pdf

2. Bishop S., Egan K. (2017). Detecting erasures and unusual gain scores. In Cizek G. J., Wollack J. A. (Eds.), Handbook of quantitative methods for detecting cheating on tests (pp. 193–213). Routledge. https://doi.org/10.4324/9781315743097-10

3. SMOTE: Synthetic Minority Over-sampling Technique

4. Chen Y., Lu Y., Moustaki I. (2020). Detection of two-way outliers in multivariate data and application to cheating detection in educational tests. arXiv:1911.09408. https://arxiv.org/abs/1911.09408v2

Cited by 18 articles.

1. COVID-19 seroprevalence estimation and forecasting in the USA from ensemble machine learning models using a stacking strategy;Expert Systems with Applications;2024-12

2. A stacking ensemble model for predicting soil organic carbon content based on visible and near-infrared spectroscopy;Infrared Physics & Technology;2024-08

3. Answer Watermarking: Using Answer Generation Assistance Tools to Find Evidence of Cheating;Proceedings of the Eleventh ACM Conference on Learning @ Scale;2024-07-09

4. Examinator v4.0: Cheating Detection in Online Take-Home Exams;Proceedings of the Eleventh ACM Conference on Learning @ Scale;2024-07-09

5. A Stacking Ensemble Machine Learning Strategy for COVID-19 Seroprevalence Estimations in the USA Based on Genetic Programming;2024 IEEE Congress on Evolutionary Computation (CEC);2024-06-30