The Identification of Guessing Patterns in Progress Testing as a Machine Learning Classification Problem

Author:

Atanet Iván Roselló¹, Sehy Victoria¹, Sieg Miriam², März Maren¹

Affiliation:

1. Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, AG Progress Test Medizin, Charitéplatz 1, 10117, Berlin, Germany

2. Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, AG Progress Test Medizin, Charitéplatz 1, 10117, Berlin, Germany; and Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, 10117, Berlin, Germany

Abstract

Background

The detection of guessing patterns in low-stakes progress testing could naturally be understood as a statistical classification problem in which test takers are assigned to groups according to probabilities given by a machine learning model. However, few examples in the relevant literature discuss this approach; to date, the strategies applied to this problem have mostly been based either on counting rapid responses or on detecting unusual answer patterns.

Methods

On the basis of 14,897 participations in the Progress Test Medizin, which has been held twice a year since 1999 at selected medical schools in Germany, Austria and Switzerland, we formulate the identification of guessing patterns as a binary classification problem. Next, we compare the performance of a logistic regression algorithm in this setup with that of the nonparametric person-fit indices included in R's PerFit package. Finally, we determine probability thresholds based on the values of the logistic regression functions obtained from the algorithm.

Results

Comparison of the logistic regression algorithm with person-fit indices

The logistic regression algorithm included in Python's Scikit-Learn reached ROC-AUC scores of 0.886 to 0.903, depending on the dataset, while the 11 person-fit indices analysed returned ROC-AUC scores of 0.548 to 0.761.

Best feature set

Datasets based on aggregate scores yielded better results than those in which the answers to every item were treated as individual features. The best results were reached with a feature set containing only two parameters (self-monitoring accuracy and number of answered questions); taking into account the time spent on the test did not improve performance.
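The classification setup described in the Methods and Results can be illustrated with a small sketch. The study used scikit-learn's logistic regression on real Progress Test data; the code below instead fits a plain gradient-descent logistic regression with Python's standard library only, on invented synthetic data, using the two features named above (self-monitoring accuracy and number of answered questions). The group differences, sample size, learning rate and hold-out split are all assumptions made for illustration, and the resulting ROC-AUC says nothing about the values reported in the study. ROC-AUC is computed in its rank (Mann-Whitney) form: the probability that a randomly chosen guesser scores higher than a randomly chosen non-guesser.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_scaler(rows):
    # Per-feature mean and standard deviation, estimated on the training rows only
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return means, sds

def transform(rows, means, sds):
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def fit_logistic(X, y, lr=0.5, epochs=300):
    # Batch gradient descent on the mean log-loss (no regularization)
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict_proba(X, w, b):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def roc_auc(y, scores):
    # Probability that a random positive outscores a random negative (ties count 1/2)
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic, invented data: label 1 = "guessed most answers" (hypothetical effect sizes)
random.seed(0)
X_raw, y = [], []
for _ in range(400):
    guesser = random.random() < 0.2
    # Hypothetical self-monitoring accuracy, clamped to [0, 1]
    accuracy = min(1.0, max(0.0, random.gauss(0.45 if guesser else 0.75, 0.1)))
    # Hypothetical number of answered questions, clamped at 0
    answered = max(0.0, random.gauss(150 if guesser else 80, 25))
    X_raw.append([accuracy, answered])
    y.append(1 if guesser else 0)

# Simple hold-out split: fit on the first 300 rows, evaluate on the remaining 100
means, sds = fit_scaler(X_raw[:300])
X_train = transform(X_raw[:300], means, sds)
X_test = transform(X_raw[300:], means, sds)
w, b = fit_logistic(X_train, y[:300])
auc = roc_auc(y[300:], predict_proba(X_test, w, b))
print(f"hold-out ROC-AUC: {auc:.3f}")
```

With scikit-learn available, `LogisticRegression().fit(X, y)` together with `sklearn.metrics.roc_auc_score` would replace the hand-written helpers; they are spelled out here only to make each step visible.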
Probability thresholds

Based on the values of the logistic regression function generated by the applied algorithm, it is possible to establish thresholds above which the probability that a test taker guessed most answers is at least 90%.

Conclusions

In our setting, logistic regression clearly outperformed nonparametric person-fit indices in identifying guessing patterns. We attribute this result to the greater flexibility of machine learning methods, which makes them more adaptable to diverse test environments than person-fit indices.
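The probability thresholds mentioned in the Results can be read through the logistic link. If the model outputs p = 1/(1 + e^(-z)) for a decision value z, then p ≥ 0.9 exactly when z ≥ ln(0.9/0.1) = ln 9 ≈ 2.197. The sketch below is one straightforward reading of that step, not necessarily the procedure used in the study; it assumes the model's probabilities are well calibrated, and `flag_likely_guessers` is a hypothetical helper. In scikit-learn, the values z would come from `LogisticRegression.decision_function`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    # Inverse of the logistic function: the decision value z with sigmoid(z) == p
    return math.log(p / (1.0 - p))

def flag_likely_guessers(decision_values, p_min=0.9):
    # Hypothetical helper: flag test takers whose modelled probability of
    # guessing is at least p_min (valid only if the model is calibrated)
    z_min = logit(p_min)
    return [z >= z_min for z in decision_values]

z_threshold = logit(0.9)
print(f"decision-value threshold for p >= 0.9: {z_threshold:.3f}")  # ln 9, prints 2.197
print(flag_likely_guessers([-1.0, 2.0, 2.2, 3.5]))  # [False, False, True, True]
```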

Funder

Bundesministerium für Bildung und Forschung

Publisher

Springer Science and Business Media LLC
