Automated Item Generation: Impact of item variants on performance and standard setting

Authors:

Rachel Westacott1, Kerry Badger2, David Kluth3, Mark Gurnell4, Malcolm W. R. Reed5, Amir H. Sam2

Affiliations:

1. University of Birmingham

2. Imperial College School of Medicine, Imperial College London

3. The University of Edinburgh

4. University of Cambridge and NIHR Cambridge, Cambridge University Hospitals

5. University of Sussex

Abstract

Background

Automated Item Generation (AIG) uses computer software to create multiple items from a single question model. Items generated using AIG software have been shown to be of similar quality to those produced using traditional item-writing methods. However, there is currently little data on whether variants of a single question produce differences in student performance or in human-derived standard setting. The purpose of this study was to use 50 multiple-choice questions (MCQs) as models to create four distinct tests, which would be standard set and given to final-year UK medical students, and then to compare the performance and standard-setting data for each.

Methods

Pre-existing questions from the UK Medical Schools Council (MSC) Assessment Alliance item bank, created using traditional item-writing techniques, were used to generate four 'isomorphic' 50-item MCQ tests using AIG software. All UK medical schools were invited to deliver one of the four papers as an online formative assessment for their final-year students. Each test was standard set using a modified Angoff method. Thematic analysis was conducted for item variants with high and low variance in facility (for student performance) and in average scores (for standard setting).

Results

In total, 2218 students from 12 UK medical schools sat one of the four papers. The average facility of the four papers ranged from 0.55 to 0.61, and the cut score ranged from 0.58 to 0.61. Twenty item models had a facility difference >0.15, and ten item models had a difference in standard setting of >0.1. Variation in parameters that could alter clinical reasoning strategies had the greatest impact on item facility.

Conclusions

Item facility varied to a greater extent than the standard set. This may reflect variants causing greater disruption to the clinical reasoning strategies of novice learners than of experts, in addition to the well-documented tendency of standard setters to revert to the mean.
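To make the two statistics reported above concrete, the following is a minimal illustrative sketch, not taken from the paper, of how item facility and a modified Angoff cut score are commonly computed. The function names and the sample data are hypothetical.

```python
# Illustrative sketch (hypothetical, not from the paper): how item facility
# and a modified Angoff cut score are commonly computed.

def item_facility(responses: list[bool]) -> float:
    """Facility = proportion of candidates who answered the item correctly."""
    return sum(responses) / len(responses)

def angoff_cut_score(judge_estimates: list[list[float]]) -> float:
    """Modified Angoff: each judge estimates, for every item, the probability
    that a minimally competent candidate answers it correctly; the cut score
    is the mean of those estimates across all judges and items."""
    per_judge_means = [sum(items) / len(items) for items in judge_estimates]
    return sum(per_judge_means) / len(per_judge_means)

# Hypothetical data: three candidates' results on one item, and two judges'
# per-item estimates for a three-item paper.
print(item_facility([True, False, True]))   # ≈ 0.67
print(angoff_cut_score([
    [0.60, 0.50, 0.70],                     # judge 1's per-item estimates
    [0.55, 0.60, 0.65],                     # judge 2's per-item estimates
]))                                         # ≈ 0.60
```

On this reading, a facility difference >0.15 between variants of one item model means the proportion of students answering correctly shifted by more than 15 percentage points, while a standard-setting difference >0.1 means the judges' average Angoff estimate for the item moved by more than 0.1.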

Publisher

Research Square Platform LLC

