Use of Multiple-Select Multiple-Choice Items in a Dental Undergraduate Curriculum: Retrospective Study Involving the Application of Different Scoring Methods

Published: 2023-03-27 Volume: 9 Page: e43792
ISSN: 2369-3762
Container-title: JMIR Medical Education
Language: en
Short-container-title: JMIR Med Educ

Author:

Kanzow Philipp, Schmidt Dennis, Herrmann Manfred, Wassmann Torsten, Wiegand Annette, Raupach Tobias

Abstract

Background: Scoring and awarding credit are more complex for multiple-select items than for single-choice items. Forty-one different scoring methods were retrospectively applied to 2 multiple-select multiple-choice item types (Pick-N and Multiple-True-False [MTF]) from existing examination data.

Objective: This study aimed to calculate and compare the mean scores for both item types by applying different scoring methods, and to investigate the effect of item quality on mean raw scores and the likelihood of resulting scores at or above the pass level (≥0.6).

Methods: Items and responses from examinees (ie, marking events) were retrieved from previous examinations. Different scoring methods were retrospectively applied to the existing examination data to calculate corresponding examination scores. In addition, item quality was assessed using a validated checklist. Statistical analysis was performed using the Kruskal-Wallis test, Wilcoxon rank-sum test, and multiple logistic regression analysis (P<.05).

Results: We analyzed 1931 marking events of 48 Pick-N items and 828 marking events of 18 MTF items. For both item types, scoring results widely differed between scoring methods (minimum: 0.02, maximum: 0.98; P<.001). Both the use of an inappropriate item type (34 items) and the presence of cues (30 items) impacted the scoring results. Inappropriately used Pick-N items resulted in lower mean raw scores (0.88 vs 0.93; P<.001), while inappropriately used MTF items resulted in higher mean raw scores (0.88 vs 0.85; P=.001). Mean raw scores were higher for MTF items with cues than for those without cues (0.91 vs 0.8; P<.001), while mean raw scores for Pick-N items with and without cues did not differ (0.89 vs 0.90; P=.09). Item quality also impacted the likelihood of resulting scores at or above the pass level (odds ratio ≤6.977).

Conclusions: Educators should pay attention when using multiple-select multiple-choice items and select the most appropriate item type. Different item types, different scoring methods, and presence of cues are likely to impact examinees’ scores and overall examination results.
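
To illustrate why the choice of scoring method can change a result so much, here is a minimal Python sketch that scores one hypothetical MTF marking event in two ways: all-or-nothing (dichotomous) and partial credit per correctly marked statement. These two rules are generic examples chosen for illustration; the abstract does not specify which 41 methods the study applied, and the function names, answer key, and responses below are assumptions. Only the pass level of 0.6 comes from the abstract.

```python
from typing import Sequence

PASS_LEVEL = 0.6  # pass threshold stated in the abstract (score >= 0.6)


def score_dichotomous(marks: Sequence[bool], key: Sequence[bool]) -> float:
    """All-or-nothing: full credit only if every statement is marked correctly."""
    return 1.0 if list(marks) == list(key) else 0.0


def score_partial(marks: Sequence[bool], key: Sequence[bool]) -> float:
    """Partial credit: fraction of statements marked correctly."""
    correct = sum(m == k for m, k in zip(marks, key))
    return correct / len(key)


# Hypothetical MTF item with four statements (illustrative data only).
key = [True, False, True, False]    # assumed answer key
marks = [True, False, True, True]   # assumed examinee response: 3 of 4 correct

for name, scorer in [("dichotomous", score_dichotomous),
                     ("partial credit", score_partial)]:
    score = scorer(marks, key)
    print(f"{name}: score={score:.2f}, passed={score >= PASS_LEVEL}")
```

With this response, the same marking event falls below the pass level under the dichotomous rule but above it under the partial-credit rule.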

Publisher

JMIR Publications Inc.

Subject

Education

References: 24 articles.

1. A subset selection technique for scoring items on a multiple choice test

2. A “new” item format for assessing aspects of clinical competence

3. Note on the multiple true-false test exercise.

4. Relation between examinees’ true knowledge and examination scores: systematic review and exemplary calculations on Multiple-True-False items

Cited by 2 articles.

1. Use of Multiple-Choice Items in Summative Examinations: Questionnaire Survey Among German Undergraduate Dental Training Programs; JMIR Medical Education; 2024-06-27

2. Scoring Single-Response Multiple-Choice Items: Scoping Review and Comparison of Different Scoring Methods; JMIR Medical Education; 2023-05-19