Are faculty predictions or item taxonomies useful for estimating the outcome of multiple-choice examinations?

Published:2011-12 Issue:4 Volume:35 Page:396-401
ISSN:1043-4046
Container-title:Advances in Physiology Education
language:en
Short-container-title:Advances in Physiology Education

Author:

Kibble Jonathan D.¹,², Johnson Teresa²

Affiliation:

1. Health Sciences Centre, Memorial University of Newfoundland, St John's, Newfoundland, Canada; and

2. College of Medicine, University of Central Florida, Orlando, Florida

Abstract

The purpose of this study was to evaluate whether multiple-choice item difficulty could be predicted either by a subjective judgment by the question author or by applying a learning taxonomy to the items. Eight physiology faculty members teaching an upper-level undergraduate human physiology course consented to participate in the study. The faculty members annotated questions before exams with the descriptors “easy,” “moderate,” or “hard” and classified them according to whether they tested knowledge, comprehension, or application. Overall analysis showed a statistically significant, but relatively low, correlation between the intended item difficulty and actual student scores (ρ = −0.19, P < 0.01), indicating that, as intended item difficulty increased, the resulting student scores on items tended to decrease. Although this expected inverse relationship was detected, faculty members were correct only 48% of the time when estimating difficulty. There was also significant individual variation among faculty members in the ability to predict item difficulty (χ² = 16.84, P = 0.02). With regard to the cognitive level of items, no significant correlation was found between the item cognitive level and either actual student scores (ρ = −0.09, P = 0.14) or item discrimination (ρ = 0.05, P = 0.42). Despite the inability of faculty members to accurately predict item difficulty, the examinations were of high quality, as evidenced by reliability coefficients (Cronbach's α) of 0.70–0.92, the rejection of only 4 of 300 items in the postexamination review, and a mean item discrimination (point biserial) of 0.37. In conclusion, the effort of assigning annotations describing intended difficulty and cognitive levels to multiple-choice items is of doubtful value in terms of controlling examination difficulty. However, we also report that the process of annotating questions may enhance examination validity and can reveal aspects of the hidden curriculum.
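
For readers unfamiliar with the two exam-quality statistics named in the abstract, the following is a minimal Python sketch of how Cronbach's α and a point-biserial item discrimination are commonly computed from a matrix of 0/1 item scores. It is not the authors' analysis code: the function names and the toy `scores` matrix are invented for illustration, and the point biserial here uses uncorrected total scores (the item itself is included in the total), which may differ from the procedure used in the study.

```python
from statistics import mean, pstdev, pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-student lists of item scores."""
    k = len(scores[0])  # number of items
    # Variance of each item across students.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the students' total scores.
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def point_biserial(item, totals):
    """Point-biserial correlation between a 0/1 item and total scores."""
    p = mean(item)  # proportion of students answering correctly
    mean_correct = mean(t for x, t in zip(item, totals) if x == 1)
    mean_wrong = mean(t for x, t in zip(item, totals) if x == 0)
    return (mean_correct - mean_wrong) / pstdev(totals) * (p * (1 - p)) ** 0.5


# Hypothetical toy data: 5 students (rows) x 4 items (columns), 1 = correct.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
totals = [sum(row) for row in scores]
first_item = [row[0] for row in scores]

print("alpha:", round(cronbach_alpha(scores), 3))
print("point biserial, item 1:", round(point_biserial(first_item, totals), 3))
```

Higher α indicates more internally consistent total scores, and a higher point biserial indicates that students who answer the item correctly tend to score higher overall.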

Publisher

American Physiological Society

Subject

General Medicine,Physiology,Education

Link

https://www.physiology.org/doi/pdf/10.1152/advan.00062.2011

References: 13 articles.

1. Subject Matter Experts' Assessment of Item Statistics

2. Measurement practices: methods for developing content-valid student examinations

3. Twelve tips for blueprinting

4. Applying learning taxonomies to test items

Cited by 19 articles.

1. The difference between estimated and perceived item difficulty: An empirical study;International Journal of Assessment Tools in Education;2024-06-20

2. Evaluating the Efficacy of Generative Artificial Intelligence in Grading: Insights from Authentic Assessments in Economics;SSRN Electronic Journal;2024

3. Comparing Estimated and Real Item Difficulty Using Multi-Facet Rasch Analysis;Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi;2023-12-31

4. Race with the machines: Assessing the capability of generative AI in solving authentic assessments;Australasian Journal of Educational Technology;2023-12-22

5. Designing formative assessments to improve anatomy exam performance;Anatomical Sciences Education;2023-08-11