Affiliation:
1. Educational Testing Service
Abstract
Scoring reliability of essays and other free-response questions is of considerable concern, especially in large, national administrations. This report describes a statistically designed experiment, carried out in an operational setting, to determine the contributions of different sources of variation to the unreliability of scoring. The experiment made novel use of partially balanced incomplete block designs, which facilitated unbiased estimation of certain main effects without requiring readers to assess the same paper several times. In addition, estimates were obtained of the improvement in reliability that results from removing the variability contributed by systematic sources through an appropriate adjustment of the raw scores. This statistical calibration appears to be a cost-effective approach to enhancing scoring reliability compared to simply increasing the number of readings per paper. The results of the experiment also provide a framework for examining other, simpler calibration strategies; one such strategy is briefly considered.
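The abstract does not give the actual adjustment formula, but the simplest form of the calibration it describes is to estimate each reader's main effect (severity or leniency) and subtract it from that reader's raw scores. The sketch below is a hypothetical illustration of that idea, not the report's method: it assumes reader effects are additive and that the incomplete block design has given each reader a comparable mix of papers, so a reader's mean deviation from the grand mean estimates that reader's effect.

```python
def calibrate(scores):
    """Remove estimated reader main effects from raw scores.

    `scores` maps a reader ID to a list of (paper_id, raw_score) pairs.
    A reader's estimated effect is the reader's mean score minus the
    grand mean over all readings; the adjusted score is the raw score
    minus that effect. This is only a reasonable estimate when readers
    score comparable sets of papers, which is what a (partially)
    balanced incomplete block design is meant to ensure.
    """
    all_scores = [s for recs in scores.values() for _, s in recs]
    grand_mean = sum(all_scores) / len(all_scores)
    adjusted = {}
    for reader, recs in scores.items():
        reader_mean = sum(s for _, s in recs) / len(recs)
        effect = reader_mean - grand_mean  # estimated severity/leniency
        adjusted[reader] = [(pid, s - effect) for pid, s in recs]
    return adjusted

# Illustration: reader A is 1 point lenient, reader B 1 point severe.
raw = {"A": [(1, 4), (2, 6)], "B": [(1, 2), (2, 4)]}
adj = calibrate(raw)  # both readers' adjusted scores become (3, 5)
```

In this toy example the grand mean is 4; reader A's effect is +1 and reader B's is -1, so after adjustment the two readers agree exactly on both papers. The report's actual calibration is estimated from a designed experiment and would involve more effects (e.g., occasion or table), but the core subtraction step is of this form.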
Publisher
American Educational Research Association (AERA)
Subject
Linguistics and Language, Anthropology, History, Language and Linguistics, Cultural Studies
Cited by
53 articles.