Affiliation:
1. The University of Alabama, USA
Abstract
Researchers frequently evaluate rater judgments in performance assessments for evidence of differential rater functioning (DRF), which occurs when rater severity is systematically related to construct-irrelevant student characteristics after controlling for student achievement. However, methods for detecting DRF may be limited in sparse rating designs, in which it is not possible for every rater to score every student and there is consequently less information available for detecting DRF. Sparse designs can also exacerbate the impact of artificial DRF, which occurs when raters are inaccurately flagged for DRF because of statistical artifacts. In this study, a sequential method adapted from previous research on differential item functioning (DIF) is used to detect DRF more accurately and to distinguish between true and artificial DRF. Analyses of data from a rater-mediated writing assessment and a simulation study demonstrate that the sequential approach leads to different conclusions about which raters exhibit DRF than a standard, non-sequential analysis. Moreover, the simulation results suggest that the sequential procedure improves the accuracy of DRF detection across a variety of rating design conditions. Practical implications for language testing research are discussed.
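To make the idea of a sequential (purification-style) DRF screen concrete, the sketch below illustrates the general logic in Python under strong simplifying assumptions: it uses mean rating residuals rather than the Rasch-based severity estimates described in the abstract, and the flagging threshold, variable names, and synthetic data are hypothetical rather than taken from the study. It is intended only to show how flagged raters can be removed from the matching criterion and the detection step repeated until the flags stabilize.

```python
# Illustrative sketch only: a purification-style sequential screen for
# differential rater functioning (DRF). Residual means stand in for the
# model-based severity estimates used in the study; the 0.5-point cutoff
# and all names below are hypothetical.
import numpy as np


def sequential_drf_screen(ratings, groups, threshold=0.5, max_iter=10):
    """ratings: (n_raters, n_students) array, np.nan for unscored cells
    (sparse designs leave many cells empty).
    groups: length-n_students array of 0/1 codes for a construct-irrelevant
    student characteristic (e.g., subgroup membership).
    Returns the set of rater indices flagged for DRF after purification."""
    n_raters, _ = ratings.shape
    flagged = set()
    for _ in range(max_iter):
        # Matching criterion: each student's mean score from currently
        # unflagged raters only (the "purified" criterion).
        keep = np.array([r not in flagged for r in range(n_raters)])
        if not keep.any():
            keep[:] = True  # fall back if every rater was flagged
        criterion = np.nanmean(ratings[keep], axis=0)
        residuals = ratings - criterion  # observed rating minus criterion
        new_flags = set()
        for r in range(n_raters):
            valid = ~np.isnan(residuals[r])
            res_g0 = residuals[r, (groups == 0) & valid]
            res_g1 = residuals[r, (groups == 1) & valid]
            if len(res_g0) == 0 or len(res_g1) == 0:
                continue  # rater scored only one subgroup; no DRF contrast
            drf_index = res_g0.mean() - res_g1.mean()
            if abs(drf_index) > threshold:
                new_flags.add(r)
        if new_flags == flagged:  # flags stabilized: stop iterating
            return flagged
        flagged = new_flags
    return flagged


# Tiny synthetic example: 5 raters, 40 students, ~30% of cells unscored.
rng = np.random.default_rng(0)
true_scores = rng.normal(3, 1, 40)
groups = rng.integers(0, 2, 40)
ratings = true_scores + rng.normal(0, 0.3, (5, 40))
ratings[1] += np.where(groups == 1, 1.0, 0.0)  # rater 1 exhibits DRF
ratings[rng.random((5, 40)) < 0.3] = np.nan    # sparse rating design
print(sequential_drf_screen(ratings, groups))  # typically flags only rater 1
```

The purification loop is the essential "sequential" element: raters flagged on the first pass no longer contribute to the matching criterion on later passes, which is intended to reduce artificial DRF flags that arise when biased ratings contaminate the criterion itself.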
Subject
Linguistics and Language, Social Sciences (miscellaneous), Language and Linguistics