Affiliation:
1. The University of Alabama, USA
Abstract
Assessments that require rater judgment (i.e., rater-mediated assessments) have become increasingly popular in high-stakes language assessments worldwide. This study uses a systematic literature review to identify and explore the dominant methods for evaluating rating quality in research on large-scale rater-mediated language assessments. Results from the review of 259 methodological and applied studies reveal an emphasis on inter-rater reliability as evidence of rating quality. This emphasis persists across methodological and applied studies, across studies that do and do not focus primarily on rating quality, and across multiple language constructs. Additional findings suggest discrepancies in rating designs used in empirical research and practical concerns in performance assessment systems. Taken together, the findings highlight a reliance on aggregate-level information that is not specific to individual raters or specific facets of an assessment context as evidence of rating quality in rater-mediated assessments. To inform the interpretation and use of ratings, as well as the improvement of rater-mediated assessment systems, rating quality indices are needed that go beyond group-level indicators of inter-rater reliability and provide diagnostic evidence of rating quality specific to individual raters, students, and other facets of the assessment system. Such indicators are available through modern measurement techniques, such as Rasch measurement theory and other item response theory approaches. Implications are discussed as they relate to validity, reliability/precision, and fairness for rater-mediated assessments.
Subject
Linguistics and Language,Social Sciences (miscellaneous),Language and Linguistics
Cited by
56 articles.