Affiliation:
1. TestDaF Institute, Germany
Abstract
Research on rater effects in language performance assessments has provided ample evidence for a considerable degree of variability among raters. Building on this research, I advance the hypothesis that experienced raters fall into types or classes that are clearly distinguishable from one another with respect to the importance they attach to scoring criteria. To examine the rater type hypothesis, I asked 64 raters actively involved in scoring examinee writing performance on a large-scale assessment instrument to indicate on a four-point scale how much importance they would attach to each of nine routinely used criteria. The criteria covered various performance aspects, such as fluency, completeness, and grammatical correctness. In a preliminary step, many-facet Rasch analysis revealed that raters differed significantly in their views on the importance of the various criteria. A two-mode clustering technique yielded a joint classification of raters and criteria, with six rater types emerging from the analysis. Each of these types was characterized by a distinct scoring profile, indicating that raters were far from dividing their attention evenly among the set of criteria. Moreover, rater background variables were shown to partially account for the scoring profile differences. The findings have implications for assessing the quality of large-scale rater-mediated language testing, rater monitoring, and rater training.
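To make the two-mode clustering step concrete, the sketch below jointly partitions a raters-by-criteria importance matrix into six blocks, matching the six rater types reported above. Everything in it is an assumption for illustration: the data are simulated, and scikit-learn's SpectralCoclustering stands in as one standard two-mode technique, not the algorithm actually used in the study.

```python
# Minimal sketch of two-mode (co-)clustering on a raters x criteria matrix.
# NOTE: simulated data and SpectralCoclustering are illustrative assumptions,
# not the study's actual data or method.
import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.default_rng(0)

# Hypothetical data mirroring the study design: 64 raters rate the
# importance of 9 criteria on a 4-point scale (1 = low, 4 = high).
importance = rng.integers(1, 5, size=(64, 9)).astype(float)

# Jointly cluster raters (rows) and criteria (columns) into 6 blocks.
model = SpectralCoclustering(n_clusters=6, random_state=0)
model.fit(importance)

rater_types = model.row_labels_          # cluster assignment per rater
criterion_groups = model.column_labels_  # cluster assignment per criterion

# Scoring profile of each rater type: mean importance attached to each
# criterion by the raters assigned to that type.
for t in range(6):
    profile = importance[rater_types == t].mean(axis=0)
    print(f"Rater type {t}: {np.round(profile, 2)}")
```

In this kind of analysis, a flat scoring profile would indicate a rater type that weighs all criteria evenly, whereas the peaked profiles reported in the abstract indicate types that privilege some criteria (e.g., grammatical correctness) over others.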
Subject
Linguistics and Language, Social Sciences (miscellaneous), Language and Linguistics
Cited by
153 articles.