Affiliation:
1. Educational Testing Service, Princeton, NJ, USA
Abstract
This article presents a comparative judgment approach for holistically scored constructed response tasks. In this approach, the grader rank-orders (rather than rates) the quality of a small set of responses. A prior automated evaluation of responses guides both set formation and the scaling of rankings. Sets are formed so that their responses have similar prior scores, and graders' subsequent rankings serve to update those prior scores. Final response scores are determined by weighting the prior and ranking information. This approach allows comparative judgments to be scaled on the basis of a single ranking, eliminates rater effects in scoring, and offers a conceptual framework for combining human and automated evaluation of constructed response tasks. To evaluate this approach, groups of graders evaluated responses to two tasks using either the ranking approach (with sets of five responses) or the traditional rating approach. Results varied by task and by the relative weighting of prior versus ranking information, but in general the ranking scores showed generalizability (reliability) and validity coefficients comparable to those of the rating scores.
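The pipeline described in the abstract (set formation from prior scores, a single ranking per set, and a weighted final score) can be illustrated with a minimal sketch. The abstract does not specify the scaling rule, so everything below is an assumption made for illustration: the function names, the sort-and-chunk set formation, the rule that reassigns a set's prior scores in ranked order, and the `weight_prior` parameter are all hypothetical and may differ from the article's actual method.

```python
from typing import Dict, List


def form_sets(prior: Dict[str, float], set_size: int = 5) -> List[List[str]]:
    """Group responses with similar prior scores (assumed rule: sort, then chunk)."""
    ordered = sorted(prior, key=lambda rid: (prior[rid], rid))
    return [ordered[i:i + set_size] for i in range(0, len(ordered), set_size)]


def rank_based_scores(ranking: List[str], prior: Dict[str, float]) -> Dict[str, float]:
    """Turn one grader ranking into scores on the prior-score scale.

    Assumed rule (not taken from the article): the best-ranked response
    receives the highest prior score found in the set, the second-ranked
    response the second highest, and so on.
    `ranking` lists response ids from best to worst.
    """
    pool = sorted((prior[rid] for rid in ranking), reverse=True)
    return dict(zip(ranking, pool))


def final_scores(
    ranking: List[str], prior: Dict[str, float], weight_prior: float = 0.5
) -> Dict[str, float]:
    """Weighted combination of the prior score and the rank-based score."""
    ranked = rank_based_scores(ranking, prior)
    return {
        rid: weight_prior * prior[rid] + (1.0 - weight_prior) * ranked[rid]
        for rid in ranking
    }


if __name__ == "__main__":
    # Hypothetical automated (prior) scores for seven responses.
    prior = {"a": 3.0, "b": 3.2, "c": 3.4, "d": 3.6, "e": 3.8, "f": 1.0, "g": 5.0}
    print(form_sets(prior))
    # A grader ranks one set of five from best to worst.
    print(final_scores(["a", "e", "c", "d", "b"], prior, weight_prior=0.5))
```

Under these assumptions, `weight_prior=1.0` reproduces the automated scores unchanged and `weight_prior=0.0` relies on the ranking alone, which mirrors the abstract's remark that results depend on the relative weighting of prior versus ranking information.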
Subject
Applied Mathematics, Applied Psychology, Developmental and Educational Psychology, Education
Cited by
15 articles.