Abstract
In assessment programs where scores are reported for individual examinees, it is desirable to have responses to performance exercises graded by more than one rater. If more than one item on each test form is so graded, it is also desirable that different raters grade the responses of any one examinee. This gives rise to sampling designs in which raters are nested within items. These designs lead to simple methods for estimating variance components owing to examinees and to interactions of examinees by items and examinees by raters within items. The authors review here some useful results from generalizability analysis based on these estimates and show that they may be used to correct the item response information functions and standard errors for conditional dependence of multiple ratings. Examples based on data from two performance testing studies are presented.
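The abstract only sketches the estimation step. As a hedged illustration (the function name, array layout, and balanced-design assumption are mine, not the authors'), the following minimal numpy sketch solves the standard ANOVA expected-mean-square equations of generalizability theory for a balanced p × (r:i) design, yielding the three examinee-related variance components the abstract mentions: examinees, examinees-by-items, and examinees-by-raters-within-items (with error).

```python
import numpy as np

def variance_components_p_x_r_in_i(x):
    """Estimate examinee-related variance components for a balanced
    p x (r:i) design (raters nested within items), using the standard
    ANOVA expected-mean-square equations of generalizability theory.

    x : array of shape (n_persons, n_items, n_raters_per_item);
        x[p, i, r] is the rating given to examinee p on item i by the
        r-th rater assigned to item i.
    Returns estimates of sigma^2(p), sigma^2(pi), sigma^2(pr:i, e).
    """
    n_p, n_i, n_r = x.shape
    grand = x.mean()
    m_p = x.mean(axis=(1, 2))   # examinee means
    m_i = x.mean(axis=(0, 2))   # item means
    m_pi = x.mean(axis=2)       # examinee-by-item cell means
    m_ir = x.mean(axis=0)       # rater-within-item means

    # Mean squares for the examinee-related effects
    ms_p = n_i * n_r * np.sum((m_p - grand) ** 2) / (n_p - 1)
    ms_pi = n_r * np.sum(
        (m_pi - m_p[:, None] - m_i[None, :] + grand) ** 2
    ) / ((n_p - 1) * (n_i - 1))
    resid = x - m_pi[:, :, None] - m_ir[None, :, :] + m_i[None, :, None]
    ms_pri = np.sum(resid ** 2) / ((n_p - 1) * n_i * (n_r - 1))

    # Solve the expected-mean-square equations:
    #   E[MS_p]    = s2_pri + n_r*s2_pi + n_i*n_r*s2_p
    #   E[MS_pi]   = s2_pri + n_r*s2_pi
    #   E[MS_pr:i] = s2_pri
    var_pri = ms_pri
    var_pi = (ms_pi - ms_pri) / n_r
    var_p = (ms_p - ms_pi) / (n_i * n_r)
    return var_p, var_pi, var_pri
```

For example, calling the function on a (200, 4, 2) array of ratings (200 examinees, 4 items, 2 raters per item) returns the three component estimates; negative estimates, which can arise by chance, are conventionally truncated at zero. How these components are then used to adjust IRT information functions and standard errors for the conditional dependence of multiple ratings is developed in the article itself and is not reproduced here.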
Subject
Psychology (miscellaneous), Social Sciences (miscellaneous)
Cited by
24 articles.