Abstract
A variety of measures of reliability for two-category nominal scales are reviewed and compared. It is shown that upon correcting these indices for chance agreement, there are only five distinct indices: Fleiss's modification of A1, the φ coefficient, Cohen's kappa, and two intraclass coefficients. Additional derivations indicate that when marginals are held constant, all but one of the measures are linear functions of agreement and, thus, of one another. In particular, they are equal once the maximum obtainable values for a given data set are equated. The single exception is an intraclass correlation that explicitly includes variation due to observer mean differences as part of the error variance. This index is dependent on sample size; moreover, as the number of subjects increases, this index approaches the kappa coefficient as a limit. Recommendations for choosing an index of agreement are made based on definitions, magnitude, convenience, and consistency.
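As an illustration of two of the indices named above, the following is a minimal sketch (not the authors' code) that computes observed agreement, Cohen's kappa, and the φ coefficient from a 2×2 table of two observers' ratings. The cell labels `a`, `b`, `c`, `d` and both function names are assumptions introduced here for illustration. The second helper writes kappa as a function of observed agreement with the marginal proportions held fixed, which makes the linear relationship described in the abstract visible for that index.

```python
from math import sqrt


def agreement_indices(a, b, c, d):
    """Return (observed agreement, Cohen's kappa, phi) for a 2x2 table.

    Hypothetical cell layout (an assumption of this sketch):
      a = both observers say "yes"
      b = observer 1 "yes", observer 2 "no"
      c = observer 1 "no",  observer 2 "yes"
      d = both observers say "no"
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("table is empty")

    p_o = (a + d) / n                       # observed proportion of agreement
    p1 = (a + b) / n                        # observer 1 marginal for "yes"
    p2 = (a + c) / n                        # observer 2 marginal for "yes"
    p_e = p1 * p2 + (1 - p1) * (1 - p2)     # agreement expected by chance

    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if p_e == 1 or denom == 0:
        raise ValueError("degenerate marginals: indices are undefined")

    kappa = (p_o - p_e) / (1 - p_e)
    phi = (a * d - b * c) / sqrt(denom)
    return p_o, kappa, phi


def kappa_from_agreement(p_o, p1, p2):
    """Kappa as a function of observed agreement for fixed marginals p1, p2.

    With p1 and p2 fixed, p_e is a constant, so kappa is linear in p_o.
    """
    p_e = p1 * p2 + (1 - p1) * (1 - p2)
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Symmetric table: both observers have the same marginals.
    print(agreement_indices(40, 10, 10, 40))
    # Unequal marginals: kappa and phi no longer coincide.
    print(agreement_indices(45, 15, 5, 35))
```

When the two observers have identical marginals (here `b == c`), kappa and φ take the same value; with unequal marginals they differ. The intraclass coefficients and Fleiss's modification of A1 are not covered by this sketch.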
Subject
Applied Mathematics, Applied Psychology, Developmental and Educational Psychology, Education
Cited by
11 articles.