Abstract
Kappa coefficients are commonly used to quantify reliability on a categorical scale, whereas correlation coefficients are commonly applied to assess reliability on an interval scale. Both types of coefficients can be used to assess the reliability of ordinal rating scales. In this study, we compare seven reliability coefficients for ordinal rating scales: the kappa coefficients included are Cohen's kappa, linearly weighted kappa, and quadratically weighted kappa; the correlation coefficients included are the intraclass correlation ICC(3,1), Pearson's correlation, Spearman's rho, and Kendall's tau-b. The primary goal is to provide a thorough understanding of these coefficients so that applied researchers can make a sensible choice for ordinal rating scales. A second aim is to determine whether the choice of coefficient matters. Using analytic methods as well as simulated and empirical data, we studied to what extent different coefficients lead to the same conclusions about inter-rater reliability, and to what extent they measure agreement in a similar way. Analytically, it is shown that differences between quadratically weighted kappa and the Pearson and intraclass correlations increase as agreement becomes larger; differences between the three coefficients are generally small if differences between rater means and variances are small. Using simulated and empirical data, it is shown that differences between all reliability coefficients tend to increase as agreement between the raters increases. For the data in this study, the four correlation coefficients led to the same conclusion about inter-rater reliability in virtually all cases, and quadratically weighted kappa led to the same conclusion as the correlation coefficients in a great number of cases. Hence, for these data, it does not really matter which of these five coefficients is used. Moreover, the four correlation coefficients and quadratically weighted kappa tend to measure agreement in a similar way: their values are very highly correlated for the data in this study.
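As an illustration of the comparison described in the abstract, the following sketch (not code from the paper; the rating data are invented) computes all seven coefficients for two raters scoring ten subjects on a 5-point ordinal scale. It uses the standard scikit-learn function cohen_kappa_score and SciPy's pearsonr, spearmanr, and kendalltau; ICC(3,1) is computed from the usual Shrout–Fleiss two-way ANOVA formula, since SciPy has no built-in ICC.

```python
# Illustrative sketch: the seven reliability coefficients compared in the
# study, computed on hypothetical two-rater ordinal ratings.
import numpy as np
from scipy import stats
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings of 10 subjects by two raters (made-up data)
r1 = np.array([1, 2, 2, 3, 3, 4, 4, 5, 5, 5])
r2 = np.array([1, 2, 3, 3, 4, 4, 4, 4, 5, 5])

# Kappa coefficients: unweighted, linearly weighted, quadratically weighted
kappa      = cohen_kappa_score(r1, r2)
kappa_lin  = cohen_kappa_score(r1, r2, weights="linear")
kappa_quad = cohen_kappa_score(r1, r2, weights="quadratic")

# Correlation coefficients: Pearson, Spearman's rho, Kendall's tau-b
pearson,  _ = stats.pearsonr(r1, r2)
spearman, _ = stats.spearmanr(r1, r2)
tau_b,    _ = stats.kendalltau(r1, r2)   # SciPy computes tau-b by default

# ICC(3,1): two-way mixed model, single rater, consistency (Shrout & Fleiss):
# (MS_rows - MS_err) / (MS_rows + (k - 1) * MS_err)
ratings = np.column_stack([r1, r2])
n, k = ratings.shape
grand = ratings.mean()
ss_rows = k * np.sum((ratings.mean(axis=1) - grand) ** 2)
ss_cols = n * np.sum((ratings.mean(axis=0) - grand) ** 2)
ss_err  = np.sum((ratings - grand) ** 2) - ss_rows - ss_cols
ms_rows = ss_rows / (n - 1)
ms_err  = ss_err / ((n - 1) * (k - 1))
icc31   = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

for name, value in [("Cohen's kappa", kappa), ("linear kappa", kappa_lin),
                    ("quadratic kappa", kappa_quad), ("ICC(3,1)", icc31),
                    ("Pearson", pearson), ("Spearman", spearman),
                    ("Kendall tau-b", tau_b)]:
    print(f"{name:>16}: {value:.3f}")
```

Running the sketch on data like these, where the two raters largely agree, shows quadratically weighted kappa and the four correlation coefficients taking very similar values, in line with the abstract's conclusion that the choice among these five coefficients rarely matters.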
Publisher
Springer Science and Business Media LLC
Subject
Library and Information Sciences; Statistics, Probability and Uncertainty; Psychology (miscellaneous); Mathematics (miscellaneous)
Cited by
43 articles.