Reliability Measurement without Limits

Published: 2008-09; Volume: 34; Issue: 3; Pages: 319-326
ISSN: 0891-2017
Journal: Computational Linguistics
Language: English

Authors:

Dennis Reidsma¹, Jean Carletta²

Affiliations:

1. University of Twente, Human Media Interaction, Room ZI2067, PO Box 217, NL-7500 AE Enschede, The Netherlands

2. University of Edinburgh, Human Communication Research Centre

Abstract

In computational linguistics, a reliability measurement of 0.8 on some statistic such as κ is widely thought to guarantee that hand-coded data is fit for purpose, with 0.67 to 0.8 tolerable, and lower values suspect. We demonstrate that the main use of such data, machine learning, can tolerate data with low reliability as long as any disagreement among human coders looks like random noise. When the disagreement introduces patterns, however, the machine learner can pick these up just like it picks up the real patterns in the data, making the performance figures look better than they really are. For the range of reliability measures that the field currently accepts, disagreement can appreciably inflate performance figures, and even a measure of 0.8 does not guarantee that what looks like good performance really is. Although this is a commonsense result, it has implications for how we work. At the very least, computational linguists should look for any patterns in the disagreement among coders and assess what impact they will have.
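The abstract's central contrast (noise-like disagreement is tolerable, patterned disagreement inflates scores) can be illustrated with a small toy simulation. This is a sketch under stated assumptions, not the authors' experimental setup: the data (a hypothetical context feature plus a binary cue that determines the true label), the two simulated coders, and the majority-vote lookup "learner" are all invented for illustration, and the truth itself stands in for a perfectly accurate second coder. The sketch computes Cohen's κ, one common member of the κ family, as (P_o − P_e)/(1 − P_e). It then trains on one coder's labels and compares "apparent" accuracy (scored against that coder's test labels) with "real" accuracy (scored against the truth). It also tabulates disagreement per context, one simple way to look for patterns in coder disagreement.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each item has a context ("A".."D") and a binary cue.
# Assumption: the true label is "yes" exactly when cue == 1.
CONTEXTS = ["A", "B", "C", "D"]
FLIP = {"yes": "no", "no": "yes"}


def make_items(reps):
    """Return (context, cue, true_label) triples; every cell appears reps times."""
    return [(ctx, cue, "yes" if cue == 1 else "no")
            for _ in range(reps) for ctx in CONTEXTS for cue in (0, 1)]


def noisy_coder(items):
    """Flip every fifth label: disagreement unrelated to context or cue."""
    return [FLIP[t] if i % 5 == 0 else t for i, (_, _, t) in enumerate(items)]


def patterned_coder(items):
    """Answer "yes" for every item in context D: disagreement that follows a feature."""
    return ["yes" if ctx == "D" else t for ctx, _, t in items]


def cohen_kappa(a, b):
    """Cohen's kappa, (P_o - P_e) / (1 - P_e), for two equal-length label lists."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def train_lookup(items, labels):
    """Toy learner: predict the majority training label for each (context, cue) cell."""
    votes = defaultdict(Counter)
    for (ctx, cue, _), label in zip(items, labels):
        votes[(ctx, cue)][label] += 1
    return {cell: counts.most_common(1)[0][0] for cell, counts in votes.items()}


def accuracy(model, items, labels):
    """Fraction of items whose predicted label matches the given labels."""
    hits = sum(model[(ctx, cue)] == y for (ctx, cue, _), y in zip(items, labels))
    return hits / len(items)


def disagreement_by_context(items, labels_a, labels_b):
    """Per-context disagreement rate between two coders: a basic pattern check."""
    total, differ = Counter(), Counter()
    for (ctx, _, _), x, y in zip(items, labels_a, labels_b):
        total[ctx] += 1
        differ[ctx] += x != y
    return {ctx: differ[ctx] / total[ctx] for ctx in total}


def run_experiment(coder, reps=50):
    """Train on one coder's labels; score against that coder and against the truth."""
    # Train and test sets are built identically, so only the labelling differs.
    train, test = make_items(reps), make_items(reps)
    truth = [t for _, _, t in test]  # truth doubles as a perfect second coder
    coded = coder(test)
    model = train_lookup(train, coder(train))
    return {
        "kappa": cohen_kappa(truth, coded),
        "apparent": accuracy(model, test, coded),
        "real": accuracy(model, test, truth),
        "by_context": disagreement_by_context(test, truth, coded),
    }


if __name__ == "__main__":
    for name, coder in [("noise", noisy_coder), ("pattern", patterned_coder)]:
        r = run_experiment(coder)
        print(f"{name}: kappa={r['kappa']:.2f} apparent={r['apparent']:.3f} "
              f"real={r['real']:.3f} by_context={r['by_context']}")
```

Run as a script, this toy reports κ = 0.60 with apparent accuracy 0.800 and real accuracy 1.000 for the noisy coder, and κ = 0.75 with apparent accuracy 1.000 and real accuracy 0.875 for the patterned coder. The coder with the higher κ produces the more misleading score, and only the per-context table (disagreement 0.5 in context D, 0.0 elsewhere) shows why. Labels are flipped deterministically (every fifth item) rather than at random so that the numbers are reproducible; the flips still fall evenly across all (context, cue) cells.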

Publisher

MIT Press - Journals

Subject

Artificial Intelligence, Computer Science Applications, Linguistics and Language, Language and Linguistics

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/coli.2008.34.3.319

References (7 articles; 4 listed below)

1. Identifying Sources of Disagreement: Generalizability Theory in Manual Annotation Studies

2. A Coefficient of Agreement for Nominal Scales

3. Evaluating Discourse and Dialogue Coding Schemes

4. The Kappa Statistic: A Second Look

Cited by 43 articles.

1. A computer-assisted tool for automatically measuring non-native Japanese oral proficiency; Computer Assisted Language Learning; 2024-07-17

2. An active learning framework and assessment of inter-annotator agreement facilitate automated recogniser development for vocalisations of a rare species, the southern black-throated finch (Poephila cincta cincta); Ecological Informatics; 2023-11

3. Ground Truth Or Dare: Factors Affecting The Creation Of Medical Datasets For Training AI; Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society; 2023-08-08

4. Towards a multimodal analytical framework; Discourse Markers in Doctoral Supervision Sessions; 2023-08-02

5. Exploration of the characteristics of teachers' multimodal behaviours in problem-oriented teaching activities with different response levels; British Journal of Educational Technology; 2023-05-02