Authors:
Alexandra Uma, Dina Almanea, Massimo Poesio
Abstract
Crowdsourced data are often rife with disagreement, whether because of genuine item ambiguity, overlapping labels, subjectivity, or annotator error. Hence, a variety of methods have been developed for learning from data containing disagreement. One observation emerging from this work is that different methods appear to work best depending on characteristics of the dataset, such as the level of noise. In this paper, we investigate the use of temperature scaling, an approach developed to estimate noise, for learning from data containing disagreements. We find that temperature scaling works with data in which the disagreements result from label overlap, but not with data in which the disagreements are due to annotator bias, as in, e.g., subjective tasks such as labeling an item as offensive or not. We also find that disagreements due to ambiguity do not fit either category perfectly.
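For readers unfamiliar with the technique named in the abstract, the sketch below illustrates standard post-hoc temperature scaling: a single scalar T is learned on held-out data by minimising negative log-likelihood, and predicted probabilities are obtained from logits divided by T. This is a minimal illustration of the general recipe, not the authors' implementation; names such as val_logits and val_labels are placeholders for a held-out set of model outputs and labels.

```python
# Minimal sketch of standard temperature scaling; illustrative only,
# not the implementation used in the paper.
import torch
import torch.nn.functional as F


def fit_temperature(val_logits: torch.Tensor, val_labels: torch.Tensor) -> float:
    """Learn a single scalar T > 0 that rescales logits to minimise NLL."""
    log_t = torch.zeros(1, requires_grad=True)  # optimise log T so that T stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()


def calibrated_probs(logits: torch.Tensor, temperature: float) -> torch.Tensor:
    """Apply the learned temperature and return calibrated probabilities."""
    return F.softmax(logits / temperature, dim=-1)
```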
Funder
European Research Council
Cited by: 4 articles.