Analysing the Noise Model Error for Realistic Noisy Label Data

Published:2021-05-18 Issue:9 Volume:35 Page:7675-7684
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Hedderich Michael A.,Zhu Dawei,Klakow Dietrich

Abstract

Distant and weak supervision make it possible to obtain large amounts of labeled training data quickly and cheaply, but these automatic annotations tend to contain a large number of errors. A popular technique to overcome the negative effects of these noisy labels is noise modelling, where the underlying noise process is modelled. In this work, we study the quality of these estimated noise models from the theoretical side by deriving the expected error of the noise model. Apart from evaluating the theoretical results on commonly used synthetic noise, we also publish NoisyNER, a new noisy label dataset from the NLP domain that was obtained through a realistic distant supervision technique. It provides seven sets of labels with differing noise patterns to evaluate different noise levels on the same instances. Clean labels are available in parallel, making it possible to study scenarios where a small amount of gold-standard data can be leveraged. Our theoretical results and the corresponding experiments give insights into the factors that influence the noise model estimation, such as the noise distribution and the sampling technique.

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 3 articles.

1. SPORT: A Subgraph Perspective on Graph Classification with Label Noise;ACM Transactions on Knowledge Discovery from Data;2024-08-28

2. ConstraintMatch for Semi-constrained Clustering;2023 International Joint Conference on Neural Networks (IJCNN);2023-06-18

3. Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future;Computational Linguistics;2023