Affiliation:
1. China University of Geosciences, Wuhan, China
Abstract
In crowdsourcing scenarios, we can obtain each instance’s multiple noisy labels set from different crowd workers and then use a ground truth inference algorithm to infer its integrated label. Despite the effectiveness of ground truth inference algorithms, a certain level of noise still remains in the integrated labels. To reduce the impact of noise, many noise correction algorithms have been proposed in recent years. To the best of our knowledge, however, nearly all existing noise correction algorithms only exploit each instance’s own multiple noisy label sets but ignore the multiple noisy label sets of its neighbors. Here neighbors refer to the nearest instances found in the feature space based on the distance metric learning. In this article, we propose neighborhood weighted voting-based noise correction (NWVNC). In NWVNC, we at first take advantage of the multiple noisy label sets of each instance’s neighbors (including itself) to estimate the probability that it belongs to its integrated label. Then, we use the estimated probability to identify and filter noise instances and thus obtain a clean set and a noise set. Finally, we train three heterogeneous classifiers on the clean set and correct the noise instances by the consensus voting of three trained classifiers. The experimental results on 34 simulated and two real-world crowdsourced datasets show that NWVNC significantly outperforms all the other state-of-the-art noise correction algorithms used for comparison.
Funder
National Natural Science Foundation of China
Science and Technology Project of Hubei Province-Unveiling System
Industry-University-Research Innovation Funds for Chinese Universities
Publisher
Association for Computing Machinery (ACM)
Reference30 articles.
1. Carla E. Brodley and Mark A. Friedl. 1999. Identifying mislabeled training data. J. Artif. Intell. Res. 11 (1999) 131–167.
2. Label augmented and weighted majority voting for crowdsourcing
3. Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm
4. Gianluca Demartini, Djellel Eddine Difallah, and Philippe Cudré-Mauroux. 2012. ZenCrowd: Leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. In Proceedings of the 21st World Wide Web Conference 2012, WWW 2012. Alain Mille, Fabien Gandon, Jacques Misselis, Michael Rabinovich, and Steffen Staab (Eds.), ACM, 469–478.
Cited by
9 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献