Neighborhood Weighted Voting-Based Noise Correction for Crowdsourcing

Published:2023-04-14 Issue:7 Volume:17 Page:1-18
ISSN:1556-4681
Container-title:ACM Transactions on Knowledge Discovery from Data
language:en
Short-container-title:ACM Trans. Knowl. Discov. Data

Author:

Li Huiru¹, Jiang Liangxiao¹, Xue Siqing¹

Affiliation:

1. China University of Geosciences, Wuhan, China

Abstract

In crowdsourcing scenarios, we can obtain a multiple noisy label set for each instance from different crowd workers and then use a ground truth inference algorithm to infer its integrated label. Despite the effectiveness of ground truth inference algorithms, a certain level of noise still remains in the integrated labels. To reduce the impact of this noise, many noise correction algorithms have been proposed in recent years. To the best of our knowledge, however, nearly all existing noise correction algorithms exploit only each instance’s own multiple noisy label set and ignore the multiple noisy label sets of its neighbors. Here, neighbors refer to the nearest instances found in the feature space based on distance metric learning. In this article, we propose neighborhood weighted voting-based noise correction (NWVNC). In NWVNC, we first take advantage of the multiple noisy label sets of each instance’s neighbors (including itself) to estimate the probability that it belongs to its integrated label. Then, we use the estimated probability to identify and filter noise instances and thus obtain a clean set and a noise set. Finally, we train three heterogeneous classifiers on the clean set and correct the noise instances by the consensus voting of the three trained classifiers. The experimental results on 34 simulated and two real-world crowdsourced datasets show that NWVNC significantly outperforms all the other state-of-the-art noise correction algorithms used for comparison.
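The three steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weighting formula, the learned distance metric, the filtering threshold, or the choice of classifiers. The sketch therefore assumes plain Euclidean distance in place of distance metric learning, an inverse-distance weight `1 / (1 + d)`, a threshold of 0.5, three simple stand-in classifiers (1-NN, 3-NN, nearest centroid), and reads "consensus voting" as unanimous agreement. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def neighborhood_confidence(features, label_sets, integrated, k=3):
    """Estimate, per instance, the probability of its integrated label from
    the noisy labels of its k nearest neighbors plus the instance itself."""
    probs = []
    for i, x in enumerate(features):
        others = sorted((math.dist(x, features[j]), j)
                        for j in range(len(features)) if j != i)
        members = [(0.0, i)] + others[:k]      # the instance plus k neighbors
        votes = Counter()
        for d, j in members:
            w = 1.0 / (1.0 + d)                # ASSUMED inverse-distance weight
            for lab in label_sets[j]:
                votes[lab] += w / len(label_sets[j])
        probs.append(votes[integrated[i]] / sum(votes.values()))
    return probs

def split_clean_noise(probs, threshold=0.5):   # ASSUMED threshold
    clean = [i for i, p in enumerate(probs) if p >= threshold]
    noise = [i for i, p in enumerate(probs) if p < threshold]
    return clean, noise

# Three simple stand-in classifiers (the paper's actual choices are not stated).
def one_nn(X, y, x):
    return min(zip(X, y), key=lambda p: math.dist(p[0], x))[1]

def knn3(X, y, x):
    nearest = sorted(zip(X, y), key=lambda p: math.dist(p[0], x))[:3]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]

def nearest_centroid(X, y, x):
    centroids = {}
    for lab in set(y):
        pts = [p for p, l in zip(X, y) if l == lab]
        centroids[lab] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return min(centroids, key=lambda lab: math.dist(centroids[lab], x))

def correct_noise(features, integrated, clean, noise):
    """Relabel a noise instance only when all three classifiers, trained on
    the clean set, agree (ASSUMED reading of 'consensus voting')."""
    X = [features[i] for i in clean]
    y = [integrated[i] for i in clean]
    corrected = list(integrated)
    for i in noise:
        preds = {clf(X, y, features[i])
                 for clf in (one_nn, knn3, nearest_centroid)}
        if len(preds) == 1:
            corrected[i] = preds.pop()
    return corrected

# Toy data: two clusters; instance 3 sits in the class-0 cluster, but two of
# its three workers labeled it 1, so majority voting integrates it as 1.
features = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
label_sets = [[0, 0, 0], [0, 0, 1], [0, 0, 0], [1, 1, 0],
              [1, 1, 1], [1, 1, 0], [1, 1, 1], [1, 1, 1]]
integrated = [Counter(ls).most_common(1)[0][0] for ls in label_sets]

probs = neighborhood_confidence(features, label_sets, integrated)
clean, noise = split_clean_noise(probs)
corrected = correct_noise(features, integrated, clean, noise)
print(noise, corrected)
```

In this toy run, the neighbors of instance 3 mostly carry label 0, so the estimated probability of its integrated label 1 falls below the assumed threshold, and the instance is moved to the noise set and relabeled by the three stand-in classifiers.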

Funder

National Natural Science Foundation of China

Science and Technology Project of Hubei Province-Unveiling System

Industry-University-Research Innovation Funds for Chinese Universities

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3586998

References: 30 articles.

1. Carla E. Brodley and Mark A. Friedl. 1999. Identifying mislabeled training data. J. Artif. Intell. Res. 11 (1999) 131–167.

2. Label augmented and weighted majority voting for crowdsourcing

3. Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm

4. Gianluca Demartini, Djellel Eddine Difallah, and Philippe Cudré-Mauroux. 2012. ZenCrowd: Leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. In Proceedings of the 21st World Wide Web Conference 2012, WWW 2012. Alain Mille, Fabien Gandon, Jacques Misselis, Michael Rabinovich, and Steffen Staab (Eds.), ACM, 469–478.

Cited by 9 articles.

1. Dynamic selection for reconstructing instance-dependent noisy labels;Pattern Recognition;2024-12

2. Crowdsourced Fact-checking: Does It Actually Work?;Information Processing & Management;2024-09

3. Instance redistribution-based label integration for crowdsourcing;Information Sciences;2024-07

4. Worker similarity-based noise correction for crowdsourcing;Information Systems;2024-03

5. Boosting Crowdsourced Annotation Accuracy: Small Loss Filtering and Augmentation-Driven Training;IEEE Access;2024