Witan

Published:2022-07 Issue:11 Volume:15 Page:2334-2347
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Denham Benjamin¹,Lai Edmund M-K.¹,Sinha Roopak¹,Naeem M. Asif²

Affiliation:

1. Auckland University of Technology, Auckland, New Zealand

2. National University of Computer & Emerging Sciences, Islamabad, Pakistan

Abstract

Effective supervised training of modern machine learning models often requires large labelled training datasets, which could be prohibitively costly to acquire for many practical applications. Research addressing this problem has sought ways to leverage weak supervision sources, such as the user-defined heuristic labelling functions used in the data programming paradigm, which are cheaper and easier to acquire. Automatic generation of these functions can make data programming even more efficient and effective. However, existing approaches rely on initial supervision in the form of small labelled datasets or interactive user feedback. In this paper, we propose Witan, an algorithm for generating labelling functions without any initial supervision. This flexibility affords many interaction modes, including unsupervised dataset exploration before the user even defines a set of classes. Experiments in binary and multi-class classification demonstrate the efficiency and classification accuracy of Witan compared to alternative labelling approaches.

Publisher

Association for Computing Machinery (ACM)

Subject

General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/3551793.3551797

References: 59 articles.

1. Detecting opinion spams and fake news using text classification

2. Contributions to the study of SMS spam filtering

3. Pedro Alonso Doval. 2021. Strategies for the programmatic generation of labelled corpus for text classification. Master's thesis. University of Vigo.

4. Automatic Labeling of Tweets for Crisis Response Using Distant Supervision

5. Chidubem Arachie and Bert Huang. 2021. Constrained labeling for weakly supervised learning. In Uncertainty in Artificial Intelligence. PMLR, 236-246.

Cited by 3 articles.

1. Automating Weak Label Generation for Data Programming with Clinicians in the Loop;2024 IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE);2024-06-19

2. Applications and Challenges for Large Language Models: From Data Management Perspective;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13

3. Steered Training Data Generation for Learned Semantic Type Detection;Proceedings of the ACM on Management of Data;2023-06-13