Affiliation:
1. Auckland University of Technology, Auckland, New Zealand
2. National University of Computer & Emerging Sciences, Islamabad, Pakistan
Abstract
Effective supervised training of modern machine learning models often requires large labelled training datasets, which could be prohibitively costly to acquire for many practical applications. Research addressing this problem has sought ways to leverage weak supervision sources, such as the user-defined heuristic labelling functions used in the data programming paradigm, which are cheaper and easier to acquire. Automatic generation of these functions can make data programming even more efficient and effective. However, existing approaches rely on initial supervision in the form of small labelled datasets or interactive user feedback. In this paper, we propose Witan, an algorithm for generating labelling functions without any initial supervision. This flexibility affords many interaction modes, including unsupervised dataset exploration before the user even defines a set of classes. Experiments in binary and multi-class classification demonstrate the efficiency and classification accuracy of Witan compared to alternative labelling approaches.
Publisher
Association for Computing Machinery (ACM)
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
3 articles.
1. Automating Weak Label Generation for Data Programming with Clinicians in the Loop;2024 IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE);2024-06-19
2. Applications and Challenges for Large Language Models: From Data Management Perspective;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13
3. Steered Training Data Generation for Learned Semantic Type Detection;Proceedings of the ACM on Management of Data;2023-06-13