Active Learning with Query Generation for Cost-Effective Text Classification

Published:2020-04-03 Issue:04 Volume:34 Page:6583-6590
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Yan Yi-Fan,Huang Sheng-Jun,Chen Shaoyi,Liao Meng,Xu Jin

Abstract

Labeling a text document is usually time-consuming because it requires the annotator to read the whole document and check its relevance to each possible class label. It thus becomes rather expensive to train an effective model for text classification when it involves a large dataset of long documents. In this paper, we propose an active learning approach for text classification with lower annotation cost. Instead of scanning all the examples in the unlabeled data pool to select the best one to query, the proposed method automatically generates the most informative examples based on the classification model, and thus can be applied to tasks with large-scale or even infinite unlabeled data. Furthermore, we propose to approximate the generated example with a few summary words by sparse reconstruction, which allows the annotators to easily assign the class label by reading a few words rather than the long document. Experiments on different datasets demonstrate that the proposed approach can effectively improve the classification performance while significantly reducing the annotation cost.
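The two ideas in the abstract, generating a query from the model instead of searching a pool and then describing that query with a few words, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a linear classifier, takes "most informative" to mean "on the decision boundary", and uses greedy orthogonal matching pursuit as a stand-in for the sparse reconstruction step. The names `synthesize_query`, `summarize_with_words`, and the toy vocabulary are hypothetical.

```python
import numpy as np


def synthesize_query(w, b, x_seed):
    """Project x_seed onto the hyperplane w.x + b = 0.

    Assumption: for a linear model, a point on the decision boundary
    is treated as the most uncertain (most informative) query.
    """
    return x_seed - ((w @ x_seed + b) / (w @ w)) * w


def summarize_with_words(query, word_vectors, k):
    """Approximate `query` with k rows of `word_vectors`.

    Greedy orthogonal matching pursuit: repeatedly pick the word most
    correlated with the residual, then refit all chosen words by
    least squares. Returns the chosen row indices and their weights.
    """
    residual = query.astype(float).copy()
    chosen = []
    coef = np.zeros(0)
    for _ in range(k):
        scores = np.abs(word_vectors @ residual)
        if chosen:
            scores[chosen] = -np.inf  # never pick the same word twice
        chosen.append(int(np.argmax(scores)))
        basis = word_vectors[chosen].T  # shape (dim, len(chosen))
        coef, *_ = np.linalg.lstsq(basis, query, rcond=None)
        residual = query - basis @ coef
    return chosen, coef


if __name__ == "__main__":
    # Hypothetical toy setup: 3-dimensional embeddings, 3 vocabulary words.
    vocab = ["goal", "match", "election"]
    word_vectors = np.eye(3)
    w, b = np.array([1.0, -1.0, 0.5]), 0.2
    query = synthesize_query(w, b, np.array([0.3, 2.0, -1.0]))
    idx, weights = summarize_with_words(query, word_vectors, k=2)
    print([vocab[i] for i in idx], weights)
```

In this sketch the annotator would be shown only the `k` selected words rather than a full document, which mirrors the cost saving the abstract describes.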

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 10 articles.

1. Imbalanced COVID-19 vaccine sentiment classification with synthetic resampling coupled deep adversarial active learning;Machine Learning;2024-07-15

2. Data labeling through the centralities of co-reference networks improves the classification accuracy of scientific papers;Journal of Informetrics;2024-05

3. SwiftTheft: A Time-Efficient Model Extraction Attack Framework Against Cloud-Based Deep Neural Networks;Chinese Journal of Electronics;2024-01

4. On Label Quality in Class Imbalance Setting - A Case Study;2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA);2022-12

5. An Efficient Framework for Constructing Speech Emotion Corpus Based on Integrated Active Learning Strategies;IEEE Transactions on Affective Computing;2022-10-01