A Small-Sample Text Classification Model Based on Pseudo-Label Fusion Clustering Algorithm

Published: 2023-04-08 Issue: 8 Volume: 13 Page: 4716
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en

Author:

Yang Linda¹, Huang Baohua¹, Guo Shiqian¹, Lin Yunjie¹, Zhao Tong¹

Affiliation:

1. School of Computer, Electronic and Information, Guangxi University, Nanning 530004, China

Abstract

Text classification is a mainstream research area in natural language processing, and improving classification performance when labeled samples are scarce is one of its active problems. Current models that support small-sample classification can be trained with a small number of labels, but their classification results are still unsatisfactory. To improve classification accuracy, we propose a Small-sample Text Classification model based on the Pseudo-label fusion Clustering algorithm (STCPC). The algorithm has two core components: (1) It mines the latent features of unlabeled data with a training strategy that assigns pseudo-labels by clustering, and then reduces the noise in the pseudo-labeled dataset through consistency training with its augmented samples, which improves the quality of the pseudo-labeled dataset. (2) It augments the labeled data and then uses the Easy Plug-in Data Augmentation (EPiDA) framework to balance the diversity and quality of the augmented samples, which enriches the labeled data in a controlled way. Comparison experiments against other classical algorithms show that the STCPC model effectively improves classification accuracy.
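The idea of assigning pseudo-labels by clustering can be illustrated with a minimal sketch. This is not the STCPC implementation: it assumes texts have already been mapped to small feature vectors, uses a plain nearest-centroid (k-means style) update seeded from the labeled samples, and keeps only pseudo-labels whose distance margin between the two closest centroids exceeds a threshold. The function names, the `margin` parameter, and the toy data are all hypothetical.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def cluster_pseudo_labels(labeled, unlabeled, iterations=10, margin=0.5):
    """Assign pseudo-labels to unlabeled vectors by clustering.

    labeled:   dict mapping class name -> list of feature vectors
    unlabeled: list of feature vectors
    Returns a list of (vector, pseudo_label) pairs for confident samples.
    Illustrative assumption, not the procedure from the paper.
    """
    classes = sorted(labeled)
    # Seed one cluster centre per class from the labeled samples.
    centers = {c: centroid(labeled[c]) for c in classes}

    for _ in range(iterations):
        # Labeled samples always stay in their own class cluster.
        groups = {c: list(labeled[c]) for c in classes}
        for x in unlabeled:
            nearest = min(classes, key=lambda c: dist(x, centers[c]))
            groups[nearest].append(x)
        centers = {c: centroid(groups[c]) for c in classes}

    confident = []
    for x in unlabeled:
        ranked = sorted((dist(x, centers[c]), c) for c in classes)
        # Keep a sample only if it is clearly closer to one centre.
        if len(ranked) == 1 or ranked[1][0] - ranked[0][0] >= margin:
            confident.append((x, ranked[0][1]))
    return confident


if __name__ == "__main__":
    labeled = {"neg": [(0.0, 0.0)], "pos": [(4.0, 4.0)]}
    unlabeled = [(0.5, 0.2), (3.8, 4.1), (2.0, 2.0)]
    for vector, label in cluster_pseudo_labels(labeled, unlabeled, margin=2.0):
        print(vector, label)
```

In this toy run the ambiguous point `(2.0, 2.0)` is filtered out by the margin test, which stands in, very loosely, for the noise-reduction step that the abstract attributes to consistency training.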

Funder

National Natural Science Foundation of China

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/8/4716/pdf

References: 40 articles.

1. Nasukawa, T., and Yi, J. (2003, January 23–25). Sentiment Analysis: Capturing Favorability Using Natural Language Processing. Proceedings of the 2nd International Conference on Knowledge Capture, Sanibel Island, FL, USA.

2. Ma, L., and Zhang, Y. (2015, October 29–November 1). Using Word2Vec to Process Big Text Data. Proceedings of the 2015 IEEE International Conference on Big Data (Big Data), Santa Clara, CA, USA.

3. Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv.

4. Clark, K., Luong, M.-T., Le, Q.V., and Manning, C.D. (2020). ELECTRA: Pre-Training Text Encoders as Discriminators Rather than Generators. arXiv.

5. Yang, M. (2021, January 14–16). A Survey on Few-Shot Learning in Natural Language Processing. Proceedings of the 2021 International Conference on Artificial Intelligence and Electromechanical Automation (AIEA), Guangzhou, China.

Cited by 3 articles.

1. Semi-Supervised Training for (Pre-Stack) Seismic Data Analysis;Applied Sciences;2024-05-15

2. CLG: Contrastive Label Generation with Knowledge for Few-Shot Learning;Mathematics;2024-02-01

3. Pseudo-Labeling With Large Language Models for Multi-Label Emotion Classification of French Tweets;IEEE Access;2024