Supervised Contrast Learning Text Classification Model Based on Data Quality Augmentation

Published:2024-05-10 Issue:5 Volume:23 Page:1-12
ISSN:2375-4699
Container-title:ACM Transactions on Asian and Low-Resource Language Information Processing
Language:en
Short-container-title:ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Wu Liang¹, Zhang Fangfang¹, Cheng Chao¹, Song Shinan²

Affiliation:

1. Changchun University of Technology, Changchun, China

2. School of Computer Science and Engineering, Changchun University of Technology, Changchun, China

Abstract

Token-level data augmentation generates text samples by modifying the words of the sentences. However, data that are not easily classified can negatively affect the model. In particular, not considering the role of keywords when performing random augmentation operations on samples may lead to the generation of low-quality supplementary samples. Therefore, we propose a supervised contrast learning text classification model based on data quality augmentation. First, dynamic training is used to screen high-quality datasets containing beneficial information for model training. The selected data are then augmented based on important words with tag information. To obtain a better text representation to serve the downstream classification task, we employ a standard supervised contrast loss to train the model. Finally, we conduct experiments on five text classification datasets to validate the effectiveness of our model. In addition, ablation experiments are conducted to verify the impact of each module on classification.
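
The abstract names a "standard supervised contrast loss" without giving its form. As a rough illustration only, the sketch below assumes one widely used formulation, in which each anchor is pulled toward the other samples that share its label and pushed away from the rest of the batch. The function name, the NumPy implementation, and the temperature value are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors with at least one positive.

    Hypothetical helper for illustration; assumes non-zero embedding vectors.
    """
    z = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)

    # L2-normalise so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    n = len(y)
    self_mask = np.eye(n, dtype=bool)

    # Positives: other samples in the batch that share the anchor's label.
    pos_mask = (y[:, None] == y[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0  # no anchor has a positive, so there is nothing to contrast

    # Exclude each anchor's similarity with itself from the softmax denominator.
    sim = np.where(self_mask, -np.inf, sim)

    # Row-wise log-softmax, computed in a numerically stable way.
    row_max = sim.max(axis=1, keepdims=True)
    log_denominator = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_denominator

    # Average the log-probability over each anchor's positives.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[has_pos] / pos_count[has_pos]
    return float(per_anchor.mean())


if __name__ == "__main__":
    batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supervised_contrastive_loss(batch, [0, 0, 1, 1]))
```

In this formulation the loss is small when same-label embeddings are close together and large when they are scattered among other classes, which is the property that would make the learned representation useful for a downstream classifier.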

Funder

Science and Technology Bureau of Changchun City

Jilin Province Development and Reform Commission

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3653300
