A Survey on Text Classification: From Traditional to Deep Learning

Published: 2022-04-08 Issue: 2 Volume: 13 Page: 1-41
ISSN: 2157-6904
Container-title: ACM Transactions on Intelligent Systems and Technology
Language: en
Short-container-title: ACM Trans. Intell. Syst. Technol.

Author:

Li Qian¹, Peng Hao¹, Li Jianxin¹, Xia Congying², Yang Renyu³, Sun Lichao⁴, Yu Philip S.², He Lifang⁴

Affiliation:

1. Beihang University, Haidian district, Beijing, China

2. University of Illinois at Chicago, Chicago, IL, USA

3. University of Leeds, Leeds, England, UK

4. Lehigh University, Bethlehem, PA, USA

Abstract

Text classification is the most fundamental and essential task in natural language processing. The last decade has seen a surge of research in this area due to the unprecedented success of deep learning. Numerous methods, datasets, and evaluation metrics have been proposed in the literature, raising the need for a comprehensive and updated survey. This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021, focusing on models ranging from traditional methods to deep learning. We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification. We then discuss each of these categories in detail, dealing with both the technical developments and benchmark datasets that support tests of predictions. This survey also provides a comprehensive comparison of different techniques and identifies the pros and cons of various evaluation metrics. Finally, we conclude by summarizing key implications, future research directions, and the challenges facing the research area.

Funder

National Key R&D Program of China

NSFC

State Key Laboratory of Software Development Environment

NSF

NSF ONR

Lehigh’s accelerator

CAAI-Huawei MindSpore Open Fund

Publisher

Association for Computing Machinery (ACM)

Subject

Artificial Intelligence, Theoretical Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3495162

References: 305 articles.

1. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks

2. Xiao-Dan Zhu, Parinaz Sobhani, and Hongyu Guo. 2015. Long short-term memory over recursive structures. In Proc. ICML, 2015. 1604–1612. http://proceedings.mlr.press/v37/zhub15.html.

3. A Dirichlet process biterm-based mixture model for short text stream clustering

4. A nonparametric model for online topic discovery with word embeddings

5. A Convolutional Neural Network for Modelling Sentences

Cited by 166 articles.

1. Application of NLP-based models in automated detection of risky contract statements written in complex script system;Expert Systems with Applications;2025-01

2. Multi-schema prompting powered token-feature woven attention network for short text classification;Pattern Recognition;2024-12

3. A comprehensive survey of text classification techniques and their research applications: Observational and experimental insights;Computer Science Review;2024-11

4. Comprehensive review and comparative analysis of transformer models in sentiment analysis;Knowledge and Information Systems;2024-09-06

5. Improving text classification through pre-attention mechanism-derived lexicons;Applied Intelligence;2024-09-02