Abstract
In recent years, the exponential growth of digital documents has been met by rapid progress in text classification techniques. Newly proposed machine learning algorithms leverage the latest advancements in deep learning methods, allowing for the automatic extraction of expressive features. The swift development of these methods has led to a plethora of strategies to encode natural language into machine-interpretable data. The latest language modelling algorithms are used in conjunction with ad hoc preprocessing procedures, whose description is often omitted in favour of a more detailed explanation of the classification step. This paper offers a concise review of recent text classification models, with emphasis on the flow of data, from raw text to output labels. We highlight the differences between earlier methods and more recent, deep learning-based methods, both in how they function and in how they transform input data. To give a better perspective on the text classification landscape, we provide an overview of datasets for the English language, as well as instructions for synthesising two new multilabel datasets, a type of resource we found to be particularly scarce in this setting. Finally, we provide an outline of new experimental results and discuss the open research challenges posed by deep learning-based language models.
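To make the raw-text-to-labels flow described above concrete, the sketch below shows a minimal classical multilabel text classification pipeline. It is not taken from the paper: the toy corpus, the label set, and the scikit-learn components (TfidfVectorizer, OneVsRestClassifier, LogisticRegression) are illustrative assumptions standing in for the preprocessing, encoding, and classification stages the review discusses.

```python
# Minimal sketch (illustrative only): raw text -> preprocessing/encoding -> labels.
# The corpus, labels, and hyperparameters are assumptions, not the paper's setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy multilabel corpus: each document may carry several topic labels.
docs = [
    "Stock markets rallied after the central bank announcement.",
    "The striker scored twice in the championship final.",
    "New GPU architecture accelerates deep learning training.",
    "Tech shares lead gains as chip makers report record earnings.",
]
labels = [["finance"], ["sports"], ["technology"], ["finance", "technology"]]

# Turn label sets into a binary indicator matrix for multilabel learning.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

# Preprocessing and encoding (tokenisation, TF-IDF weighting) followed by a
# linear classifier wrapped in one-vs-rest to handle the multilabel setting.
clf = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(docs, Y)

# Predict label sets for unseen raw text and map them back to label names.
pred = clf.predict(["Quarterly earnings boost semiconductor stocks"])
print(mlb.inverse_transform(pred))
```

Deep learning-based models replace the hand-crafted TF-IDF encoding step with learned representations, but the overall flow from raw text to output labels remains the same.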