Short text classification using semantically enriched topic model

Published: 2024-03-20
ISSN:0165-5515
Container-title:Journal of Information Science
language:en

Author:

Uddin Farid¹, Chen Yibo², Zhang Zuping¹, Huang Xin²

Affiliation:

1. School of Computer Science and Engineering, Central South University, China

2. Information and Communication Branch, State Grid Hunan Electric Power Company Limited, China

Abstract

Modelling short text is challenging because word co-occurrences are few and semantic information is insufficient, which affects downstream Natural Language Processing (NLP) tasks such as text classification. Gathering information from external sources is expensive and may increase noise. For efficient short text classification without depending on external knowledge sources, we propose Expressive Short text Classification (EStC). EStC consists of a novel document context-aware, semantically enriched topic model called the Short text Topic Model (StTM), which captures word, topic and document semantics in a joint learning framework. In StTM, the probability of predicting a context word involves the topic distribution of word embeddings and the document vector as the global context. The document vector is obtained on the fly by weighted averaging of word embeddings, simultaneously with the topic distribution of words, so no additional inference method is required for the document embedding. EStC represents documents in an expressive (number of topics × number of word embedding features) embedding space and classifies them with a linear support vector machine (SVM). Experimental results on several publicly available short text data sets demonstrate that EStC outperforms many state-of-the-art language models in short text classification.
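
The (number of topics × number of word embedding features) representation mentioned in the abstract can be illustrated with a small sketch. This is not the authors' StTM implementation: it assumes that each word already has an embedding and a topic distribution, and it combines them by a weighted average of outer products, which is only one plausible way to reach a K × D document representation. The names `document_matrix`, `flatten`, `word_vecs`, `word_topics` and `weights` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def document_matrix(
    tokens: Sequence[str],
    word_vecs: Dict[str, List[float]],     # assumed: word -> D-dim embedding
    word_topics: Dict[str, List[float]],   # assumed: word -> K-dim topic distribution
    weights: Optional[Dict[str, float]] = None,  # assumed: per-word weights, default 1.0
) -> List[List[float]]:
    """Weighted average of outer(topic distribution, embedding) over known words."""
    known = [t for t in tokens if t in word_vecs and t in word_topics]
    if not known:
        raise ValueError("document contains no known words")
    k = len(word_topics[known[0]])
    d = len(word_vecs[known[0]])
    mat = [[0.0] * d for _ in range(k)]
    total = 0.0
    for t in known:
        w = 1.0 if weights is None else weights.get(t, 1.0)
        total += w
        for i, p in enumerate(word_topics[t]):
            for j, x in enumerate(word_vecs[t]):
                mat[i][j] += w * p * x
    if total == 0.0:
        raise ValueError("all word weights are zero")
    return [[v / total for v in row] for row in mat]


def flatten(mat: List[List[float]]) -> List[float]:
    """Row-major flattening of the K x D matrix into one feature vector."""
    return [v for row in mat for v in row]


# Toy data: D = 2 embedding features, K = 3 topics.
word_vecs = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
word_topics = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 0.5, 0.5]}

vec = flatten(document_matrix(["cat", "dog", "unseen"], word_vecs, word_topics))
print(len(vec))  # K * D = 6 features
```

Feature vectors of this shape could then be passed to a linear SVM, for example scikit-learn's `LinearSVC`, in line with the classifier named in the abstract.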

Funder

Hunan Key Laboratory for Internet of Things in Electricity

National Natural Science Foundation of China

National Natural Science Foundation of Hunan Province

project about research on key technologies of power knowledge graph

Publisher

SAGE Publications

Link

https://journals.sagepub.com/doi/pdf/10.1177/01655515241230793
