ClassiNet -- Predicting Missing Features for Short-Text Classification

Published:2018-10-31 Issue:5 Volume:12 Page:1-29
ISSN:1556-4681
Container-title:ACM Transactions on Knowledge Discovery from Data
language:en
Short-container-title:ACM Trans. Knowl. Discov. Data

Author:

Bollegala Danushka¹, Atanasov Vincent¹, Maehara Takanori², Kawarabayashi Ken-Ichi³

Affiliation:

1. University of Liverpool, United Kingdom

2. RIKEN Center for Advanced Intelligence Project, Tokyo, Japan

3. National Institute of Informatics, Tokyo, Japan

Abstract

Short and sparse texts such as tweets, search engine snippets, product reviews, and chat messages are abundant on the Web. Classifying such short-texts into a pre-defined set of categories is a common problem that arises in various contexts, such as sentiment classification, spam detection, and information recommendation. The fundamental problem in short-text classification is feature sparseness -- the lack of feature overlap between a trained model and a test instance to be classified. We propose ClassiNet -- a network of classifiers trained for predicting missing features in a given instance, to overcome the feature sparseness problem. Using a set of unlabeled training instances, we first learn binary classifiers as feature predictors for predicting whether a particular feature occurs in a given instance. Next, each feature predictor is represented as a vertex v_i in the ClassiNet, where a one-to-one correspondence exists between feature predictors and vertices. The weight of the directed edge e_ij connecting a vertex v_i to a vertex v_j represents the conditional probability that given v_i exists in an instance, v_j also exists in the same instance. We show that ClassiNets generalize word co-occurrence graphs by considering implicit co-occurrences between features. We extract numerous features from the trained ClassiNet to overcome feature sparseness. In particular, for a given instance x, we find similar features from ClassiNet that did not appear in x, and append those features in the representation of x. Moreover, we propose a method based on graph propagation to find features that are indirectly related to a given short-text. We evaluate ClassiNets on several benchmark datasets for short-text classification. Our experimental results show that by using ClassiNet, we can statistically significantly improve the accuracy in short-text classification tasks, without having to use any external resources such as thesauri for finding related features.
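
The edge-weight definition and the feature-expansion step in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: ClassiNet estimates edge weights with trained binary feature predictors, whereas this sketch assumes plain explicit co-occurrence counts, i.e. the word co-occurrence graph that the abstract says ClassiNets generalize. The function names `build_cooccurrence_graph` and `expand`, the `top_k` parameter, the summed-weight scoring, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence_graph(instances):
    """Return weights[i][j], an estimate of P(j in instance | i in instance).

    Assumption: probabilities come from raw co-occurrence counts,
    not from trained feature predictors as in ClassiNet.
    """
    feature_count = Counter()
    pair_count = Counter()
    for tokens in instances:
        features = set(tokens)  # binary occurrence, duplicates ignored
        feature_count.update(features)
        # every ordered pair (i, j), i != j, gives a directed edge
        pair_count.update(permutations(sorted(features), 2))
    weights = defaultdict(dict)
    for (i, j), n_ij in pair_count.items():
        weights[i][j] = n_ij / feature_count[i]
    return weights


def expand(features, weights, top_k=2):
    """Return up to top_k features absent from the instance.

    Assumption: candidates are ranked by the sum of edge weights
    arriving from the features that are present.
    """
    present = set(features)
    scores = Counter()
    for i in present:
        for j, w in weights.get(i, {}).items():
            if j not in present:
                scores[j] += w
    return [f for f, _ in scores.most_common(top_k)]


# Toy unlabeled instances (made up for this example).
corpus = [
    ["cheap", "pills", "buy"],
    ["cheap", "pills"],
    ["buy", "tickets"],
]
graph = build_cooccurrence_graph(corpus)
print(graph["cheap"]["pills"])            # 1.0
print(expand(["cheap"], graph, top_k=1))  # ['pills']
```

Only direct neighbours are scored here; the graph propagation method mentioned in the abstract, which reaches indirectly related features, is not covered by this sketch.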

Funder

ERATO Kawarabayashi Large Graph Project from the Japan Science and Technology Agency

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3201578

References: 66 articles.

1. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions

2. Measuring semantic similarity between words using web search engines

Cited by 6 articles.

1. Experimenting Datasets and Machine Learning Techniques for Enhancing Cyberbullying Detection;2023 IEEE 11th Conference on Systems, Process & Control (ICSPC);2023-12-16

2. TextNetTopics Pro, a topic model-based text classification for short text by integration of semantic and document-topic distribution information;Frontiers in Genetics;2023-10-05

3. Innovative Research by Using IoT Applications on Cross-National English Cultural Communication Based on Crowdsourcing Translation Model;Wireless Communications and Mobile Computing;2022-08-21

4. In Search of Insight from Unstructured Text Data: Towards an Identification of Text Mining Techniques;Lecture Notes in Networks and Systems;2022

5. Application-Oriented Approach for Detecting Cyberaggression in Social Media;Advances in Intelligent Systems and Computing;2020-07-04