Online short text clustering using infinite extensions of discrete mixture models

Published:2023-07-10 Issue:5 Volume:39 Page:759-782
ISSN:0824-7935
Container-title:Computational Intelligence
language:en
Short-container-title:Computational Intelligence

Author:

Hannachi Samar¹, Najar Fatma¹, Ennajari Hafsa¹, Bouguila Nizar¹

Affiliation:

1. Concordia Institute for Information and Systems Engineering (CIISE), Concordia University, Montreal, Quebec, Canada

Abstract

Short text clustering is one of the fundamental tasks in natural language processing. Unlike traditional documents, short texts are ambiguous and sparse because of their brevity and the lack of recurring word usage from one text to another, which makes it very challenging to apply conventional machine learning algorithms directly. In this article, we propose two novel approaches for short text clustering: the collapsed Gibbs sampling infinite generalized Dirichlet multinomial mixture model (infinite GSGDMM) and the collapsed Gibbs sampling infinite Beta‐Liouville multinomial mixture model (infinite GSBLMM). We adopt two flexible and practical priors for the multinomial distribution: the first model integrates the generalized Dirichlet distribution, while the second is based on the Beta‐Liouville distribution. We evaluate the proposed approaches on two well-known benchmark datasets, namely, Google News and Tweet. The experimental results demonstrate the effectiveness of our models compared to basic approaches that use Dirichlet priors. We further propose to improve the performance of our methods with an online clustering procedure. We also evaluate our methods on the outlier detection task, in which we achieve accurate results.

Funder

Natural Sciences and Engineering Research Council of Canada

Publisher

Wiley

Subject

Artificial Intelligence, Computational Mathematics

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1111/coin.12593

Reference: 33 articles.

1. Short text clustering; challenges & solutions: a literature review;Siddiqui T;Int J Math Comput Res,2015

2. Exact fisher information of generalized Dirichlet multinomial distribution for count data modeling

3. Spatial Color Image Databases Summarization

4. A Model-Based Approach for Discrete Data Clustering and Feature Weighting Using MAP and Stochastic Complexity

5. On smoothing and scaling language model for sentiment based information retrieval;Najar F;Adv Data Anal Classif,2022

Cited by 1 article.

1. Unsupervised clustering-based domain adaptation for estimating occupancy and recognizing activities in smart buildings;Journal of Building Engineering;2024-05