Visualization and performance measure to determine number of topics in Twitter data clustering using hybrid topic modeling

Author:

Noorullah R.M.1, Mohammed Moulana1

Affiliation:

1. Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, India

Abstract

Topic models have been widely used to build clusters of documents for more than a decade, yet choosing the optimal number of topics remains a problem. The main difficulty is the lack of a stable metric for the quality of the topics obtained when a topic model is constructed. From an analysis of previous work, the authors found that most models used to determine the number of topics are non-parametric and assess topic quality with perplexity and coherence measures, and concluded that these are not applicable to this problem. In this paper, we use a parametric method that extends the traditional topic model with visual access tendency, which visualizes the number of topics (clusters) to complement clustering, and we choose the optimal number of topics based on the results of cluster validity indices. The resulting hybrid topic models are demonstrated on Twitter datasets covering various topics, both for obtaining the optimal number of topics and for measuring the quality of the clusters. The experimental results show that the Visual Non-negative Matrix Factorization (VNMF) topic model performs well in determining the optimal number of topics with interactive visualization and in measuring the quality of the clusters with validity indices.
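The general idea of choosing the number of topics from a cluster validity index can be sketched as follows. This is only an illustration under stated assumptions, not the authors' VNMF method: the abstract does not specify the visualization step or which validity indices were used, so the sketch substitutes a plain NMF with multiplicative updates and the silhouette index, and the six short "tweets" are made-up sample data.

```python
import math
import random

# Hypothetical toy corpus (not from the paper): two rough themes.
tweets = [
    "football match goal team",
    "team wins football league",
    "goal scored in the match",
    "election vote candidate poll",
    "candidate wins election vote",
    "poll shows vote lead",
]

# Build a document-term count matrix V (documents x vocabulary).
vocab = sorted({w for t in tweets for w in t.split()})
V = [[float(t.split().count(w)) for w in vocab] for t in tweets]

def nmf(V, k, iters=200, seed=0):
    """Approximate V by W @ H (all non-negative) with multiplicative updates."""
    rng = random.Random(seed)
    n, m, eps = len(V), len(V[0]), 1e-9
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        WtV = [[sum(W[i][a] * V[i][j] for i in range(n)) for j in range(m)] for a in range(k)]
        WtW = [[sum(W[i][a] * W[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
        WtWH = [[sum(WtW[a][b] * H[b][j] for b in range(k)) for j in range(m)] for a in range(k)]
        H = [[H[a][j] * WtV[a][j] / (WtWH[a][j] + eps) for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        VHt = [[sum(V[i][j] * H[a][j] for j in range(m)) for a in range(k)] for i in range(n)]
        HHt = [[sum(H[a][j] * H[b][j] for j in range(m)) for b in range(k)] for a in range(k)]
        WHHt = [[sum(W[i][b] * HHt[b][a] for b in range(k)) for a in range(k)] for i in range(n)]
        W = [[W[i][a] * VHt[i][a] / (WHHt[i][a] + eps) for a in range(k)] for i in range(n)]
    return W, H

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return max(0.0, 1.0 - dot / (nu * nv))  # clamp tiny negative rounding errors

def silhouette(X, labels):
    """Mean silhouette width; singletons score 0, a single cluster scores -1."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, x in enumerate(X):
        same = [cosine_distance(x, X[j]) for j in range(len(X))
                if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(cosine_distance(x, X[j]) for j in range(len(X)) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Score each candidate number of topics and keep the best one.
scores = {}
for k in range(2, 5):
    W, H = nmf(V, k)
    labels = [max(range(k), key=lambda a: row[a]) for row in W]  # dominant topic
    scores[k] = silhouette(V, labels)

best_k = max(scores, key=scores.get)
print(scores, "-> chosen number of topics:", best_k)
```

Each document is assigned to its dominant topic in W, and the candidate k with the highest mean silhouette is kept. Any other internal validity index could be swapped in for `silhouette` without changing the selection loop.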

Publisher

IOS Press

Subject

Artificial Intelligence, General Engineering, Statistics and Probability

Cited by 6 articles.

1. Detection and Prediction of Liver Disease Using Interpolative Ensembled Stacked (IES) Hybrid Model (Random Forest Classifier (RFC) and Logistic Regression (LR));2024 International Conference on Expert Clouds and Applications (ICOECA);2024-04-18

2. Social Network Data Clustering, Optimal Topic Detection and Decision Making for Social Service;2024 International Conference on Expert Clouds and Applications (ICOECA);2024-04-18

3. Analysis of Spammer Reporting Techniques on Online Social Networks;2023 International Conference on Evolutionary Algorithms and Soft Computing Techniques (EASCT);2023-10-20

4. Predicting User Likes in Online Shopping Using Artificial Intelligence and Big Data Analytics;2023 4th International Conference on Smart Electronics and Communication (ICOSEC);2023-09-20

5. Sampling-based fuzzy speech clustering systems for faster communication with virtual robotics toward social applications;Soft Computing;2023-05-02
