An enhanced cosine-based visual technique for the robust tweets data clustering

Published: 2021-02-01
Journal: International Journal of Intelligent Computing and Cybernetics (IJICC)
Volume: 14, Issue: 2, Pages: 170-184
ISSN: 1756-378X
Language: English

Authors:

K Narasimhulu, KT Meena Abarna, B Sivakumar

Abstract

Purpose

The purpose of the paper is to study the multiple viewpoints required to capture more informative similarity features among tweet documents, which helps achieve robust tweet data clustering.

Design/methodology/approach

Let "N" be the number of tweet documents used for topic extraction. In the initial pre-processing step, unwanted text, punctuation and other symbols are removed, and tokenization and stemming are performed. A bag-of-features is then determined for the tweets, and the tweets are modelled with this bag-of-features during topic extraction. Approximate topic features are extracted for every tweet document, and the set of topic features of the N documents is treated as multiple viewpoints. The key idea of the proposed work is to use these multiple viewpoints when computing similarity features. A figure in the paper illustrates multi-viewpoint cosine similarity for five tweet documents (N = 5), which are defined in a projected space with five viewpoints: v1, v2, v3, v4 and v5. For example, the similarity between two documents (viewpoints v1 and v2) is computed with respect to the other three viewpoints (v3, v4 and v5). The traditional cosine metric, by contrast, uses a single viewpoint (the origin).

Findings

The study addresses healthcare problems using tweet data. Topic models play a crucial role in classifying health-related tweets. For unlabelled tweets, they find topics (or health clusters) instead of computing term frequency–inverse document frequency (TF–IDF) weights.

Originality/value

Topic models play a crucial role in classifying health-related tweets by finding topics (or health clusters) rather than computing TF–IDF for unlabelled tweets.
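The multi-viewpoint similarity described in the abstract can be sketched as follows. This is a minimal sketch with several assumptions. Each tweet is assumed to be already represented by a numeric topic-feature vector. For a pair of documents, every other document serves as a viewpoint, and the per-viewpoint cosines are simply averaged. The paper's exact aggregation and any normalisation may differ. The function names and toy vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Traditional cosine similarity: the angle is measured from the origin."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: zero vectors are treated as dissimilar
    return dot / (norm_a * norm_b)


def multi_viewpoint_cosine(docs: List[Vector], i: int, j: int) -> float:
    """Average cosine of docs[i] and docs[j] seen from every other document.

    Each remaining document h acts as a viewpoint: both vectors are
    translated so that docs[h] becomes the origin before taking the cosine.
    Averaging over viewpoints is an assumption of this sketch.
    """
    viewpoints = [h for h in range(len(docs)) if h not in (i, j)]
    if not viewpoints:
        # No other documents: fall back to the single-viewpoint cosine.
        return cosine(docs[i], docs[j])
    total = 0.0
    for h in viewpoints:
        di = [a - v for a, v in zip(docs[i], docs[h])]
        dj = [b - v for b, v in zip(docs[j], docs[h])]
        total += cosine(di, dj)
    return total / len(viewpoints)


def similarity_matrix(docs: List[Vector]) -> List[List[float]]:
    """Symmetric matrix of pairwise multi-viewpoint similarities (diagonal = 1.0)."""
    n = len(docs)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = multi_viewpoint_cosine(docs, i, j)
    return sim


# Hypothetical topic-feature vectors for N = 5 tweets (v1..v5), toy data only.
topics = [
    [0.9, 0.1, 0.0],  # v1
    [0.8, 0.2, 0.0],  # v2
    [0.1, 0.9, 0.0],  # v3
    [0.0, 0.1, 0.9],  # v4
    [0.1, 0.0, 0.9],  # v5
]
sim = similarity_matrix(topics)
for row in sim:
    print([round(value, 3) for value in row])
```

In this sketch, the similarity of v1 and v2 is taken from viewpoints v3, v4 and v5, as in the abstract's example. Replacing the `multi_viewpoint_cosine` call with `cosine(docs[i], docs[j])` recovers the traditional single-viewpoint measure, so the two can be compared on the same vectors.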

Publisher

Emerald

Subject

General Computer Science

References: 20 articles

1. Is normalized mutual information a fair measure for comparing community detection methods?; 2015

2. Cluster tendency methods for visualizing the data partitions; International Journal of Innovative Technology and Exploring Engineering; 2019

3. VAT: a tool for visual assessment of (cluster) tendency; 2002

4. SpecVAT: enhanced visual cluster analysis; 2008

5. Latent Dirichlet allocation; Journal of Machine Learning Research; 2003

Cited by: 11 articles

1. An Efficient Pre-Clusters Assessment Technique for Efficient Data Partitions;2023 2nd International Conference on Edge Computing and Applications (ICECAA);2023-07-19

2. Sampling-based fuzzy speech clustering systems for faster communication with virtual robotics toward social applications;Soft Computing;2023-05-02

3. Develop extended visual methods for an effective clusters assessment of large datasets;AIP Conference Proceedings;2023

4. A novel multi-viewpoints based cosine similarity visual technique for an effective assessment of clustering tendency;i-manager’s Journal on Mathematics;2023

5. High performance social data computing with development of intelligent topic models for healthcare;Microprocessors and Microsystems;2022-11