A novel multi-viewpoints based cosine similarity visual technique for an effective assessment of clustering tendency

Author:

Rajasekhar Pinisetty¹, Ravindranath Vandrangi¹

Affiliation:

1. Jawaharlal Nehru Technological University

Abstract

Data clustering is an unsupervised technique that partitions data into groups based on the similarity of the objects, measured with distance metrics such as Euclidean or cosine. In contrast to the Euclidean metric, the cosine measure compares objects by the direction of their data vectors rather than by their magnitude. As a result, it has performed far better than the standard Euclidean distance metric in applications involving real-time data clustering. Leading clustering techniques such as k-means and hierarchical approaches require an initial k-value (the clustering tendency, i.e., the number of clusters), and the quality of the resulting clusters depends on it. Users with domain knowledge can assign the k-value, but in many cases the right k-value is not known in advance. A thorough review of prior work shows that the visual technique known as visual assessment of (cluster) tendency (VAT) effectively addresses the clustering tendency problem. VAT uses the Euclidean metric to derive the similarity features in its algorithm. An enhanced visual technique, cosine-based VAT (cVAT), outperformed VAT in text data and speech clustering applications. However, cVAT extracts the similarity features with respect to a single viewpoint. This paper develops the multi-viewpoints-based cosine similarity measure (MVPCSM) for a more informative assessment. Instead of using a single reference point like a typical cosine measure, the MVPCSM generates more precise similarity features using several viewpoints. The performance of the existing techniques and the proposed technique (MVPCSM-VAT) is evaluated using clustering accuracy (CA) and normalized mutual information (NMI). The proposed MVPCSM-VAT is shown to improve on VAT and cVAT by 15-25% in terms of CA and NMI. The proposed method also obtains higher-quality data clusters than MVS-VAT.
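To illustrate the visual assessment idea mentioned in the abstract, the following is a minimal Python sketch of VAT-style reordering applied to a single-viewpoint cosine dissimilarity matrix, in the spirit of cVAT. It is an assumption-laden illustration, not the authors' implementation: the function names `cosine_dissimilarity` and `vat_order` and the toy data are hypothetical, and the multi-viewpoint measure (MVPCSM) is not reproduced because the abstract does not define it.

```python
import numpy as np


def cosine_dissimilarity(X):
    """Pairwise 1 - cos(angle) between the rows of X.

    Assumption: every row is a non-zero vector, and the origin is the
    single viewpoint (the ordinary cosine measure).
    """
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    D = 1.0 - unit @ unit.T
    np.fill_diagonal(D, 0.0)          # remove rounding noise on the diagonal
    return np.clip(D, 0.0, 2.0)


def vat_order(D):
    """Return a VAT-style (Prim-like) ordering for dissimilarity matrix D."""
    n = D.shape[0]
    # Start from one endpoint of the largest pairwise dissimilarity.
    first = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order = [first]
    remaining = [k for k in range(n) if k != first]
    # Smallest dissimilarity from each remaining object to the chosen set.
    min_dist = D[first, remaining].copy()
    while remaining:
        pos = int(np.argmin(min_dist))
        nxt = remaining.pop(pos)
        order.append(nxt)
        min_dist = np.delete(min_dist, pos)
        if remaining:
            min_dist = np.minimum(min_dist, D[nxt, remaining])
    return np.array(order)


if __name__ == "__main__":
    # Toy data (hypothetical): two groups of directions, interleaved.
    X = np.array([[1.0, 0.05], [0.05, 1.0], [1.0, 0.1],
                  [0.1, 1.0], [0.9, 0.0], [0.0, 0.8]])
    D = cosine_dissimilarity(X)
    order = vat_order(D)
    reordered = D[np.ix_(order, order)]   # matrix that VAT would display
    print(order)
    print(np.round(reordered, 3))
```

When the reordered matrix is shown as a grayscale image, dark square blocks along the diagonal suggest groups of mutually similar objects, and counting those blocks gives a visual estimate of the k-value. Under the stated assumptions, a different similarity measure would only replace `cosine_dissimilarity`; the reordering step stays the same.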

Publisher

i-manager Publications

Subject

Rehabilitation, Physical Therapy, Sports Therapy and Rehabilitation, General Medicine

