Affiliation:
1. Jawaharlal Nehru Technological University
Abstract
Data clustering is an unsupervised technique that partitions data into groups based on the similarity of the objects, measured with distance metrics such as Euclidean or cosine. In contrast to the Euclidean distance, the cosine measure compares objects by the direction of their data vectors rather than their magnitude. As a result, it performs far better than the standard Euclidean distance metric in applications involving real-time data clustering. Popular clustering techniques such as k-means and hierarchical approaches require an initial k-value (the clustering tendency), and this value determines the quality of the resulting clusters. Knowledgeable users can assign the k-value themselves. However, the right k-value for such algorithms is not always known in advance. A thorough review of prior work shows that the visual technique known as visual assessment of (cluster) tendency (VAT) effectively addresses the clustering tendency issue. VAT uses the Euclidean metric to find similarity features in its algorithm. Another enhanced visual technique, cosine-based VAT (cVAT), outperforms VAT for text data and speech clustering applications. However, cVAT extracts the similarity features with respect to a single viewpoint. This paper develops the multi-viewpoints-based cosine similarity measure (MVPCSM) for a more informative assessment. Instead of using a single reference point like a typical cosine measure, the MVPCSM generates precise similarity characteristics using several viewpoints. The performance of the existing and proposed techniques (MVPCSM-VAT) is evaluated using clustering accuracy (CA) and normalized mutual information (NMI). The proposed MVPCSM-VAT is shown to improve on VAT and cVAT by 15-25% in terms of CA and NMI. The proposed method also obtains higher-quality data clusters than MVS-VAT.
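The abstract does not give the exact MVPCSM definition, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that the multi-viewpoint similarity of two objects is the average cosine of the angle between them as seen from every other object in the data set, whereas the standard cosine measure views all objects from the origin. The function names (`multi_viewpoint_similarity`, `vat_order`) and the toy data are hypothetical and chosen for illustration; the reordering step follows the usual Prim-like VAT ordering of a dissimilarity matrix.

```python
import numpy as np


def multi_viewpoint_similarity(X):
    """Assumed multi-viewpoint form (not taken from the paper): for each pair
    (i, j), average the cosine of the angle between (x_i - x_h) and
    (x_j - x_h) over all other points h used as viewpoints."""
    n = X.shape[0]
    S = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            total, count = 0.0, 0
            for h in range(n):
                if h == i or h == j:
                    continue
                a, b = X[i] - X[h], X[j] - X[h]
                denom = np.linalg.norm(a) * np.linalg.norm(b)
                if denom > 0:  # skip viewpoints that coincide with x_i or x_j
                    total += float(a @ b) / denom
                    count += 1
            S[i, j] = S[j, i] = total / count if count else 0.0
    return S


def vat_order(D):
    """Prim-like VAT ordering: start from an object involved in the largest
    dissimilarity, then repeatedly append the unvisited object that is
    closest to any already-visited object."""
    n = D.shape[0]
    start, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(start)]
    remaining = set(range(n)) - {int(start)}
    while remaining:
        rem = sorted(remaining)
        sub = D[np.ix_(order, rem)]  # visited rows, unvisited columns
        _, col = np.unravel_index(np.argmin(sub), sub.shape)
        nxt = rem[col]
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Toy data (hypothetical): two loose groups of 2-D points.
X = np.array([[1.0, 0.1], [1.0, 0.2], [0.9, 0.1],
              [0.1, 1.0], [0.2, 1.0], [0.1, 0.9]])
S = multi_viewpoint_similarity(X)
D = 1.0 - S                          # turn similarity into dissimilarity
order = vat_order(D)
reordered = D[np.ix_(order, order)]  # matrix that VAT would display as an image
```

Displaying `reordered` as a grayscale image is what allows the number of clusters to be estimated visually: groups of mutually similar objects tend to appear as dark blocks along the diagonal.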
Subject
Rehabilitation, Physical Therapy, Sports Therapy and Rehabilitation, General Medicine