J-score: a robust measure of clustering accuracy

Published: 2023-09-04 Volume: 9 Page: e1545
ISSN: 2376-5992
Container-title: PeerJ Computer Science
Language: en

Author:

Ahmadinejad Navid, Chung Yunro, Liu Li

Abstract

Background. Clustering analysis discovers hidden structures in a data set by partitioning it into disjoint clusters. Robust accuracy measures that evaluate the goodness of clustering results are critical for algorithm development and model diagnosis. Common problems of clustering accuracy measures include overlooking unmatched clusters, biases towards excessive clusters, unstable baselines, and difficulties of interpretation. In this study, we presented a novel accuracy measure, J-score, to address these issues.

Methods. Given a data set with known class labels, J-score quantifies how well the hypothetical clusters produced by clustering analysis recover the true classes. It starts with bidirectional set matching to identify the correspondence between true classes and hypothetical clusters based on the Jaccard index. It then computes two weighted sums of Jaccard indices, one measuring the reconciliation from classes to clusters and the other from clusters to classes. The final J-score is the harmonic mean of the two weighted sums.

Results. Through simulation studies and analyses of real data sets, we evaluated the performance of J-score and compared it with existing measures. Our results show that J-score is effective in distinguishing partition structures that differ only by unmatched clusters, rewarding correct inference of class numbers, addressing biases towards excessive clusters, and having a relatively stable baseline. The simplicity of its calculation makes the interpretation straightforward. It is a valuable tool complementary to other accuracy measures. We released an R/jScore package implementing the algorithm.
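
The three steps in the Methods description (bidirectional Jaccard matching, two weighted sums, harmonic mean) can be illustrated with a minimal Python sketch. This is not the authors' R/jScore package. The function and helper names are hypothetical, and the sketch assumes that each class or cluster is weighted by its share of the data points and matched to the counterpart with the highest Jaccard index, details the abstract does not spell out.

```python
from collections import defaultdict


def _groups(labels):
    """Map each label to the set of item indices that carry it."""
    groups = defaultdict(set)
    for index, label in enumerate(labels):
        groups[label].add(index)
    return groups


def _jaccard(a, b):
    """Jaccard index of two non-empty sets."""
    return len(a & b) / len(a | b)


def _weighted_best_match(sources, targets, n):
    """Sum each source set's best Jaccard match among the targets.

    Assumption: every source set is weighted by its relative size.
    """
    return sum(
        (len(s) / n) * max(_jaccard(s, t) for t in targets.values())
        for s in sources.values()
    )


def j_score(true_labels, cluster_labels):
    """Illustrative J-score: harmonic mean of the two weighted sums."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("label sequences must not be empty")

    classes = _groups(true_labels)
    clusters = _groups(cluster_labels)

    # Reconciliation from classes to clusters, then from clusters to classes.
    class_to_cluster = _weighted_best_match(classes, clusters, n)
    cluster_to_class = _weighted_best_match(clusters, classes, n)

    # Both sums are strictly positive, so the harmonic mean is well defined.
    return (2 * class_to_cluster * cluster_to_class
            / (class_to_cluster + cluster_to_class))


if __name__ == "__main__":
    # A clustering that recovers the classes exactly (label names may differ).
    print(j_score([0, 0, 1, 1], [5, 5, 7, 7]))  # 1.0
```

Because both directions are scored, a clustering that merges all items into one cluster or splits them into many small ones is penalized in at least one of the two sums under these assumptions.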

Funder

National Institutes of Health of USA

Publisher

PeerJ

Subject

General Computer Science

Link

https://peerj.com/articles/cs-1545.pdf

Cited by 1 article.

1. Clustering Functional Magnetic Resonance Imaging Time Series in Glioblastoma Characterization: A Review of the Evolution, Applications, and Potentials;Brain Sciences;2024-03-20