Abstract
Unsupervised clustering is an important task in biomedical science. We developed a new unsupervised clustering method, called SillyPutty. As test data, we generated a series of datasets using the Umpire R package. Using these datasets, we compared SillyPutty to several existing algorithms on multiple metrics (Silhouette Width, Adjusted Rand Index, Entropy, Normalized Within-group Sum of Square errors, and Perfect Classification Count). Our findings show that SillyPutty is a valid standalone clustering method, comparable in accuracy to the best existing methods. We also found that hierarchical clustering followed by SillyPutty has the best overall performance in terms of both accuracy and speed.
Availability
The SillyPutty R package has been submitted to the Comprehensive R Archive Network (CRAN). Code to perform and analyze the simulations described here can be found in a Git project hosted at https://gitlab.com/krcoombes/sillyputty.
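Two of the metrics named in the abstract, Silhouette Width and Adjusted Rand Index, can be illustrated with a minimal Python sketch. This is not code from the SillyPutty package or from the simulations; the function names `mean_silhouette_width` and `adjusted_rand_index` are hypothetical, Euclidean distance is an assumption, and the silhouette function assumes at least two clusters.

```python
from collections import Counter
from math import comb, dist


def mean_silhouette_width(points, labels):
    """Mean silhouette width under Euclidean distance (assumes >= 2 clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # common convention: a singleton scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand Index between two labelings of the same items."""
    n = len(truth)
    # Pair counts from the contingency table and from its margins
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(predicted).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use one cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy data: two well-separated groups of two points each (illustrative only)
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = [0, 0, 1, 1]
bad = [0, 1, 0, 1]
print(mean_silhouette_width(points, good))
print(mean_silhouette_width(points, bad))
print(adjusted_rand_index(good, [1, 1, 0, 0]))
```

On this toy data, the labeling that matches the two groups yields a mean silhouette width close to 1, the labeling that splits them yields a negative value, and the Adjusted Rand Index is unaffected by renaming the cluster labels.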
Publisher
Cold Spring Harbor Laboratory