Author:
Priness Ido, Maimon Oded, Ben-Gal Irad
Abstract
Background
The definition of a distance measure plays a key role in the evaluation of different clustering solutions of gene expression profiles. In this empirical study we compare clustering solutions evaluated with the Mutual Information (MI) measure against those evaluated with the well-known Euclidean distance and Pearson correlation coefficient.
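For readers who want a concrete picture of the three measures named above, the following is a minimal sketch, not the authors' implementation. The function names, the equal-width binning, and the default of 3 bins are assumptions made here for illustration; the abstract does not say how MI is estimated.

```python
import numpy as np

def euclidean_distance(x, y):
    """Euclidean distance between two expression profiles."""
    return float(np.linalg.norm(np.asarray(x, float) - np.asarray(y, float)))

def pearson_correlation(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    return float(np.corrcoef(x, y)[0, 1])

def mutual_information(x, y, bins=3):
    """Plug-in MI estimate in bits.

    Assumption: profiles are discretized into `bins` equal-width bins;
    other discretization schemes would give different values.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y, shape (1, bins)
    nz = pxy > 0                              # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

# A symmetric, nonlinear relation: y depends on x, but not linearly.
x = np.linspace(-1.0, 1.0, 99)
y = x ** 2
r = pearson_correlation(x, y)   # close to zero for this symmetric case
mi = mutual_information(x, y)   # clearly positive
```

In this toy case the Pearson coefficient is near zero while the MI estimate is not, which illustrates why an MI-based measure may rank profile pairs differently from correlation or Euclidean distance.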
Results
Relying on several public gene expression datasets, we evaluate the homogeneity and separation scores of different clustering solutions. We find that the MI measure yields a more significant differentiation among erroneous clustering solutions. The proposed measure was also used to analyze the performance of several known clustering algorithms. A comparative study of these algorithms reveals that their "best solutions" are ranked almost oppositely under different distance measures, even though the measures correspond with one another when the averaged scores of groups of solutions are analyzed.
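The homogeneity and separation scores mentioned above can be sketched as follows. This uses one common centroid-based formulation (homogeneity as the average similarity of each profile to its cluster centroid, separation as the size-weighted average similarity between centroids); whether the paper uses exactly these formulas is an assumption. The function names and the toy data are hypothetical, and Pearson correlation stands in for whichever similarity measure is being compared.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation, used here as the similarity measure."""
    return float(np.corrcoef(x, y)[0, 1])

def homogeneity_and_separation(profiles, labels, similarity=pearson):
    """Return (homogeneity, separation) for a clustering solution.

    With a similarity measure, a good solution has high homogeneity
    and low separation.
    """
    profiles = np.asarray(profiles, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    centroids = {c: profiles[labels == c].mean(axis=0) for c in clusters}
    sizes = {c: int(np.sum(labels == c)) for c in clusters}

    # Average similarity between each profile and its own centroid.
    homogeneity = float(np.mean(
        [similarity(p, centroids[l]) for p, l in zip(profiles, labels)]
    ))

    # Size-weighted average similarity between pairs of centroids.
    num = den = 0.0
    for i, a in enumerate(clusters):
        for b in clusters[i + 1:]:
            w = sizes[a] * sizes[b]
            num += w * similarity(centroids[a], centroids[b])
            den += w
    separation = num / den if den else float("nan")
    return homogeneity, separation

# Toy data: two rising and two falling profiles.
toy = [[1, 2, 3, 4], [1, 2, 4, 5], [4, 3, 2, 1], [6, 4, 3, 1]]
h_good, s_good = homogeneity_and_separation(toy, [0, 0, 1, 1])  # sensible split
h_bad, s_bad = homogeneity_and_separation(toy, [0, 1, 1, 0])    # erroneous split
```

Swapping the `similarity` argument for another measure (for example an MI estimate) is how the scores of the same solutions could be compared across measures.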
Conclusion
In view of the results, further attention should be paid to the selection of a proper distance measure for analyzing the clustering of gene expression data.
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology
References: 56 articles.
Cited by
123 articles.