Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient

Published:2008-06-18 Issue:1 Volume:9
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Yao Jianchao, Chang Chunqi, Salmi Mari L, Hung Yeung Sam, Loraine Ann, Roux Stanley J

Abstract

Background: Clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two most widely used similarity metrics for clustering microarray data. However, neither is optimal for analyzing the replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data.

Results: In this study, we describe a novel correlation coefficient, the shrinkage correlation coefficient (SCC), that fully exploits the similarity between replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group when clustering expression data, and provides a robust statistical estimate of the error of replicated microarray data. The value of SCC is demonstrated by comparing it with the two most widely used correlation coefficients (the Pearson correlation coefficient and the SD-weighted correlation coefficient), using statistical measures on both synthetic expression data and real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering, were applied for the comparison, which indicated that SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns.

Conclusion: This study shows that SCC is an alternative to the Pearson correlation coefficient and the SD-weighted correlation coefficient, and is particularly useful for clustering replicated microarray data. This computational approach should be generally useful for proteomic data or other high-throughput analysis methodologies.

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-9-288.pdf

References: 58 articles.

1. Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. PNAS 1998, 95(25):14863–14868. 10.1073/pnas.95.25.14863

2. Kung C, Kenski DM, Dickerson SH, Howson RW, Kuyper LF, Madhani HD, Shokat KM: Chemical genomic profiling to identify intracellular targets of a multiplex kinase inhibitor. PNAS 2005, 102(10):3587–3592. 10.1073/pnas.0407170102

3. Matsumura H, Bin Nasir KH, Yoshida K, Ito A, Kahl G, Kruger DH, Terauchi R: SuperSAGE array: the direct use of 26-base-pair transcript tags in oligonucleotide arrays. Nature Methods 2006, 3(6):469–474. 10.1038/nmeth882

4. Rengarajan J, Bloom BR, Rubin EJ: From The Cover: Genome-wide requirements for Mycobacterium tuberculosis adaptation and survival in macrophages. PNAS 2005, 102(23):8327–8332. 10.1073/pnas.0503272102

5. Hughes TR, Marton MJ, Jones AR, et al.: Functional discovery via a compendium of expression profiles. Cell 2000, 102(1):109–126. 10.1016/S0092-8674(00)00015-5

Cited by 23 articles.

1. Secure similar patients query with homomorphically evaluated thresholds;Journal of Information Security and Applications;2024-09

2. Identifying local associations in biological time series: algorithms, statistical significance, and applications;Briefings in Bioinformatics;2023-09-22

3. Iterative sure independent ranking and screening for drug response prediction;BMC Medical Informatics and Decision Making;2020-09

4. Cluster analysis of replicated alternative polyadenylation data using canonical correlation analysis;BMC Genomics;2019-01-22

5. Privacy-Preserving Similar Patient Queries for Combined Biomedical Data;Proceedings on Privacy Enhancing Technologies;2018-12-24