Abstract
Motivation
Gene co-expression analysis is an attractive tool for leveraging the enormous number of public RNA-seq datasets to predict gene functions and regulatory mechanisms. However, the optimal data processing steps for accurately predicting gene co-expression from such large datasets remain unclear. In particular, the importance of batch effect correction is understudied.
Results
We processed RNA-seq data of 68 human and 76 mouse cell types and tissues using 50 different workflows into 7,200 genome-wide gene co-expression networks. We then conducted a systematic analysis of the factors that result in high-quality co-expression predictions, focusing on normalization, batch effect correction, and the measure of correlation. We confirmed the key importance of high sample counts for high-quality predictions. However, choosing a suitable normalization approach and applying batch effect correction can further improve the quality of co-expression estimates, equivalent to a >80% and >40% increase in samples, respectively. In larger datasets, batch effect removal was equivalent to more than doubling the sample size. Finally, Pearson correlation appears more suitable than Spearman correlation, except for smaller datasets.
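To make the two correlation measures compared above concrete, the following is a minimal sketch in plain Python. It is not the authors' workflow: the function names (`pearson`, `ranks`, `spearman`, `coexpression`) and the toy expression values are assumptions made up for illustration, and normalization and batch effect correction are assumed to have been applied beforehand.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists of expression values."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when a gene is constant across samples
    return cov / math.sqrt(vx * vy)

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))

def coexpression(expr, method=pearson):
    """Correlate every gene pair.

    expr maps a gene name to its per-sample values (already normalized).
    Returns a dict mapping (gene_a, gene_b) to the correlation.
    """
    genes = sorted(expr)
    return {
        (g, h): method(expr[g], expr[h])
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    }

# Hypothetical toy data: three genes measured in five samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.1, 9.8],
    "geneC": [1.0, 2.0, 3.0, 4.0, 100.0],  # one extreme sample
}

pearson_net = coexpression(expr, pearson)
spearman_net = coexpression(expr, spearman)
```

In this toy case the single extreme sample in `geneC` lowers its Pearson correlation with `geneA`, while the rank-based Spearman correlation is unaffected. This only illustrates how the two measures differ and says nothing about which performs better on real data.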
Conclusion
A key point for accurate prediction of gene co-expression is the collection of many samples. However, paying attention to data normalization, batch effects, and the measure of correlation can significantly improve the quality of co-expression estimates.
Funder
Japan Society for the Promotion of Science
Publisher
Public Library of Science (PLoS)
Cited by
10 articles.