Too much data, but little inter-changeability: a lesson learned from mining public data on tissue specificity of gene expression
-
Published:2006-10-25
Issue:1
Volume:1
Page:
-
ISSN:1745-6150
-
Container-title:Biology Direct
-
language:en
-
Short-container-title:Biol Direct
Author:
Li Shuyu,Li Yiqun Helen,Wei Tao,Su Eric Wen,Duffin Kevin,Liao Birong
Abstract
Background
The tissue expression pattern of a gene often provides an important clue to its potential role in a biological process. A vast amount of gene expression data has been, and is still being, accumulated in public repositories through different technology platforms. However, exploitation of these rich data sources remains limited, in part because of issues of technology standardization. Our objective is to test the comparability of data between SAGE and microarray technologies by examining the expression patterns of genes under normal physiological states across a variety of tissues.
Results
Between 42% and 54% of genes show significant correlations in tissue expression patterns between SAGE and GeneChip: 30–40% of genes have positively correlated expression patterns and 10–15% have negatively correlated expression patterns at a statistically significant level (p = 0.05). Our analysis suggests that the discrepancy in expression patterns between the technology platforms is unlikely to result from heterogeneity of the tissues used with these technologies, or from other spurious correlations arising from microarray probe design, gene abundance, or gene function. The discrepancy can be partially explained by errors in the original assignment of SAGE tags to genes, caused by the evolution of sequence databases. In addition, sequence analysis indicates that many SAGE tags and Affymetrix array probe sets map to different splice variants or different sequence regions even though they represent the same gene, which also contributes to the observed discrepancies between SAGE and array expression data.
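The per-gene comparison summarized in the Results can be illustrated with a minimal sketch. It assumes a Pearson correlation across tissues and an exact permutation test for significance; the abstract does not state which correlation measure or test the authors used, so both are assumptions made here for illustration. The gene names and expression values are hypothetical.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no pattern to correlate
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y):
    """Exact two-sided permutation p-value (feasible for a handful of tissues)."""
    observed = abs(pearson(x, y))
    hits = 0
    total = 0
    for shuffled in itertools.permutations(y):
        total += 1
        # Small tolerance so the unshuffled ordering always counts as a hit.
        if abs(pearson(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return hits / total


def classify(sage, chip, alpha=0.05):
    """Label one gene's cross-platform tissue pattern."""
    if permutation_p_value(sage, chip) > alpha:
        return "not significant"
    return "positive" if pearson(sage, chip) > 0 else "negative"


# Hypothetical expression values for three genes across six tissues:
# (SAGE tag counts, GeneChip signal intensities).
genes = {
    "gene_a": ([2, 5, 9, 14, 20, 30], [110, 180, 260, 400, 520, 900]),
    "gene_b": ([2, 5, 9, 14, 20, 30], [930, 860, 720, 590, 400, 110]),
    "gene_c": ([2, 5, 9, 14, 20, 30], [300, 120, 410, 90, 350, 200]),
}

labels = {name: classify(sage, chip) for name, (sage, chip) in genes.items()}
print(labels)
```

Counting the fraction of genes in each label over a real data set would give percentages of the kind reported above. For many tissues, the exact permutation loop becomes too slow and a sampled permutation test or a parametric test would be the more practical choice.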
Conclusion
To our knowledge, this is the first report that attempts to mine gene expression patterns across tissues using public data from different technology platforms. Unlike previous similar studies, which only demonstrated the discrepancies between the two gene expression platforms, we carried out an in-depth analysis to investigate the causes of such discrepancies. Our study shows that exploiting the rich public expression resources requires extensive knowledge of both the technologies and the experiments. Informatics methodologies for better interoperability among platforms remain a gap. One area that can be improved in practice is the accurate sequence mapping of SAGE tags and array probes to full-length genes.
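The tag-mapping issue can be made concrete with a short sketch. It assumes the standard short-SAGE convention, in which a tag is the 10 bases immediately downstream of the 3'-most NlaIII site (CATG) in a transcript; the abstract does not specify the protocol behind the public data, so this is an assumption. The transcript sequences and variant names are invented for illustration.

```python
ANCHOR = "CATG"   # NlaIII recognition site (anchoring enzyme in standard SAGE)
TAG_LENGTH = 10   # short-SAGE tag length; an assumption, see the text above


def sage_tag(transcript):
    """Return the tag after the 3'-most CATG, or None if no full tag exists."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)
    if pos == -1:
        return None
    start = pos + len(ANCHOR)
    tag = seq[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None


# Two hypothetical splice variants of one gene that differ only at the 3' end.
shared_5prime = "ATGGCCCATGAAACCCGGGT"
variants = {
    "variant_1": shared_5prime + "TTACGTAAAA",
    "variant_2": shared_5prime + "CATGTTGACCTAGGAAAA",
}

# Build a tag -> variants lookup, as a tag-to-gene mapping table would.
tag_to_variants = {}
for name, seq in variants.items():
    tag_to_variants.setdefault(sage_tag(seq), set()).add(name)

for tag, names in sorted(tag_to_variants.items()):
    print(tag, sorted(names))
```

Because the second variant carries an extra CATG site in its alternative 3' end, the two variants yield different tags. A tag count therefore measures one variant, while an array probe set designed against another region of the same gene may measure a different one.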
Reviewers
This article was reviewed by Dr. I. King Jordan, Dr. Joel Bader, and Dr. Arcady Mushegian.
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics,General Agricultural and Biological Sciences,General Biochemistry, Genetics and Molecular Biology,Modeling and Simulation,Ecology, Evolution, Behavior and Systematics,Immunology
Cited by
9 articles.