Validation and characterization of DNA microarray gene expression data distribution and associated moments

Published:2010-11-24 Issue:1 Volume:11 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Thomas Reuben, de la Torre Luis, Chang Xiaoqing, Mehrotra Sanjay

Abstract

Background: Data from DNA microarrays are increasingly used to understand how different conditions, exposures or diseases modulate the expression of various genes in a biological system. This knowledge is then used to generate molecular mechanistic hypotheses for an organism exposed to different conditions. Several methods have been proposed to analyze these data under different distributional assumptions on gene expression. However, empirical validation of these assumptions is lacking.

Results: Best-fit hypothesis tests, moment-ratio diagrams and relationships between the different moments of the distribution of gene expression were used to characterize the observed distributions. The data were obtained from the publicly available gene expression database Gene Expression Omnibus (GEO) and cover varying experimental situations, each of which provides a relatively large number of samples for hypothesis testing. All data came from one of two microarray platforms: the commercial Affymetrix mouse 430.2 platform and a non-commercial Rosetta/Merck one. The data from each platform were preprocessed in the same manner.

Conclusions: The null hypotheses for goodness of fit for all considered univariate theoretical probability distributions (including the Normal distribution) are rejected for more than 50% of probe sets on the Affymetrix microarray platform at a 95% confidence level, suggesting that, under the tested conditions, an a priori assumption of any of these distributions across all probe sets is not valid. The pattern of null hypothesis rejection was different for the data from the Rosetta/Merck platform, with only around 20% of the probe sets failing the logistic distribution goodness-of-fit test. There are statistically significant relationships (at a 95% confidence level, based on the F-test for the fitted linear model) between the mean and the logarithm of the coefficient of variation of the distributions of the logarithm of gene expression. An additional, novel, statistically significant quadratic relationship between the skewness and the kurtosis is identified. In an analysis of the L-moment ratio diagram, data from neither microarray platform match any single one of the chosen theoretical probability distributions.
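The quantities named in the abstract (mean, coefficient of variation, skewness, kurtosis, and the L-moment ratios plotted on an L-moment ratio diagram) can be computed per probe set from its expression values across samples. The following is a minimal sketch using only the Python standard library. It is not the authors' code: the function names `central_moment_stats` and `sample_l_moments` are hypothetical, the input is assumed to be a plain list of (log) expression values for one probe set, and the L-moments use the standard unbiased probability-weighted-moment estimators, which may differ from the estimators used in the paper.

```python
import math


def central_moment_stats(values):
    """Mean, coefficient of variation, skewness and kurtosis of one sample.

    Uses population (1/n) central moments; kurtosis is the raw ratio
    m4 / m2**2 (a Normal distribution gives 3), not excess kurtosis.
    Assumes the mean is non-zero so that the CV is defined.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "cv": math.sqrt(m2) / mean,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


def sample_l_moments(values):
    """First four sample L-moments and the ratios tau3 (L-skewness)
    and tau4 (L-kurtosis), via unbiased probability-weighted moments."""
    x = sorted(values)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    b0 = b1 = b2 = b3 = 0.0
    for i, v in enumerate(x):  # i is the zero-based rank, i.e. rank - 1
        b0 += v
        b1 += v * i / (n - 1)
        b2 += v * i * (i - 1) / ((n - 1) * (n - 2))
        b3 += v * i * (i - 1) * (i - 2) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = b0 / n, b1 / n, b2 / n, b3 / n
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return {"l1": l1, "l2": l2, "tau3": l3 / l2, "tau4": l4 / l2}
```

Plotting each probe set's (tau3, tau4) pair against the known theoretical curves or points for candidate distributions gives an L-moment ratio diagram; plotting skewness against kurtosis in the same way is one route to examining the quadratic relationship the abstract reports.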

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-11-576.pdf

Reference: 70 articles.

1. Bray NJ, Buckland PR, Owen MJ, O'Donovan MC: Cis-acting variation in the expression of a high proportion of genes in human brain. Hum Genet 2003, 113: 149–153.

2. Buckland PR: Allele-specific gene expression differences in humans. Hum Mol Genet 2004, 13(Spec No 2):R255–260. 10.1093/hmg/ddh227

3. He H, Olesnanik K, Nagy R, Liyanarachchi S, Prasad ML, Stratakis CA, Kloos RT, de la Chapelle A: Allelic variation in gene expression in thyroid tissue. Thyroid 2005, 15: 660–667. 10.1089/thy.2005.15.660

4. McRae AF, Matigian NA, Vadlamudi L, Mulley JC, Mowry B, Martin NG, Berkovic SF, Hayward NK, Visscher PM: Replicated effects of sex and genotype on gene expression in human lymphoblastoid cell lines. Hum Mol Genet 2007, 16: 364–373. 10.1093/hmg/ddl456

5. Monks SA, Leonardson A, Zhu H, Cundiff P, Pietrusiak P, Edwards S, Phillips JW, Sachs A, Schadt EE: Genetic inheritance of gene expression in human cell lines. Am J Hum Genet 2004, 75: 1094–1105. 10.1086/426461

Cited by 15 articles.

1. SIEVE: One-stop differential expression, variability, and skewness analyses using RNA-Seq data;2024-04-13

2. Mixed Distribution Models Based on Single-Cell RNA Sequencing Data;Interdisciplinary Sciences: Computational Life Sciences;2021-03-22

3. Information Length as a Useful Index to Understand Variability in the Global Circulation;Mathematics;2020-02-24

4. Steroid enzyme and receptor expression and regulations in breast tumor samples – A statistical evaluation of public data;The Journal of Steroid Biochemistry and Molecular Biology;2020-02

5. Investigating skewness to understand gene expression heterogeneity in large patient cohorts;BMC Bioinformatics;2019-12