Authors:
Kang Guolian, Mirzaei Sedigheh S., Zhang Hui, Zhu Liang, Rai Shesh N., Srivastava Deo Kumar
Abstract
In the context of high-throughput data, differences in continuous markers between two groups are usually assessed by ordering the p-values obtained from the two-sample pooled t-test or the Wilcoxon–Mann–Whitney test and choosing a stringent cutoff, such as 10⁻⁸, to control the family-wise error rate (FWER) or the false discovery rate (FDR). All markers with p-values below the cutoff are declared to be significantly associated with the phenotype. This approach inherently assumes that the test procedure provides valid type I error estimates in the extreme tails of the null distribution. Both tests assume homoscedasticity in the two groups, and the t-test further assumes that the underlying distributions are normal. Cao et al. (Biometrika, 2013, 100, 495–502) have shown that, in the context of multiple hypothesis testing, the approach based on FDR may not be valid under non-normality and/or heteroscedasticity. A test statistic that is robust to these violations is therefore needed. In this study, we propose a robust analog of the Behrens–Fisher statistic based on trimmed means, conduct an extensive simulation study to compare its performance with that of competing approaches, and demonstrate its usefulness by applying it to the DNA methylation data used by Teschendorff et al. (Genome Res., 2010, 20, 440–446). An R program to implement the proposed method is provided in the Supplementary Material.
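To make the idea of a trimmed-mean analog of the Behrens–Fisher statistic concrete, the following minimal Python sketch computes Yuen's classical trimmed-mean version of the Welch statistic. This is a well-known statistic of the same general type and is not necessarily the exact statistic proposed in this study; the function names (`trimmed_stats`, `yuen_statistic`) and the default trimming proportion of 0.2 are assumptions made for illustration only. The authors' own implementation is the R program in the Supplementary Material.

```python
import math
from statistics import mean, variance


def trimmed_stats(x, trim=0.2):
    """Return (trimmed mean, squared standard error term, effective size h).

    Hypothetical helper: trims a fraction `trim` from each tail and uses
    the Winsorized sample variance to estimate the variability.
    """
    xs = sorted(x)
    n = len(xs)
    g = int(math.floor(trim * n))   # observations trimmed from each tail
    h = n - 2 * g                   # observations left after trimming
    if h < 2:
        raise ValueError("too few observations left after trimming")
    core = xs[g:n - g]
    # Winsorize: replace each trimmed value by the nearest retained value.
    winsorized = [xs[g]] * g + core + [xs[n - g - 1]] * g
    # Yuen's squared standard error of the trimmed mean.
    d = (n - 1) * variance(winsorized) / (h * (h - 1))
    return mean(core), d, h


def yuen_statistic(x, y, trim=0.2):
    """Return (t, df) for Yuen's trimmed-mean, unequal-variance statistic.

    The degrees of freedom use a Welch-Satterthwaite type approximation.
    Raises ZeroDivisionError if both Winsorized variances are zero.
    """
    m1, d1, h1 = trimmed_stats(x, trim)
    m2, d2, h2 = trimmed_stats(y, trim)
    t = (m1 - m2) / math.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    return t, df


if __name__ == "__main__":
    group_a = [1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 9.5]   # one gross outlier
    group_b = [1.9, 2.1, 2.0, 1.8, 2.2, 2.4, 1.7, 2.0]
    print(yuen_statistic(group_a, group_b, trim=0.2))
```

With `trim=0.0` the sketch reduces to the ordinary Welch (unequal-variance) t statistic, which makes it easy to compare the trimmed and untrimmed versions on the same marker. A p-value would be obtained by referring `t` to a Student t distribution with `df` degrees of freedom; that step is omitted here to keep the sketch free of third-party dependencies.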