Abstract
Independent two-sample tests are widely used in case-control studies to identify significant changes or differences. For example, they are used to find key pathogenic genes by comparing gene expression levels in normal and diseased cells. However, because data collection or labelling is costly, many studies face a small-sample problem, in which traditional two-sample tests often lose power. We propose WMW-A, a novel rank-based nonparametric test for the small-sample problem that introduces a three-sample statistic built with an additional auxiliary sample. By pooling the case, control and auxiliary samples, we construct the three-sample WMW-A statistic from the gap between the average ranks of the case and control samples in the pooled data. Assuming that the auxiliary sample follows a mixture of the case and control distributions, we analyze the theoretical properties of the WMW-A statistic and approximate its theoretical power. Extensive simulation experiments and real applications to microarray gene expression data sets show that the WMW-A test can significantly improve test power for two-sample problems with small sample sizes, using either available unlabelled auxiliary data or generated auxiliary data.
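The pooled-rank construction described above can be sketched in Python. This is a minimal illustration under our own assumptions, not the authors' implementation. The helper names `average_ranks`, `rank_gap` and `permutation_pvalue` are hypothetical. The statistic is taken as the raw difference of mean ranks, because the abstract does not give the paper's exact normalization. Significance is assessed by a simple permutation of case/control labels rather than by the theoretical approximation derived in the paper.

```python
import random
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_gap(case, control, auxiliary) -> float:
    """Hypothetical WMW-A-style statistic (assumption: unnormalized).

    Pools case, control and auxiliary samples, ranks them jointly, and
    returns mean rank of the case sample minus mean rank of the control sample.
    """
    pooled = list(case) + list(control) + list(auxiliary)
    ranks = average_ranks(pooled)
    n1, n2 = len(case), len(control)
    mean_case = sum(ranks[:n1]) / n1
    mean_control = sum(ranks[n1:n1 + n2]) / n2
    return mean_case - mean_control


def permutation_pvalue(case, control, auxiliary, n_perm=2000, seed=0) -> float:
    """Two-sided p-value by shuffling case/control labels (assumed calibration).

    The auxiliary sample stays fixed; under the null hypothesis that case and
    control share one distribution, their labels are exchangeable.
    """
    rng = random.Random(seed)
    observed = abs(rank_gap(case, control, auxiliary))
    labelled = list(case) + list(control)
    n1 = len(case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labelled)
        if abs(rank_gap(labelled[:n1], labelled[n1:], auxiliary)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

For example, `rank_gap([1, 2], [3, 4], [])` gives -2.0. Adding a single auxiliary value between the groups, as in `rank_gap([1, 2], [3, 4], [2.5])`, widens the gap to -3.0. This is one way to see how auxiliary observations lying between the case and control values can separate their average ranks further.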
Publisher
Cold Spring Harbor Laboratory