Prioritizing hypothesis tests for high throughput data

Published: 2015-11-16 Issue: 6 Volume: 32 Pages: 850-858
ISSN: 1460-2059
Container-title: Bioinformatics
Language: en

Author:

Kim Sangjin¹, Schliekelman Paul¹

Affiliation:

1. Department of Statistics, University of Georgia, Athens, GA 30602, USA

Abstract

Motivation: The advent of high throughput data has led to a massive increase in the number of hypothesis tests conducted in many types of biological studies and a concomitant increase in the stringency of significance thresholds. Filtering methods, which use independent information to eliminate less promising tests and thus reduce multiple testing, have been widely and successfully applied. However, key questions remain about how best to apply them: When is filtering beneficial and when is it detrimental? How good does the independent information need to be for filtering to be effective? How should one choose the filter cutoff that separates tests that pass the filter from those that don’t?

Results: We quantify the effect of the quality of the filter information, the filter cutoff and other factors on the effectiveness of the filter and show a number of results. If the filter has a high probability (e.g. 70%) of ranking true positive features highly (e.g. top 10%), then filtering can lead to a dramatic increase (e.g. 10-fold) in discovery probability when there is high redundancy in information between hypothesis tests. Filtering is less effective when there is low redundancy between hypothesis tests, and its benefit decreases rapidly as the quality of the filter information decreases. Furthermore, the outcome is highly dependent on the choice of filter cutoff. Choosing the cutoff without reference to the data will often lead to a large loss in discovery probability. However, naïve optimization of the cutoff using the data will lead to inflated type I error. We introduce a data-based method for choosing the cutoff that maintains control of the family-wise error rate via a correction factor to the significance threshold. Application of this approach offers as much as a several-fold advantage in discovery probability relative to no filtering, while maintaining type I error control. We also introduce a closely related method of P-value weighting that further improves performance.

Availability and implementation: R code for calculating the correction factor is available at http://www.stat.uga.edu/people/faculty/paul-schliekelman.

Contact: pdschlie@stat.uga.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
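
The trade-off between filter quality and filter cutoff described in the abstract can be illustrated with a small numerical sketch. This is not the authors' method or their R code. It is a simplified model under stated assumptions: each test is a one-sided z-test, the family-wise error rate is controlled by a plain Bonferroni threshold, the filter statistic is independent of the test statistic, and the cutoff is fixed in advance (so no correction factor is needed). The function names and the parameters `effect_z`, `keep_fraction` and `pass_prob` are hypothetical names introduced here for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def bonferroni_power(effect_z: float, n_tests: int, alpha: float = 0.05) -> float:
    """Power of a one-sided z-test at the Bonferroni threshold alpha / n_tests.

    effect_z is the (assumed) mean of the test statistic for a true positive,
    in standard-deviation units.
    """
    critical_z = _STD_NORMAL.inv_cdf(1.0 - alpha / n_tests)
    return 1.0 - _STD_NORMAL.cdf(critical_z - effect_z)


def discovery_probability(
    effect_z: float,
    n_tests: int,
    keep_fraction: float = 1.0,
    pass_prob: float = 1.0,
    alpha: float = 0.05,
) -> float:
    """Probability that one true positive is declared significant.

    keep_fraction: fraction of tests that pass the filter (the filter cutoff).
    pass_prob: assumed probability that the true positive passes the filter.
    With keep_fraction=1 and pass_prob=1 this reduces to no filtering.
    """
    n_kept = max(1, round(keep_fraction * n_tests))
    # A true positive is discovered only if it survives the filter AND
    # clears the (less stringent) threshold applied to the kept tests.
    return pass_prob * bonferroni_power(effect_z, n_kept, alpha)


if __name__ == "__main__":
    m, effect = 1_000_000, 4.0  # hypothetical study size and effect size
    print("no filter:        ", discovery_probability(effect, m))
    print("good filter (70%):", discovery_probability(effect, m, 0.10, 0.70))
    print("poor filter (20%):", discovery_probability(effect, m, 0.10, 0.20))
```

In this toy setting, a filter that keeps the top 10% of tests and retains the true positive 70% of the time raises the discovery probability, while a filter that retains it only 20% of the time lowers it below the unfiltered value. The sketch ignores redundancy between tests and data-based cutoff selection, both of which the abstract identifies as important.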

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Link

http://academic.oup.com/bioinformatics/article-pdf/32/6/850/32742892/btv608.pdf

References: 35 articles.

1. Multiple hypotheses testing with weights;Benjamini;Scand. J. Stat.,1997

2. Independent filtering increases detection power for high-throughput experiments;Bourgon;Proc. Natl Acad. Sci.,2010

3. Reply to Talloen et al.: independent filtering is a generic approach that needs domain specific adaptation;Bourgon;Proc. Natl Acad. Sci.,2010

4. Improving strategies for detecting genetic patterns of disease susceptibility in association studies;Calle;Stat. Med.,2008

5. Two-stage testing procedures with independent filtering for genome-wide gene-environment interaction;Dai;Biometrika,2012

Cited by 8 articles.

1. Faster SARS-CoV-2 sequence validation and annotation for GenBank using VADR;NAR Genomics and Bioinformatics;2023-01-10

2. Pairwise ratio-based differential abundance analysis of infant microbiome 16S sequencing data;NAR Genomics and Bioinformatics;2023-01-10

3. Impact of adaptive filtering on power and false discovery rate in RNA-seq experiments;BMC Bioinformatics;2022-09-24

4. Statistics for Bioinformatics;Bioinformatics in Rice Research;2021

5. A Structured Approach to Evaluating Life-Course Hypotheses: Moving Beyond Analyses of Exposed Versus Unexposed in the -Omics Context;American Journal of Epidemiology;2020-10-30