Authors:
Vsevolozhskaya Olga A., Zaykin Dmitri V.
Abstract
Testing millions of SNPs in genetic association studies has become a standard routine for disease gene discovery, followed by prioritization of the strongest signals based on the set of the smallest P-values. In light of the recent re-evaluation of statistical practice, it has been suggested that P-values are unfit as summaries of statistical evidence. Despite this criticism, P-values are commonly used and are unlikely to be abandoned by practitioners. Moreover, P-values contain information that can be used to address the concerns about their flaws and misuse. We present a new method that uses the evidence summarized by P-values to estimate the odds ratio (OR) from its approximate posterior distribution. In our method, only the P-value, the sample size, and the standard deviation of log(OR) are needed as summaries of the data, together with a suitable prior distribution for log(OR) that can take any shape. Because log(OR) is the only parameter assigned a prior distribution, our model is a mix of classical and Bayesian approaches. We show that our “Mix Bayes” (MB) method retains the main advantages of the Bayesian approach: it yields direct probability statements about hypotheses for OR and is resistant to biases caused by the selection of top-scoring SNPs. MB offers greater flexibility than similarly inspired methods, both in the assumed distribution of the summary statistic and in the form of the prior for the parameter of interest. We illustrate our method by presenting interval estimates of effect size for reported genetic associations with lung cancer. Although we focus on OR, our method is not limited to this particular measure of effect size and can be used broadly to assess the reliability of findings in studies testing multiple predictors.
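The abstract states that the method needs only a P-value, a sample size, a standard deviation for log(OR), and a prior of arbitrary shape. The sketch below illustrates that idea under assumptions that are ours, not necessarily the authors': the P-value is taken as one-sided, the test statistic is approximated as Z ~ N(√n·log(OR)/σ, 1), and the posterior is evaluated on a simple grid. The function names `mix_bayes_posterior` and `summarize`, and all numeric inputs, are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def mix_bayes_posterior(p_value, n, sigma, prior_pdf, lo=-3.0, hi=3.0, steps=6001):
    """Grid approximation of the posterior of beta = log(OR).

    Illustrative assumptions (not necessarily the published MB formulation):
    - p_value is one-sided, so the observed statistic is z = Phi^{-1}(1 - p)
    - Z ~ Normal(sqrt(n) * beta / sigma, 1)
    """
    z = STD_NORMAL.inv_cdf(1.0 - p_value)  # recover the test statistic from P
    scale = math.sqrt(n) / sigma           # noncentrality per unit of beta
    step = (hi - lo) / (steps - 1)
    grid = [lo + i * step for i in range(steps)]
    # Unnormalized posterior: prior(beta) times the likelihood of z given beta
    weights = [prior_pdf(b) * STD_NORMAL.pdf(z - scale * b) for b in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]


def summarize(grid, weights, level=0.95):
    """Posterior mean of log(OR) and an equal-tailed credible interval for OR."""
    mean = sum(b * w for b, w in zip(grid, weights))
    alpha = (1.0 - level) / 2.0
    cum = 0.0
    lower = upper = None
    for b, w in zip(grid, weights):
        cum += w
        if lower is None and cum >= alpha:
            lower = b
        if upper is None and cum >= 1.0 - alpha:
            upper = b
    return mean, math.exp(lower), math.exp(upper)


if __name__ == "__main__":
    # Hypothetical inputs: a top SNP with one-sided P = 1e-6, n = 5000, sigma = 2
    prior = NormalDist(0.0, 0.2).pdf  # any callable density can be used here
    grid, w = mix_bayes_posterior(p_value=1e-6, n=5000, sigma=2.0, prior_pdf=prior)
    mean, or_lo, or_hi = summarize(grid, w)
    # A direct probability statement about a hypothesis: P(OR > 1 | P-value)
    prob_or_gt_1 = sum(wi for b, wi in zip(grid, w) if b > 0)
    print(f"posterior OR ~ {math.exp(mean):.3f}, 95% CI ({or_lo:.3f}, {or_hi:.3f})")
    print(f"P(OR > 1) = {prob_or_gt_1:.4f}")
```

Because `prior_pdf` is any callable, a mixture or heavy-tailed prior can be substituted without changing the rest of the code. With the zero-centred normal prior used here, the posterior mean of log(OR) is pulled toward zero relative to the naive estimate z·σ/√n, which is the kind of shrinkage that counteracts the inflation of effect sizes among top-scoring SNPs.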
Publisher
Cold Spring Harbor Laboratory