Author:
Schweiger Regev,Weissbrod Omer,Rahmani Elior,Müller-Nurasyid Martina,Kunze Sonja,Gieger Christian,Waldenberger Melanie,Rosset Saharon,Halperin Eran
Abstract
AbstractTesting for the existence of variance components in linear mixed models is a fundamental task in many applicative fields. In statistical genetics, the score test has recently become instrumental in the task of testing an association between a set of genetic markers and a phenotype. With few markers, this amounts to set-based variance component tests, which attempt to increase power in association studies by aggregating weak individual effects. When the entire genome is considered, it allows testing for the heritability of a phenotype, defined as the proportion of phenotypic variance explained by genetics. In the popular score-based Sequence Kernel Association Test (SKAT) method, the assumed distribution of the score test statistic is uncalibrated in small samples, with a correction being computationally expensive. This may cause severe inflation or deflation of p-values, even when the null hypothesis is true. Here, we characterize the conditions under which this discrepancy holds, and show it may occur also in large real datasets, such as a dataset from the Wellcome Trust Case Control Consortium 2 (n=13,950) study, and in particular when the individuals in the sample are unrelated. In these cases the SKAT approximation tends to be highly over-conservative and therefore underpowered. To address this limitation, we suggest an efficient method to calculate exact p-values for the score test in the case of a single variance component and a continuous response vector, which can speed up the analysis by orders of magnitude. Our results enable fast and accurate application of the score test in heritability and in set-based association tests. Our method is available in http://github.com/cozygene/RL-SKAT.
Publisher
Cold Spring Harbor Laboratory