Authors:
Liu Zilu, Turkmen Asuman, Lin Shili
Abstract
In genetic association studies of common diseases, population stratification is a major source of confounding. Principal component regression (PCR) and linear mixed models (LMMs) are two commonly used approaches to account for population stratification. Previous studies have shown that an LMM can be interpreted as including all principal components (PCs) as random-effect covariates. However, including all PCs in an LMM may inflate the type I error rate in some scenarios because of redundancy, while including only a few pre-selected PCs in PCR may fail to capture the full genetic diversity. Here, we propose Bayestrat, a statistical method under the Bayesian framework that uses appropriate shrinkage priors to shrink the effects of non-confounded or minimally confounded PCs and to improve the identification of highly confounded ones. Simulation results show that Bayestrat consistently achieves lower type I error rates while also achieving higher power, especially when the number of PCs included in the model is large. We also apply our method to two real datasets, the Dallas Heart Study (DHS) and the Multi-Ethnic Study of Atherosclerosis (MESA), and demonstrate the superiority of Bayestrat over commonly used methods.
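The abstract does not state which shrinkage prior Bayestrat uses. As a minimal, hypothetical illustration of how a shrinkage prior can treat PCs differently, the sketch below places a spike-and-slab prior on each PC effect. It assumes orthonormal PC columns (so each least-squares estimate z satisfies z ~ N(beta, sigma2)), a known noise variance `sigma2`, and made-up hyperparameters `tau2` and `prior_incl`. The function name `spike_slab_shrink` is invented for this example, and the code is not Bayestrat's model.

```python
import math


def spike_slab_shrink(z, sigma2=1.0, tau2=4.0, prior_incl=0.2):
    """Posterior shrinkage of one PC effect under a spike-and-slab prior.

    Hypothetical setup (not Bayestrat's published model):
      z      least-squares estimate of the PC effect, z ~ N(beta, sigma2),
             which holds when the PC column is orthonormal and sigma2 is known
      prior  beta ~ prior_incl * N(0, tau2) + (1 - prior_incl) * point mass at 0
    Returns (posterior probability that beta != 0, posterior mean of beta).
    """
    # Log marginal density of z under the slab (beta != 0): z ~ N(0, sigma2 + tau2)
    log_slab = (math.log(prior_incl)
                - 0.5 * math.log(2 * math.pi * (sigma2 + tau2))
                - 0.5 * z * z / (sigma2 + tau2))
    # Log marginal density of z under the spike (beta == 0): z ~ N(0, sigma2)
    log_spike = (math.log(1 - prior_incl)
                 - 0.5 * math.log(2 * math.pi * sigma2)
                 - 0.5 * z * z / sigma2)
    # Posterior inclusion probability, computed on the log scale to avoid underflow
    incl = 1.0 / (1.0 + math.exp(log_spike - log_slab))
    # Given inclusion, beta | z is normal with mean z * tau2 / (sigma2 + tau2)
    post_mean = incl * z * tau2 / (sigma2 + tau2)
    return incl, post_mean


# Made-up least-squares estimates for five PCs
estimates = [0.3, -0.8, 1.2, 4.5, -6.0]
for k, z in enumerate(estimates, start=1):
    incl, post_mean = spike_slab_shrink(z)
    print(f"PC{k}: z = {z:+.1f}, P(beta != 0) = {incl:.3f}, shrunk = {post_mean:+.3f}")
```

With these settings, small estimates such as PC1's are pulled almost to zero, while large ones such as PC4's and PC5's keep most of their size, scaled by tau2 / (sigma2 + tau2) = 0.8. A plain Gaussian prior would instead shrink every PC by that same factor, which is why mixture or heavy-tailed priors are commonly used when only a few covariates are expected to matter.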
Publisher
Cold Spring Harbor Laboratory