Bayesian LASSO for population stratification correction in rare haplotype association studies
Author:
Liu Zilu (1), Turkmen Asuman Seda (1), Lin Shili (1)
Affiliation:
1. Department of Statistics, The Ohio State University, Columbus, OH 43210, USA
Abstract
Population stratification (PS) is a major source of confounding in both single nucleotide polymorphism (SNP) and haplotype association studies. To address PS, principal component regression (PCR) and the linear mixed model (LMM) are the current standards for SNP association studies, and both are commonly borrowed for haplotype studies. However, the underfitting and overfitting problems introduced by PCR and LMM, respectively, have yet to be addressed. Furthermore, only a few theoretical approaches have been proposed to address PS specifically for haplotypes. In this paper, we propose a new method under the Bayesian LASSO framework, QBLstrat, that accounts for PS when identifying rare and common haplotypes associated with a continuous trait of interest. QBLstrat uses a large number of principal components (PCs) with appropriate priors to sufficiently correct for PS while shrinking the estimates of unassociated haplotypes and PCs. We compare the performance of QBLstrat with the Bayesian counterparts of PCR and LMM and with a current method, haplo.stats. Extensive simulation studies and real data analyses show that QBLstrat is superior in controlling false positives while maintaining competitive power for identifying true positives under PS.
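The shrinkage idea described in the abstract can be illustrated with a minimal sketch. This is not QBLstrat and not the authors' implementation: it computes only a point estimate (the posterior mode under independent Laplace priors, which coincides with the LASSO solution) by coordinate descent, whereas a full Bayesian LASSO analysis would typically sample from the posterior. The function names `soft_threshold`, `lasso_map`, and `toy_example`, the simulated data, and all penalty values are assumptions made for illustration only.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_map(X, y, lam, n_iter=200):
    """Posterior-mode (MAP) estimate under independent Laplace priors.

    Minimizes 0.5 * ||y - X b||^2 + sum_j lam[j] * |b[j]| by cyclic
    coordinate descent. `lam` may be a scalar or one penalty per column,
    so haplotype and PC coefficients can be shrunk by different amounts.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    lam = np.broadcast_to(np.asarray(lam, dtype=float), (p,))
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)      # squared norm of each column
    resid = y.copy()                   # residual y - X b, with b = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue               # skip all-zero columns
            resid += X[:, j] * b[j]    # remove column j's contribution
            rho = X[:, j] @ resid
            b[j] = soft_threshold(rho, lam[j]) / col_ss[j]
            resid -= X[:, j] * b[j]    # add the updated contribution back
    return b


def toy_example(seed=0, n=400, n_snps=50, n_pcs=10):
    """Hypothetical simulation: two subpopulations differ in allele
    frequencies and in mean trait value; the haplotype has no true effect."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=n)                    # hidden subpopulation label
    freq = np.where(pop[:, None] == 0, 0.2, 0.6)        # allele frequency per subpopulation
    G = rng.binomial(2, freq, size=(n, n_snps)).astype(float)
    hap = rng.binomial(2, np.where(pop == 0, 0.05, 0.15)).astype(float)
    y = 1.0 * pop + rng.normal(size=n)                  # trait depends on ancestry only

    # Principal components of the centered genotype matrix.
    Gc = G - G.mean(axis=0)
    U, s, _ = np.linalg.svd(Gc, full_matrices=False)
    pcs = U[:, :n_pcs] * s[:n_pcs]

    # Design matrix: haplotype copy count followed by the PCs.
    X = np.column_stack([hap - hap.mean(), pcs])
    lam = np.r_[5.0, np.full(n_pcs, 1.0)]               # illustrative penalties
    return lasso_map(X, y - y.mean(), lam)
```

Calling `toy_example()` returns one coefficient for the haplotype followed by one per PC. Allowing a separate penalty per column is the design choice that mirrors the abstract's "appropriate priors": many PCs can enter the model, and those unrelated to the trait are shrunk toward zero rather than being excluded in advance.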
Funder
National Institutes of Health; National Center for Advancing Translational Sciences; National Heart, Lung, and Blood Institute
Publisher
Walter de Gruyter GmbH
Subject
Computational Mathematics, Genetics, Molecular Biology, Statistics and Probability