Sparse latent factor regression models for genome-wide and epigenome-wide association studies
Author:
Jumentier Basile (1, 2), Caye Kevin (1), Heude Barbara (3), Lepeule Johanna (2), François Olivier (1, 4)
Affiliation:
1. Centre National de la Recherche Scientifique, Grenoble INP, TIMC-IMAG CNRS UMR 5525, Université Grenoble-Alpes, Grenoble, 38000, France
2. Centre National de la Recherche Scientifique, Institut National de la Santé et de la Recherche Médicale, Institute for Advanced Biosciences, INSERM U 1209, CNRS UMR 5309, Université Grenoble-Alpes, Grenoble, 38000, France
3. Institut National de la Santé et de la Recherche Médicale, Centre of Research in Epidemiology and Statistics, INSERM UMR 1153, Université de Paris, F75004 Paris, France
4. Inria Grenoble, Equipe Statify, Laboratoire Jean Kuntzmann, Rhône-Alpes Inovallée 655 Avenue de l’Europe - CS 90051, Montbonnot, 38334, France
Abstract
Association of phenotypes or exposures with genomic and epigenomic data faces important statistical challenges. One of these challenges is to account for variation due to unobserved confounding factors, such as individual ancestry or cell-type composition in tissues. This issue can be addressed with penalized latent factor regression models, where penalties are introduced to cope with the high dimensionality of the data. If a relatively small proportion of genomic or epigenomic markers correlate with the variable of interest, sparsity penalties may help to capture the relevant associations, but the improvement over non-sparse approaches has not yet been fully evaluated. Here, we present least-squares algorithms that jointly estimate effect sizes and confounding factors in sparse latent factor regression models. In simulated data, sparse latent factor regression models generally achieved higher statistical performance than other sparse methods, including the least absolute shrinkage and selection operator and a Bayesian sparse linear mixed model. In generative model simulations, their statistical performance was slightly lower than, though comparable to, that of non-sparse methods, but in simulations based on empirical data, sparse latent factor regression models were more robust to departures from the model than the non-sparse approaches. We applied sparse latent factor regression models to a genome-wide association study of a flowering trait for the plant Arabidopsis thaliana and to an epigenome-wide association study of smoking status in pregnant women. For both applications, sparse latent factor regression models facilitated the estimation of non-null effect sizes while overcoming multiple testing issues. The results were not only consistent with previous discoveries, but also pinpointed new genes with functional annotations relevant to each application.
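As an illustration of what "jointly estimating effect sizes and confounding factors" can look like, here is a minimal sketch in Python. It is not the authors' algorithm. It assumes one common formulation, in which a response matrix Y (samples by markers) is modelled as x bᵀ + W + noise, with an L1 penalty on the effect sizes b and a nuclear-norm penalty on the latent matrix W, and it restricts itself to a single variable of interest x scaled to unit norm so that both block updates have closed forms. All names (`sparse_lfr`, `mu`, `gamma`) and the simulated data are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (entrywise shrinkage toward zero)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def svd_threshold(M, t):
    """Proximal operator of t * nuclear norm (shrinks singular values)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt

def objective(Y, x, b, W, mu, gamma):
    """0.5 * ||Y - x b^T - W||_F^2 + mu * ||b||_1 + gamma * ||W||_*"""
    resid = Y - np.outer(x, b) - W
    nuclear = np.linalg.svd(W, compute_uv=False).sum()
    return 0.5 * np.sum(resid ** 2) + mu * np.abs(b).sum() + gamma * nuclear

def sparse_lfr(Y, x, mu, gamma, n_iter=50):
    """Alternate exact minimisation over W (latent part) and b (effects).

    Assumption: a single covariate x, rescaled to unit norm, which makes
    the b update an exact soft-thresholding step. The returned effect
    sizes are therefore expressed on the unit-norm scale of x.
    """
    x = x / np.linalg.norm(x)
    b = np.zeros(Y.shape[1])
    W = np.zeros_like(Y)
    history = []
    for _ in range(n_iter):
        W = svd_threshold(Y - np.outer(x, b), gamma)   # update confounders
        b = soft_threshold((Y - W).T @ x, mu)          # update effect sizes
        history.append(objective(Y, x, b, W, mu, gamma))
    return b, W, history

# Simulated data (hypothetical): 2 latent factors, 5 non-null markers,
# and a variable of interest correlated with the first latent factor.
rng = np.random.default_rng(0)
n, p, k = 100, 200, 2
U = rng.normal(size=(n, k))
V = rng.normal(size=(p, k))
x = 0.5 * U[:, 0] + rng.normal(size=n)
b_true = np.zeros(p)
b_true[:5] = 10.0  # effects on the unit-norm scale of x
Y = np.outer(x / np.linalg.norm(x), b_true) + U @ V.T + rng.normal(size=(n, p))

b_hat, W_hat, history = sparse_lfr(Y, x, mu=3.0, gamma=30.0)
print("non-zero effect estimates:", np.flatnonzero(b_hat))
```

Because the penalised least-squares objective in this sketch is convex and each block update is an exact minimiser, the recorded objective values should not increase from one sweep to the next. The penalty weights `mu` and `gamma` are fixed by hand here; in practice they would need to be tuned, for example by cross-validation.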
Funder
Agence Nationale de la Recherche
Publisher
Walter de Gruyter GmbH
Subject
Computational Mathematics, Genetics, Molecular Biology, Statistics and Probability
Cited by
3 articles.