fastSTRUCTURE: Variational Inference of Population Structure in Large SNP Data Sets

Authors:

Anil Raj1, Matthew Stephens2, Jonathan K. Pritchard1,3

Affiliations:

1. Department of Genetics, Stanford University, Stanford, California 94305

2. Departments of Statistics and Human Genetics, University of Chicago, Chicago, Illinois 60637

3. Department of Biology, Howard Hughes Medical Institute, Stanford University, Stanford, California 94305

Abstract

Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework. Variational methods pose the problem of computing relevant posterior distributions as an optimization problem, allowing us to build on recent advances in optimization theory to develop fast inference tools. In addition, we propose useful heuristic scores to identify the number of populations represented in a data set and a new hierarchical prior to detect weak population structure in the data. We test the variational algorithms on simulated data and illustrate using genotype data from the CEPH–Human Genome Diversity Panel. The variational algorithms are almost two orders of magnitude faster than STRUCTURE and achieve accuracies comparable to those of ADMIXTURE. Furthermore, our results show that the heuristic scores for choosing model complexity provide a reasonable range of values for the number of populations represented in the data, with minimal bias toward detecting structure when it is very weak. Our algorithm, fastSTRUCTURE, is freely available online at http://pritchardlab.stanford.edu/structure.html.
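To make "posterior inference as optimization" concrete, here is a minimal sketch of mean-field coordinate-ascent updates for a biallelic admixture model. It assumes Dirichlet priors on each individual's admixture proportions and Beta priors on allele frequencies. This is not the fastSTRUCTURE code: it does not compute the lower bound, handle missing genotypes, use the hierarchical prior, or accelerate convergence. The function names (`fit_admixture_vb`, `digamma`, `softmax`), the uniform prior values, the fixed iteration count, and the deterministic initialization are all hypothetical choices made for this illustration.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def softmax(logits):
    m = max(logits)
    w = [math.exp(v - m) for v in logits]
    s = sum(w)
    return [v / s for v in w]


def fit_admixture_vb(genotypes, K, n_iter=100, alpha=1.0, beta=1.0, gamma=1.0):
    """Hypothetical mean-field VB sketch for a biallelic admixture model.

    genotypes: N rows of L values in {0, 1, 2} (copies of allele "1"),
    no missing data. Returns posterior-mean admixture proportions (N x K).
    """
    N, L = len(genotypes), len(genotypes[0])
    # q(Q_n) = Dirichlet(q_par[n]), initialised at the prior.
    q_par = [[alpha] * K for _ in range(N)]
    # q(P_lk) = Beta(b[l][k], c[l][k]); component k starts with mean (k+1)/(K+1)
    # so the components differ (deterministic, for reproducibility).
    b = [[1.0 + k for k in range(K)] for _ in range(L)]
    c = [[float(K - k) for k in range(K)] for _ in range(L)]
    for _ in range(n_iter):
        # Expected log admixture proportions under each Dirichlet factor.
        elog_q = []
        for row in q_par:
            total = digamma(sum(row))
            elog_q.append([digamma(a) - total for a in row])
        # Expected log allele frequencies under each Beta factor.
        elog_p = [[digamma(b[l][k]) - digamma(b[l][k] + c[l][k]) for k in range(K)] for l in range(L)]
        elog_1mp = [[digamma(c[l][k]) - digamma(b[l][k] + c[l][k]) for k in range(K)] for l in range(L)]
        # Start the new variational parameters at the prior, then add expected counts.
        new_q = [[alpha] * K for _ in range(N)]
        new_b = [[beta] * K for _ in range(L)]
        new_c = [[gamma] * K for _ in range(L)]
        for n in range(N):
            for l in range(L):
                g = genotypes[n][l]
                # Ancestry responsibilities for an allele-1 copy and an allele-0 copy.
                r1 = softmax([elog_q[n][k] + elog_p[l][k] for k in range(K)])
                r0 = softmax([elog_q[n][k] + elog_1mp[l][k] for k in range(K)])
                for k in range(K):
                    new_q[n][k] += g * r1[k] + (2 - g) * r0[k]
                    new_b[l][k] += g * r1[k]
                    new_c[l][k] += (2 - g) * r0[k]
        q_par, b, c = new_q, new_b, new_c
    return [[a / sum(row) for a in row] for row in q_par]


# Toy usage: two completely differentiated groups of three individuals.
pop_a = [[2] * 10 for _ in range(3)]  # fixed for allele "1" at every locus
pop_b = [[0] * 10 for _ in range(3)]  # fixed for allele "0" at every locus
Q = fit_admixture_vb(pop_a + pop_b, K=2, n_iter=50)
```

Each pass performs one round of coordinate ascent. The ancestry responsibilities are updated from the current q(Q) and q(P). Then q(Q) and q(P) are each refreshed from those responsibilities. Because the Q and P updates depend only on the responsibilities and not on each other, refreshing both in the same pass is still exact coordinate ascent. On this toy input the individuals in `pop_a` should place most of their admixture weight on component 1 (initialized as the high-frequency component), and those in `pop_b` should place most of theirs on component 0. A real analysis would typically compare several initializations and monitor the lower bound instead of running a fixed number of iterations.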

Publisher

Oxford University Press (OUP)

Subject

Genetics
