Random-Effects Substitution Models for Phylogenetics via Scalable Gradient Approximations

Authors:

Andrew F. Magee (1), Andrew J. Holbrook (1), Jonathan E. Pekar (2, 3), Itzue W. Caviedes-Solis (4), Fredrick A. Matsen IV (5, 6, 7, 8), Guy Baele (9), Joel O. Wertheim (10), Xiang Ji (11), Philippe Lemey (9), Marc A. Suchard (1, 12, 13)

Affiliations:

1. Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, Los Angeles, Los Angeles, CA, USA

2. Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA

3. Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, USA

4. Department of Biology, Swarthmore College, Swarthmore, PA, USA

5. Howard Hughes Medical Institute, Seattle, Washington, USA

6. Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA

7. Department of Genome Sciences, University of Washington, Seattle, Washington, USA

8. Department of Statistics, University of Washington, Seattle, Washington, USA

9. Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium

10. Department of Medicine, University of California, San Diego, La Jolla, CA, USA

11. Department of Mathematics, Tulane University, New Orleans, LA, USA

12. Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, Los Angeles, CA, USA

13. Department of Human Genetics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, Los Angeles, CA, USA

Abstract

Phylogenetic and discrete-trait evolutionary inference depend heavily on an appropriate characterization of the underlying character substitution process. In this paper, we present random-effects substitution models that extend common continuous-time Markov chain models into a richer class of processes capable of capturing a wider variety of substitution dynamics. As these random-effects substitution models often require many more parameters than their usual counterparts, inference can be both statistically and computationally challenging. Thus, we also propose an efficient approach to compute an approximation to the gradient of the data likelihood with respect to all unknown substitution model parameters. We demonstrate that this approximate gradient enables scaling of sampling-based inference, namely Bayesian inference via Hamiltonian Monte Carlo, under random-effects substitution models across large trees and state spaces. Applied to a dataset of 583 SARS-CoV-2 sequences, an HKY model with random effects shows strong signals of nonreversibility in the substitution process, and posterior predictive model checks clearly show that it is a more adequate model than a reversible model. When analyzing the pattern of phylogeographic spread of 1441 influenza A virus (H3N2) sequences between 14 regions, a random-effects phylogeographic substitution model infers that air travel volume adequately predicts almost all dispersal rates. A random-effects state-dependent substitution model reveals no evidence for an effect of arboreality on the swimming mode in the tree frog subfamily Hylinae. Simulations reveal that random-effects substitution models can accommodate both negligible and radical departures from the underlying base substitution model. We show that our gradient-based inference approach is over an order of magnitude more time-efficient than conventional approaches.
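
To make the idea of a random-effects substitution model concrete, here is a minimal sketch that builds a 4-state HKY rate matrix and perturbs each off-diagonal rate with a random effect. It assumes, for illustration only, that each random effect enters multiplicatively, Q_ij = Q^HKY_ij * exp(epsilon_ij), and it leaves the HKY rates unnormalized; the helper names (`hky_base_rates`, `random_effects_q`, `is_reversible`) are hypothetical and are not the authors' implementation. Reversibility is tested with Kolmogorov's cycle criterion: when all off-diagonal rates are positive on a fully connected state space, the forward/backward ratio of any longer cycle is a product of triangle ratios, so checking every 3-cycle is enough.

```python
import math
from itertools import combinations

NUCLEOTIDES = "ACGT"
# Ordered pairs that are transitions (purine<->purine, pyrimidine<->pyrimidine).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky_base_rates(kappa, pi):
    """Unnormalized off-diagonal HKY rates: kappa * pi_j for transitions, pi_j otherwise."""
    rates = {}
    for i, a in enumerate(NUCLEOTIDES):
        for j, b in enumerate(NUCLEOTIDES):
            if i != j:
                scale = kappa if (a, b) in TRANSITIONS else 1.0
                rates[(i, j)] = scale * pi[j]
    return rates


def random_effects_q(kappa, pi, epsilon):
    """Hypothetical random-effects HKY generator.

    ASSUMPTION (illustration only): Q_ij = Q_base_ij * exp(epsilon_ij) for i != j.
    `epsilon` maps (i, j) index pairs to reals; missing pairs default to 0,
    i.e. no departure from the base model. The diagonal makes each row sum to 0.
    """
    n = len(NUCLEOTIDES)
    q = [[0.0] * n for _ in range(n)]
    for (i, j), rate in hky_base_rates(kappa, pi).items():
        q[i][j] = rate * math.exp(epsilon.get((i, j), 0.0))
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


def is_reversible(q, rel_tol=1e-9):
    """Kolmogorov's criterion restricted to 3-cycles (valid when all rates are positive)."""
    for i, j, k in combinations(range(len(q)), 3):
        forward = q[i][j] * q[j][k] * q[k][i]
        backward = q[i][k] * q[k][j] * q[j][i]
        if not math.isclose(forward, backward, rel_tol=rel_tol):
            return False
    return True


if __name__ == "__main__":
    pi = [0.3, 0.2, 0.2, 0.3]  # hypothetical base frequencies for A, C, G, T
    q_base = random_effects_q(kappa=2.0, pi=pi, epsilon={})
    q_re = random_effects_q(kappa=2.0, pi=pi, epsilon={(0, 1): 0.5})  # perturb A->C only
    print(is_reversible(q_base))  # True: plain HKY is reversible
    print(is_reversible(q_re))    # False: one asymmetric random effect breaks cycle balance
```

Perturbing A->C without a matching C->A effect creates a net circulation around the A, C, G cycle, a toy analogue of the nonreversibility the abstract reports for SARS-CoV-2 (not the fitted model itself). If every off-diagonal rate of a K-state model received its own random effect, there would be K(K - 1) of them: 12 for nucleotides and 182 for 14 regions, which illustrates why gradient-based samplers such as Hamiltonian Monte Carlo become attractive as the state space grows.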

Funders

Howard Hughes Medical Institute

European Research Council

KU Leuven

Publisher

Oxford University Press (OUP)
