Regularized sequence-context mutational trees capture variation in mutation rates across the human genome

Published:2023-07-07 Issue:7 Volume:19 Page:e1010807
ISSN:1553-7404
Container-title:PLOS Genetics
language:en
Short-container-title:PLoS Genet

Author:

Adams Christopher J., Conery Mitchell, Auerbach Benjamin J., Jensen Shane T., Mathieson Iain, Voight Benjamin F.

Abstract

Germline mutation is the mechanism by which genetic variation in a population is created. Inferences derived from mutation rate models are fundamental to many population genetics methods. Previous models have demonstrated that nucleotides flanking polymorphic sites–the local sequence context–explain variation in the probability that a site is polymorphic. However, limitations to these models exist as the size of the local sequence context window expands. These include a lack of robustness to data sparsity at typical sample sizes, lack of regularization to generate parsimonious models and lack of quantified uncertainty in estimated rates to facilitate comparison between models. To address these limitations, we developed Baymer, a regularized Bayesian hierarchical tree model that captures the heterogeneous effect of sequence contexts on polymorphism probabilities. Baymer implements an adaptive Metropolis-within-Gibbs Markov Chain Monte Carlo sampling scheme to estimate the posterior distributions of sequence-context based probabilities that a site is polymorphic. We show that Baymer accurately infers polymorphism probabilities and well-calibrated posterior distributions, robustly handles data sparsity, appropriately regularizes to return parsimonious models, and scales computationally at least up to 9-mer context windows. We demonstrate application of Baymer in three ways–first, identifying differences in polymorphism probabilities between continental populations in the 1000 Genomes Phase 3 dataset, second, in a sparse data setting to examine the use of polymorphism models as a proxy for de novo mutation probabilities as a function of variant age, sequence context window size, and demographic history, and third, comparing model concordance between different great ape species. We find a shared context-dependent mutation rate architecture underlying our models, enabling a transfer-learning inspired strategy for modeling germline mutations. 
In summary, Baymer is an accurate polymorphism probability estimation algorithm that automatically adapts to data sparsity at different sequence context levels, thereby making efficient use of the available data.

Funder

National Institute of Diabetes and Digestive and Kidney Diseases

Publisher

Public Library of Science (PLoS)

Subject

Cancer Research; Genetics (clinical); Genetics; Molecular Biology; Ecology, Evolution, Behavior and Systematics

Reference: 46 articles.

1. Estimating population divergence time and phylogeny from single-nucleotide polymorphisms data with outgroup ascertainment bias;Y Wang;Mol Ecol,2012

2. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data;RN Gutenkunst;PLoS Genet,2009

3. Widespread genomic signatures of natural selection in hominid evolution;G McVicker;PLoS Genet,2009

4. Analysis of protein-coding genetic variation in 60,706 humans;M Lek;Nature,2016

5. A map of constrained coding regions in the human genome;JM Havrilla;Nat Genet,2019

Cited by 4 articles.

1. Tumor-associated macrophage-derived exosomal miR21-5p promotes tumor angiogenesis by regulating YAP1/HIF-1α axis in head and neck squamous cell carcinoma;Cellular and Molecular Life Sciences;2024-04-11

2. Epigenomic insights into common human disease pathology;Cellular and Molecular Life Sciences;2024-04-11

3. Accurate inference of population history in the presence of background selection;2024-01-20

4. Evolution of the Mutation Spectrum Across a Mammalian Phylogeny;Molecular Biology and Evolution;2023-09-28