An improved codon modeling approach for accurate estimation of the mutation bias

Published: 2021-07-01

Author:

Latrille T., Lartillot N.

Abstract

Nucleotide composition in protein-coding sequences results from the equilibrium between mutation and selection. In particular, nucleotide composition differs between the three coding positions, with the third position showing a more extreme composition than the first and second positions. Yet phylogenetic codon models do not correctly capture this phenomenon and instead predict that nucleotide composition should be the same at all three codon positions. Alternatively, some models allow for different nucleotide rates at the three positions, a problematic approach since the mutation process should in principle be blind to the coding structure and homogeneous across coding positions. Practically, this misconception could have important consequences for modeling the impact of GC-biased gene conversion (gBGC) on the evolution of protein-coding sequences, a factor that requires mutation and fixation biases to be carefully disentangled. Conceptually, the problem arises because phylogenetic codon models cannot correctly capture the fixation bias acting against the mutational pressure at the mutation-selection equilibrium. To address this problem, we present an improved codon modeling approach in which the fixation rate is no longer treated as a scalar but as a tensor unfolding along multiple directions, which gives an accurate representation of how mutation and selection oppose each other at equilibrium. As a result, this modeling approach yields a reliable estimate of the mutational process while disentangling fixation probabilities in different directions.
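The opposition between mutation and selection at equilibrium can be illustrated with a deliberately simplified sketch. This is not the authors' tensor-based codon model: it assumes a single site with only two states (AT and GC), a hypothetical mutation bias `mutation_bias` (the ratio of GC-to-AT over AT-to-GC mutation rates), and a hypothetical scaled selection coefficient `s` favoring GC, combined through the standard relative fixation probability s / (1 - exp(-s)). The function names are illustrative only.

```python
import math


def relative_fixation(s):
    """Fixation probability relative to a neutral mutation,
    for a scaled selection coefficient s (assumed form: s / (1 - exp(-s)))."""
    if abs(s) < 1e-9:
        return 1.0  # neutral limit
    return s / (1.0 - math.exp(-s))


def equilibrium_gc(mutation_bias, s):
    """Equilibrium GC frequency at a two-state (AT/GC) site.

    mutation_bias: GC->AT mutation rate divided by AT->GC mutation rate
                   (hypothetical parameter for this sketch).
    s: scaled selection coefficient in favor of GC (hypothetical).
    """
    # Substitution rate = mutation rate x relative fixation probability.
    rate_at_to_gc = 1.0 * relative_fixation(s)
    rate_gc_to_at = mutation_bias * relative_fixation(-s)
    # Stationary frequency of a two-state Markov chain.
    return rate_at_to_gc / (rate_at_to_gc + rate_gc_to_at)


# Composition expected from mutation alone versus with selection for GC.
mutational_gc = equilibrium_gc(mutation_bias=2.0, s=0.0)
observed_gc = equilibrium_gc(mutation_bias=2.0, s=1.0)
print(f"mutation only: {mutational_gc:.3f}, with selection: {observed_gc:.3f}")
```

Under these assumptions, the composition observed at a selected site departs from the composition the mutation process alone would produce, so reading the mutation bias directly off the observed composition would be misleading. This is the kind of confounding the abstract describes.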

Publisher

Cold Spring Harbor Laboratory
