Scalable empirical mixture models that account for across-site compositional heterogeneity

Published: 2019-10-07

Author:

Schrempf Dominik, Lartillot Nicolas, Szöllősi Gergely

Abstract

Biochemical demands constrain the range of amino acids acceptable at specific sites, resulting in across-site compositional heterogeneity of the amino acid replacement process. Phylogenetic models that disregard this heterogeneity are prone to systematic errors, which can lead to severe long-branch attraction artifacts. State-of-the-art models accounting for across-site compositional heterogeneity include the CAT model, which is computationally expensive, and empirical distribution mixture models estimated via maximum likelihood (the C10 to C60 models). Here, we present EDCluster, a new, scalable method for finding empirical distribution mixture models by means of a simple cluster analysis. The cluster analysis uses specific coordinate transformations that allow the detection of specialized amino acid distributions, either from curated databases or from the alignment at hand. We apply EDCluster to the HOGENOM and HSSP databases to provide universal distribution mixture (UDM) models comprising up to 4096 components. Detailed analyses of the UDM models demonstrate the removal of various long-branch attraction artifacts and improved performance compared to the C10 to C60 models. Ready-to-use implementations of the UDM models are provided for three established software packages (IQ-TREE, PhyloBayes, and RevBayes).
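The general idea of deriving mixture components by clustering site-wise amino acid distributions can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the centered log-ratio (CLR) transform stands in for the unspecified "coordinate transformations", a plain k-means with farthest-first initialization stands in for the unspecified cluster analysis, and the function names (`clr`, `kmeans_labels`, `mixture_from_sites`) and the four-state toy data are hypothetical. It is not the EDCluster implementation.

```python
import math


def clr(freqs, pseudocount=1e-6):
    """Centered log-ratio transform of one composition (assumed transform)."""
    smoothed = [f + pseudocount for f in freqs]  # avoid log(0)
    total = sum(smoothed)
    logs = [math.log(f / total) for f in smoothed]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(points, k):
    """Deterministic initialization: start at points[0], then repeatedly
    add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


def kmeans_labels(points, k, iterations=50):
    """Plain Lloyd-style k-means; returns the labels of the last assignment."""
    centers = farthest_first_centers(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mixture_from_sites(site_freqs, k):
    """Cluster in CLR space, then report (weight, mean distribution) per
    non-empty cluster, with the mean taken over the raw frequencies."""
    labels = kmeans_labels([clr(f) for f in site_freqs], k)
    components = []
    for c in range(k):
        members = [f for f, lab in zip(site_freqs, labels) if lab == c]
        if not members:
            continue
        weight = len(members) / len(site_freqs)
        mean = [sum(col) / len(members) for col in zip(*members)]
        norm = sum(mean)
        components.append((weight, [m / norm for m in mean]))
    return components


# Toy data (hypothetical): four states instead of twenty amino acids.
sites = [
    [0.90, 0.05, 0.03, 0.02],
    [0.85, 0.08, 0.04, 0.03],
    [0.88, 0.06, 0.03, 0.03],
    [0.02, 0.03, 0.05, 0.90],
    [0.03, 0.04, 0.08, 0.85],
    [0.03, 0.03, 0.06, 0.88],
]
components = mixture_from_sites(sites, k=2)
for weight, dist in components:
    print(round(weight, 3), [round(x, 3) for x in dist])
```

Clustering in a log-ratio space rather than on raw frequencies is one common way to keep rare states from being swamped by dominant ones. Whether this matches the transformations used by EDCluster cannot be determined from the abstract alone.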

Publisher

Cold Spring Harbor Laboratory

Cited by 3 articles.

1. Identifying the best approximating model in Bayesian phylogenetics: Bayes factors, cross-validation or wAIC?; 2022-04-22

2. Evidence for sponges as sister to all other animals from partitioned phylogenomics with mixture models and recoding; Nature Communications; 2021-03-19

3. Phylogenomic Insights into the Origin of Primary Plastids; 2020-08-04