Overtraining often results in topologically incorrect species trees with maximum likelihood methods

Published: 2017-05-22

Author:

de Vienne Damien M., Supek Fran, Gabaldon Toni

Abstract

Background: Overtraining occurs when an optimization process is applied for too many steps, leading to a model that describes noise in addition to the signal present in the data. This effect may influence typical approaches to species tree reconstruction that use maximum likelihood optimization procedures on a small sample of concatenated genes. In this context, overtraining may result in trees that describe the specific evolutionary history of the sampled genes better than they describe the sought evolutionary relationships among the species.

Results: Using a cross-validation-like approach on real and simulated datasets, we show that overtraining occurs in a significant fraction of cases, leading to species trees that are more distant from a gold-standard reference tree than a previously considered (and rejected) solution in the optimization process. However, we also show that the shape of the likelihood curve is informative of the optimal stopping point. As expected, overtraining is aggravated in smaller gene samples and in datasets with increased levels of topological variation among gene trees, but it also occurs in controlled, simulated scenarios where a common underlying topology is enforced.

Conclusions: Overtraining is frequent in species tree reconstruction and leads to a final tree that describes the evolutionary relationships of the species under study less well than an earlier (and rejected) solution encountered during the likelihood optimization process. This result should help in developing specific methods for species tree reconstruction, and may improve our understanding of the complexity of tree likelihood landscapes.
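The criterion described in the abstract (a final tree that is farther from the reference than an earlier, rejected tree) can be illustrated with a minimal sketch. This is not the authors' code: the function name `detect_overtraining`, the trajectory layout, and the use of a generic integer distance (for example a Robinson-Foulds distance computed elsewhere) are all assumptions made for illustration.

```python
from typing import List, Tuple


def detect_overtraining(trajectory: List[Tuple[float, int]]) -> Tuple[bool, int]:
    """Check a likelihood-search trajectory for overtraining.

    `trajectory` is a hypothetical list of (log_likelihood, distance_to_reference)
    pairs, one per tree visited, in the order the search accepted them.
    The distance is assumed to be precomputed against a gold-standard tree.

    Returns (overtrained, best_step): `overtrained` is True when an earlier
    tree is strictly closer to the reference than the final tree, and
    `best_step` is the index of the closest tree (the final index otherwise).
    """
    if not trajectory:
        raise ValueError("trajectory must contain at least one step")

    final_step = len(trajectory) - 1
    final_distance = trajectory[final_step][1]

    # Index of the first tree with the smallest distance to the reference.
    closest_step = min(range(len(trajectory)), key=lambda i: trajectory[i][1])

    if trajectory[closest_step][1] < final_distance:
        return True, closest_step
    return False, final_step


# Illustrative values only: likelihood keeps improving while the distance
# to the reference first shrinks and then grows again.
example = [(-1050.0, 8), (-1020.0, 4), (-1012.0, 2), (-1010.5, 4), (-1010.1, 6)]
overtrained, best_step = detect_overtraining(example)
```

In this made-up trajectory the log-likelihood increases at every step, yet the third tree (index 2) is closest to the reference, so the search would be flagged as overtrained.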

Publisher

Cold Spring Harbor Laboratory

References (31 articles)

1. The supermatrix approach to systematics

2. Phylogenomics and the reconstruction of the tree of life

3. Distinguishing Homologous from Analogous Proteins

4. Gene Trees in Species Trees

5. Philippe H, de Vienne DM, Ranwez V, Roure B, Baurain D, Delsuc F: Pitfalls in supermatrix phylogenomics. European Journal of Taxonomy, in press.

Cited by 1 article.

1. Much Ado About Nothing: Accelerating Maximum Likelihood Phylogenetic Inference via Early Stopping to evade (Over-)optimization (2024-07-08)