An efficient deep learning method for amino acid substitution model selection

Published: 2024-07-02

Authors:

Tinh Nguyen Huy, Vinh Le Sy

Abstract

Amino acid substitution models play an important role in studying the evolutionary relationships among species from protein sequences. Because an amino acid substitution model has a large number of parameters, it is estimated from hundreds or thousands of alignments. Both general models and clade-specific models have been estimated and are widely used in phylogenetic analyses. The maximum likelihood method is normally used to select the best-fit model for the specific protein alignment under study. A number of studies have discussed theoretical concerns as well as the computational burden of maximum likelihood methods in model selection. Recently, machine learning methods have been proposed for selecting nucleotide models. In this paper, we propose methods to create summary statistics from protein alignments to efficiently train a network, called ModelDetector, based on the convolutional neural network ResNet-18, for detecting amino acid models. Experiments on simulated data showed that the accuracy of ModelDetector was comparable to that of the maximum likelihood method ModelFinder. The ModelDetector network was trained on 64,800 alignments in about 12 hours on a computer with 8 CPU cores and no GPU. It is orders of magnitude faster than the maximum likelihood method in inferring amino acid substitution models and can analyze genome alignments with millions of sites in minutes.
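The abstract does not specify which summary statistics ModelDetector computes, so the following is only a hypothetical sketch of the general idea: reducing a protein alignment of any size to a fixed-size, image-like matrix that a convolutional network such as ResNet-18 could take as input. It assumes one simple statistic, the normalized frequencies of amino acid pairs observed in the same column of two sequences; the function name `pair_frequency_matrix` and the rule of skipping gaps and ambiguity codes are illustrative choices, not the authors' method.

```python
from itertools import combinations

# Assumption: only the 20 standard amino acids are counted; gaps ("-") and
# ambiguity codes such as "X" or "B" are skipped.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def pair_frequency_matrix(alignment):
    """Summarize a protein alignment as a symmetric 20x20 frequency matrix.

    Hypothetical statistic: for every pair of sequences and every column,
    record the unordered pair of residues observed there. Each observation
    adds 0.5 to cell (i, j) and 0.5 to cell (j, i), so the matrix stays
    symmetric and, after normalization, its entries sum to 1.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")

    counts = [[0.0] * len(AMINO_ACIDS) for _ in AMINO_ACIDS]
    total = 0
    for seq_a, seq_b in combinations(alignment, 2):
        for x, y in zip(seq_a.upper(), seq_b.upper()):
            if x in INDEX and y in INDEX:
                i, j = INDEX[x], INDEX[y]
                counts[i][j] += 0.5
                counts[j][i] += 0.5
                total += 1

    if total == 0:
        return counts  # no informative residue pairs; matrix is all zeros
    return [[c / total for c in row] for row in counts]


if __name__ == "__main__":
    toy = ["ARND-", "ARNE-", "AKNDC"]
    matrix = pair_frequency_matrix(toy)
    print(len(matrix), len(matrix[0]))  # 20 20
    print(round(sum(map(sum, matrix)), 6))  # 1.0
```

Whatever the number of sequences or sites, the output is always 20x20, which is what lets a network with a fixed input shape process any alignment. The pairwise loop grows quadratically with the number of sequences, so a practical implementation might need to subsample sequence pairs for large alignments.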

Publisher

Cold Spring Harbor Laboratory
