Abstract
Molecular evolution studies, such as phylogenomic studies and genome-wide surveys of selection, often rely on gene families of single-copy orthologs (SC-OGs). Large gene families with multiple homologs in 1 or more species (a phenomenon observed among several important families of genes, such as transporters and transcription factors) are often ignored because identifying and retrieving SC-OGs nested within them is challenging. To address this issue and increase the number of markers used in molecular evolution studies, we developed OrthoSNAP, a software tool that uses a phylogenetic framework to simultaneously split gene families into SC-OGs and prune species-specific inparalogs. We term the SC-OGs identified by OrthoSNAP "SNAP-OGs" because they are identified using a splitting and pruning procedure analogous to snapping branches on a tree. From 415,129 orthologous groups of genes inferred across 7 eukaryotic phylogenomic datasets, we identified 9,821 SC-OGs; using OrthoSNAP on the remaining 405,308 orthologous groups of genes, we identified an additional 10,704 SNAP-OGs. Comparison of SNAP-OGs and SC-OGs revealed that their phylogenetic information content was similar, even in complex datasets that contain a whole-genome duplication, complex patterns of duplication and loss, transcriptome data where each gene typically has multiple transcripts, and contentious branches in the tree of life. OrthoSNAP is useful for increasing the number of markers used in molecular evolution data matrices, a critical step for robustly inferring and exploring the tree of life.
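The split-and-prune idea described in the abstract can be illustrated with a short Python sketch. This is a toy under stated assumptions, not OrthoSNAP's actual implementation: the gene tree is assumed to be rooted and encoded as nested tuples, leaf labels are assumed to follow a hypothetical "species|gene" convention, the representative kept for a set of inparalogs is chosen arbitrarily (the first leaf), and the function names and the `min_taxa` parameter are invented for illustration.

```python
from typing import List, Tuple, Union

# Assumption: a leaf is a "species|gene" string; an internal node is a tuple of children.
Tree = Union[str, Tuple["Tree", ...]]


def species_of(leaf: str) -> str:
    """Extract the species name from a hypothetical 'species|gene' label."""
    return leaf.split("|", 1)[0]


def leaves(node: Tree) -> List[str]:
    """List all leaf labels below a node."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def prune_inparalogs(node: Tree) -> Tree:
    """Collapse any clade made up of a single species into one leaf.

    The first leaf is kept as an arbitrary representative (illustrative choice).
    """
    if isinstance(node, str):
        return node
    children = tuple(prune_inparalogs(child) for child in node)
    all_leaves = all(isinstance(child, str) for child in children)
    if all_leaves and len({species_of(child) for child in children}) == 1:
        return children[0]
    return children


def split_single_copy(node: Tree, min_taxa: int = 2) -> List[List[str]]:
    """Return the largest subtrees in which every species occurs exactly once."""
    tips = leaves(node)
    species = [species_of(tip) for tip in tips]
    if len(species) == len(set(species)):
        # Single-copy subtree: keep it only if it covers enough species.
        return [sorted(tips)] if len(tips) >= min_taxa else []
    groups: List[List[str]] = []
    for child in node:  # duplicates remain, so descend into each child
        groups.extend(split_single_copy(child, min_taxa))
    return groups


def split_and_prune(tree: Tree, min_taxa: int = 2) -> List[List[str]]:
    """Prune species-specific inparalogs, then split into single-copy groups."""
    return split_single_copy(prune_inparalogs(tree), min_taxa)


# Toy family: an ancient duplication yields two copies per species,
# and species B has an additional recent (species-specific) duplication.
family = (
    (("A|a1", "B|b1"), "C|c1"),
    (("A|a2", ("B|b2", "B|b3")), "C|c2"),
)
groups = split_and_prune(family)
print(groups)
```

In this toy family, the multi-copy group that a single-copy filter would discard yields two single-copy subgroups, one per ancient duplicate, after the B-specific inparalog pair is reduced to one sequence. A real analysis would also need to handle rooting, branch support, and a principled choice of inparalog representative, none of which is modeled here.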
Funder
Howard Hughes Medical Institute
National Science Foundation
National Institute of Allergy and Infectious Diseases
Division of Microbiology and Infectious Diseases, National Institute of Allergy and Infectious Diseases
Burroughs Wellcome Fund
Publisher
Public Library of Science (PLoS)
Subject
General Agricultural and Biological Sciences; General Immunology and Microbiology; General Biochemistry, Genetics and Molecular Biology; General Neuroscience
Cited by
16 articles.