Leveraging genomic redundancy to improve inference and alignment of orthologous proteins

Published: 2023-01-25

Author:

Marc Singleton, Michael Eisen

Abstract

Identifying protein sequences with common ancestry is a core task in bioinformatics and evolutionary biology. However, methods for inferring and aligning such sequences in annotated genomes have not kept pace with the increasing scale and complexity of the available data. Thus, in this work we implemented several improvements to the traditional methodology that more fully leverage the redundancy of closely related genomes and the organization of their annotations. Two highlights include the application of the more flexible k-clique percolation algorithm for identifying clusters of orthologous proteins and the development of a novel technique for removing poorly supported regions of alignments with a phylogenetic HMM. In making the latter, we also wrote a fully documented Python package, Homomorph, that implements standard HMM algorithms and created a set of tutorials to promote its use by a wide audience. We applied the resulting pipeline to a set of 33 annotated Drosophila genomes, generating 22,813 orthologous groups and 8,566 high-quality alignments.
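For readers unfamiliar with k-clique percolation, the following is a minimal, brute-force sketch of the general technique, not the authors' pipeline. It assumes a small undirected graph in which nodes are hypothetical protein identifiers and an edge marks a pairwise hit between two proteins; the function name `percolate_k_cliques` and the toy edge list are illustrative only. Two k-cliques are treated as adjacent when they share k-1 nodes, and each community is the union of a connected set of adjacent cliques, so a node may belong to more than one community.

```python
from itertools import combinations


def percolate_k_cliques(edges, k):
    """Return k-clique percolation communities as a sorted list of frozensets.

    Brute-force enumeration; intended only for small illustrative graphs.
    """
    # Build an adjacency map, ignoring self-loops.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Enumerate every k-clique: a k-subset whose members are all pairwise linked.
    nodes = sorted(adj)
    cliques = [
        frozenset(c)
        for c in combinations(nodes, k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Union-find over cliques: merge two cliques that share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # Each community is the union of the nodes of one connected clique set.
    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return sorted((frozenset(c) for c in communities.values()), key=sorted)


# Hypothetical toy graph: two triangles sharing an edge, plus a third
# triangle that touches them at a single node (D).
toy_edges = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("B", "D"), ("C", "D"),
    ("D", "E"), ("D", "F"), ("E", "F"),
]
for community in percolate_k_cliques(toy_edges, 3):
    print(sorted(community))
```

In this toy graph, node D falls into both communities, which illustrates the overlapping membership that distinguishes clique percolation from methods that partition nodes into disjoint clusters.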

Publisher

Cold Spring Harbor Laboratory

References (65 articles)

1. OMA orthology in 2021: website overhaul, conserved isoforms, ancestral gene order and more; Nucleic Acids Research, 2020

2. OMA standalone: orthology inference among public and custom genomes and transcriptomes

3. Weights for data related by a tree

4. BLAST+: architecture and applications; BMC Bioinformatics, 2009

Cited by 1 article.

1. Evolutionary analyses of IDRs reveal widespread signals of conservation; 2023-12-07