A fast comparative genome browser for diverse bacteria and archaea

Published: 2023-08-24

Author:

Morgan N. Price, Adam P. Arkin

Abstract

Genome sequencing has revealed an incredible diversity of bacteria and archaea, but there are no fast and convenient tools for browsing across these genomes. It is cumbersome to view the prevalence of homologs for a protein of interest, or the gene neighborhoods of those homologs, across the diversity of the prokaryotes. We developed a web-based tool, fast.genomics, that uses two strategies to support fast browsing across the diversity of prokaryotes. First, the database of genomes is split up. The main database contains one representative from each of the 6,377 genera that have a high-quality genome, and additional databases for each taxonomic order contain up to 10 representatives of each species. Second, homologs of proteins of interest are identified quickly by using accelerated searches, usually in a few seconds. Once homologs are identified, fast.genomics can quickly show their prevalence across taxa, view their neighboring genes, or compare the prevalence of two different proteins. Fast.genomics is available at https://fast.genomics.lbl.gov.

Importance

Now that we have genome sequences for tens of thousands of species of bacteria and archaea, we would like to predict the functions of their proteins. One common strategy is comparative genomics: by considering which genomes contain similar proteins, and which proteins are often encoded near each other, we can often guess the proteins' functions. But there was no good way to do these analyses quickly. We built a website that performs them in a few seconds. We used two strategies to speed up the key step, which is finding similar proteins. First, we split up the database of genomes into a main database with one representative for each genus, and sub-databases for each taxonomic order. Either way, searches against fewer genomes are much faster. Second, we use accelerated searches to find similar proteins, with only a slight loss of sensitivity.
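The database-splitting strategy described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Genome` record, its `quality` score, and the rule "prefer the highest-quality genome" are assumptions made here for illustration, since the abstract does not say how representatives are chosen. Only the limits (one representative per genus in the main database, up to 10 per species in each order-level database) come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Genome:
    # All fields are hypothetical; the abstract does not define a record layout.
    accession: str
    order: str
    genus: str
    species: str
    quality: float  # assumed score, higher is better


def split_databases(genomes, max_per_species=10):
    """Return (main_db, order_dbs).

    main_db:   one genome per genus (the highest-quality one, by assumption).
    order_dbs: maps each taxonomic order to at most `max_per_species`
               genomes for every species in that order.
    """
    # Visit the best genomes first so the first one seen per genus wins.
    ranked = sorted(genomes, key=lambda g: (-g.quality, g.accession))

    main_by_genus = {}
    per_species = defaultdict(list)
    for g in ranked:
        main_by_genus.setdefault(g.genus, g)
        bucket = per_species[(g.order, g.species)]
        if len(bucket) < max_per_species:
            bucket.append(g)

    # Group the capped per-species lists by taxonomic order.
    order_dbs = defaultdict(list)
    for (order, _species), members in per_species.items():
        order_dbs[order].extend(members)

    return list(main_by_genus.values()), dict(order_dbs)
```

A search would then run against either the small main database or a single order-level database, which is why it touches far fewer genomes than a search over everything.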

Publisher

Cold Spring Harbor Laboratory

Cited by 4 articles.

1. High-throughput genetics enables identification of nutrient utilization and accessory energy metabolism genes in a model methanogen;2024-03-05

2. Beyond blast: enabling microbiologists to better extract literature, taxonomic distributions and gene neighbourhood information for protein families;Microbial Genomics;2024-02-07

3. AnnoView enables large-scale analysis, comparison, and visualization of microbial gene neighborhoods;2024-01-16

4. Beyond Blast: Enabling Microbiologists to Better Extract Literature, Taxonomic Distributions and Gene Neighborhood Information for Protein Families;2023-05-03