Abstract
Genome sequencing has revealed an incredible diversity of bacteria and archaea, but there are no fast and convenient tools for browsing across these genomes. It is cumbersome to view the prevalence of homologs for a protein of interest, or the gene neighborhoods of those homologs, across the diversity of the prokaryotes. We developed a web-based tool, fast.genomics, that uses two strategies to support fast browsing across the diversity of prokaryotes. First, the database of genomes is split up. The main database contains one representative from each of the 6,377 genera that have a high-quality genome, and additional databases for each taxonomic order contain up to 10 representatives of each species. Second, homologs of proteins of interest are identified quickly by using accelerated searches, usually in a few seconds. Once homologs are identified, fast.genomics can quickly show their prevalence across taxa, view their neighboring genes, or compare the prevalence of two different proteins. Fast.genomics is available at https://fast.genomics.lbl.gov.
Importance
Now that we have genome sequences for tens of thousands of species of bacteria and archaea, we would like to predict the functions of their proteins. One common strategy is comparative genomics: by considering which genomes contain similar proteins, and which proteins are often encoded near each other, we can often guess the proteins’ functions. But there was no good way to do these analyses quickly. We built a website that performs them in a few seconds. We used two strategies to speed up the key step, which is finding similar proteins. First, we split up the database of genomes into a main database with one representative for each genus, and sub-databases for each taxonomic order. Either way, searches against fewer genomes are much faster. Second, we use accelerated searches to find similar proteins, with only a slight loss of sensitivity.
Publisher
Cold Spring Harbor Laboratory