Abstract
The AlphaFold Protein Structure Database (AFDB) is the largest repository of accurately predicted structures with taxonomic labels. Despite providing predictions for over 214 million UniProt entries, the AFDB does not cover viral sequences, severely limiting their study. To bridge this gap, we created the Big Fantastic Virus Database (BFVD), a repository of 351,242 protein structures predicted by applying ColabFold to the viral sequence representatives of the UniRef30 clusters. BFVD holds a unique repertoire of protein structures, as over 63% of its entries show no or low structural similarity to existing repositories. We demonstrate that BFVD substantially increases the fraction of annotated bacteriophage proteins compared to sequence-based annotation using Bakta. In this respect, BFVD is on par with the AFDB while holding nearly three orders of magnitude fewer structures. BFVD is an important virus-specific expansion to protein structure repositories, offering new opportunities to advance viral research. BFVD is freely available at https://bfvd.steineggerlab.workers.dev/
Publisher
Cold Spring Harbor Laboratory