Abstract
Plasmids, as mobile genetic elements, play a pivotal role in transferring traits such as antimicrobial resistance among bacterial communities. Annotating plasmid-encoded proteins with the widely used Gene Ontology (GO) vocabulary is a fundamental step in various tasks, including plasmid mobility classification. However, GO prediction for plasmid-encoded proteins faces two major challenges: the high diversity of functions and the limited availability of high-quality GO annotations. To address these challenges, we introduce PlasGO, a tool that uses a hierarchical architecture to predict GO terms for plasmid proteins. PlasGO employs a protein language model to learn the local context within protein sentences and a BERT model to capture the global context within plasmid sentences. PlasGO also incorporates a self-attention confidence weighting mechanism that lets users control the precision of its predictions. We rigorously evaluated PlasGO and benchmarked it against six state-of-the-art tools in a series of experiments, and the results collectively demonstrate its strong performance. PlasGO significantly expanded the annotations of the plasmid-encoded protein database by assigning high-confidence GO terms to over 95% of previously unannotated proteins, with precision of 0.8229, 0.7941, and 0.8870 for the three GO categories, respectively, as measured on the novel protein test set.
Publisher
Cold Spring Harbor Laboratory