Abstract
Plasmids play an essential role in horizontal gene transfer among diverse microorganisms, helping their host bacteria acquire beneficial traits such as antibiotic and metal resistance. Identifying the host bacteria in which a plasmid can transfer, replicate, or persist provides insight into how plasmids promote bacterial evolution. Plasmid host range prediction tools fall into two categories: alignment-based and learning-based. Alignment-based tools achieve high precision but cannot align many newly sequenced plasmids to the characterized plasmids in reference databases. Learning-based tools, in contrast, can predict the host range of these newly discovered plasmids. Although previous research has demonstrated the existence of broad-host-range (BHR) plasmids, no database provides detailed and complete host labels for them. Without adequate well-annotated training samples, learning-based tools fail to extract discriminative feature representations and achieve only limited performance. To address this problem, we propose MOSTPLAS, a self-correction multi-label learning model. We design a pseudo label learning algorithm and a self-correction asymmetric loss to train the multi-label model on samples that may lack some unknown positive labels. Experimental results on multi-host plasmids generated from the NCBI RefSeq database, on metagenomic data, and on real-world plasmid sequences with experimentally determined host ranges demonstrate the superiority of MOSTPLAS.
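The idea of training a multi-label model when some positive host labels are silently missing can be illustrated with a small sketch. The abstract does not give MOSTPLAS's actual loss or pseudo-labeling rule, so the code below is only an assumed, simplified stand-in: it uses the standard asymmetric loss (ASL) of Ridnik et al. (focusing parameters `gamma_pos`/`gamma_neg` and a probability margin `margin` on negatives), preceded by a hypothetical self-correction step (`correct_labels`, threshold `tau`) that promotes an observed negative to a pseudo-positive when the model is highly confident about it. All function names, parameter values, and the toy probabilities are illustrative assumptions.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Map a logit to an independent per-label probability (multi-label setting)."""
    return 1.0 / (1.0 + math.exp(-x))


def correct_labels(probs: Sequence[float], labels: Sequence[int], tau: float = 0.9) -> List[int]:
    """Hypothetical self-correction step (not the MOSTPLAS rule):
    an observed negative whose predicted probability exceeds tau is
    treated as a missing positive and promoted to a pseudo-positive."""
    return [1 if (y == 1 or p > tau) else 0 for p, y in zip(probs, labels)]


def asymmetric_loss(probs: Sequence[float], labels: Sequence[int],
                    gamma_pos: float = 0.0, gamma_neg: float = 4.0,
                    margin: float = 0.05, eps: float = 1e-8) -> float:
    """Standard ASL form: positives use a focal term with gamma_pos; negatives
    use a shifted probability p_m = max(p - margin, 0) with a stronger focusing
    exponent gamma_neg, which down-weights easy (and possibly mislabeled) negatives."""
    total = 0.0
    for p, y in zip(probs, labels):
        if y == 1:
            total -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            p_m = max(p - margin, 0.0)
            total -= p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return total


# Toy example: four candidate host labels for one plasmid (values are made up).
logits = [2.0, -1.0, 3.5, -3.0]
probs = [sigmoid(z) for z in logits]
observed = [1, 0, 0, 0]                    # label 2 may be an unrecorded host
corrected = correct_labels(probs, observed, tau=0.9)

loss_before = asymmetric_loss(probs, observed)
loss_after = asymmetric_loss(probs, corrected)
print(corrected, round(loss_before, 3), round(loss_after, 3))
```

In this toy case the confident prediction on label 2 is heavily penalized when it is kept as a negative, and the penalty largely disappears once it is relabeled as a pseudo-positive. A fixed threshold is the simplest possible choice; a real method would likely need safeguards against reinforcing early mistakes.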
Publisher
Cold Spring Harbor Laboratory