PLM-ARG: antibiotic resistance gene identification using a pretrained protein language model

Author:

Wu Jun [1], Ouyang Jian [1], Qin Haipeng [1], Zhou Jiajia [1], Roberts Ruth [2, 3], Siam Rania [4], Wang Lan [5], Tong Weida [6], Liu Zhichao [7], Shi Tieliu [1, 8]

Affiliation:

1. Center for Bioinformatics and Computational Biology, and The Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai 200241, China

2. ApconiX Ltd, Alderley Park, Alderley Edge SK10 4TG, United Kingdom

3. University of Birmingham, Birmingham B15 2TT, United Kingdom

4. Biology Department, School of Sciences and Engineering, The American University in Cairo, New Cairo 11835, Egypt

5. College of Architecture and Urban Planning, Tongji University, Shanghai 200092, China

6. National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR 72079, United States

7. Nonclinical Drug Safety, Boehringer Ingelheim Pharmaceuticals, Inc, Ridgefield, CT 06877, United States

8. School of Statistics, Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, East China Normal University, Shanghai 200062, China

Abstract

Motivation

Antibiotic resistance presents a formidable global challenge to public health and the environment. Considerable effort has been devoted to identifying antibiotic resistance genes (ARGs) in order to assess the threat of antibiotic resistance, yet recent large-scale metagenomic and metatranscriptomic studies have revealed a notable concern: a significant fraction of proteins cannot be annotated by conventional sequence similarity-based methods. This issue extends to ARGs, which may be under-recognized when they are dissimilar at the sequence level from known resistance genes.

Results

We propose PLM-ARG, an artificial intelligence-powered ARG identification framework that uses a pretrained large protein language model to perform ARG identification and resistance category classification simultaneously. PLM-ARG was developed on the most comprehensive ARG and resistance category information (more than 28,000 ARGs and 29 associated resistance categories) and yielded a Matthews correlation coefficient (MCC) of 0.983 ± 0.001 under 5-fold cross-validation. On an independent validation set, PLM-ARG achieved an MCC of 0.838, outperforming other publicly available ARG prediction tools by 51.8% to 107.9%. The utility of PLM-ARG was further demonstrated by annotating resistance in the UniProt database and by evaluating the impact of ARGs on the Earth's environmental microbiota.

Availability and implementation

PLM-ARG is available for academic purposes at https://github.com/Junwu302/PLM-ARG, and a user-friendly webserver (http://www.unimd.org/PLM-ARG) is also provided.
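For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how a binary Matthews correlation coefficient and a "mean ± standard deviation" summary over cross-validation folds could be computed. It is not taken from the PLM-ARG code base: the function names and the confusion-matrix counts are hypothetical, and the paper's own evaluation (which also covers 29 resistance categories) may use a different MCC variant.

```python
import math
import statistics


def binary_mcc(tp, tn, fp, fn):
    """Binary Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def summarize_folds(fold_counts):
    """Mean and sample standard deviation of the MCC across folds.

    fold_counts is a list of (tp, tn, fp, fn) tuples, one per fold.
    """
    scores = [binary_mcc(*counts) for counts in fold_counts]
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    # Hypothetical counts for five folds, for illustration only.
    folds = [
        (980, 990, 10, 20),
        (975, 992, 8, 25),
        (982, 988, 12, 18),
        (978, 991, 9, 22),
        (981, 989, 11, 19),
    ]
    mean_mcc, std_mcc = summarize_folds(folds)
    print(f"MCC = {mean_mcc:.3f} ± {std_mcc:.3f}")
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and unlike accuracy it stays informative when ARGs are much rarer than non-ARGs.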

Funder

Shanghai Municipal Science and Technology

Open Research Fund of Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE

Key Laboratory of MEA

Ministry of Education

East China Normal University

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

