Machine Learning for Annotating Sparsely Labeled Biocide and Metallotoxin Resistance Genes Using Natural Language Processing Techniques

Author:

Ananey-Obiri Daniel¹, Rhinehardt Kristen¹

Affiliation:

1. North Carolina Agricultural and Technical State University

Abstract

Background: The importance of non-antibiotic antimicrobials such as metals and biocides in the progression of antibiotic resistance in bacteria cannot be overstated. Through co-selection, they have been implicated as agents that promote antibiotic resistance in bacteria. A plethora of literature has explored antibiotic resistance, but the same cannot be said of non-antibiotic antimicrobials such as metals and biocides, in spite of the important role they play in this phenomenon. It is also common knowledge that most biological sequence data are either wrongly labeled or unlabeled. Manual annotation of these data by humans is time-consuming, expensive, and prone to errors. The recent upsurge in machine learning applications promises a viable solution. Traditional machine learning approaches rely on large numbers of labeled samples to build predictive models. However, semi-supervised learning (SSL) models can overcome this shortfall when only a few labeled samples are available.

Results: Here, we developed different SSL methods to annotate and identify biocide and metallotoxin resistance genes. We represented protein sequences as vectors built from Word2vec and Global Vectors (GloVe) word embeddings. We simulated real-world scenarios by varying the proportion of labeled samples from 5% to 30% and measured model performance on the two datasets. Our findings show that SSL methods are a viable solution for annotating sparsely labeled genomic sequence data. SSL models trained with few labeled sequences outperformed some supervised learning models.

Conclusion: The findings from this study indicate that machine learning models can annotate biological sequences with few labeled samples (as little as 5%). Non-antibiotic resistance genes can also be identified by machine learning models with high accuracy.
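As a rough illustration of the semi-supervised idea described in the abstract, the sketch below runs a simple self-training loop over protein sequences. It is not the authors' pipeline. It assumes overlapping 3-mers as the "words" of a sequence, uses normalized k-mer counts as a stand-in for Word2vec or GloVe embeddings, and uses a nearest-centroid classifier with a hypothetical confidence threshold. The sequences and the labels "metal" and "biocide" are made up for the example.

```python
from collections import Counter
import math

def kmers(seq, k=3):
    """Split a protein sequence into overlapping k-mer 'words'."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def embed(seq, k=3):
    """Unit-normalized bag-of-k-mers vector.
    Assumption: a stand-in for averaged Word2vec/GloVe k-mer embeddings."""
    counts = Counter(kmers(seq, k))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {w: c / norm for w, c in counts.items()}

def cosine(a, b):
    """Cosine similarity of two unit-normalized sparse vectors."""
    return sum(v * b.get(w, 0.0) for w, v in a.items())

def centroid(vectors):
    """Unit-normalized mean direction of a list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        for w, x in vec.items():
            total[w] += x
    norm = math.sqrt(sum(x * x for x in total.values())) or 1.0
    return {w: x / norm for w, x in total.items()}

def self_train(labeled, unlabeled, threshold=0.5, max_rounds=10):
    """Self-training: repeatedly pseudo-label the unlabeled sequences whose
    similarity to the nearest class centroid reaches `threshold`, then
    recompute centroids. Returns (labeled pairs, still-unlabeled sequences)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        by_class = {}
        for seq, label in labeled:
            by_class.setdefault(label, []).append(embed(seq))
        cents = {label: centroid(vs) for label, vs in by_class.items()}
        confident, remaining = [], []
        for seq in pool:
            vec = embed(seq)
            best = max(cents, key=lambda label: cosine(vec, cents[label]))
            if cosine(vec, cents[best]) >= threshold:
                confident.append((seq, best))
            else:
                remaining.append(seq)
        if not confident:  # nothing new was pseudo-labeled, so stop early
            break
        labeled.extend(confident)
        pool = remaining
    return labeled, pool

# Toy, made-up sequences: two labeled examples and three unlabeled ones.
labeled = [("HHCCHHCCHHCC", "metal"), ("LLVVLLVVLLVV", "biocide")]
unlabeled = ["CCHHCCHH", "VVLLVVLL", "MKTAYIAK"]
final, pool = self_train(labeled, unlabeled)
```

In this toy run, the two unlabeled sequences that share k-mers with a labeled example receive that example's label, while the unrelated sequence stays in the unlabeled pool. On real data, the threshold and the embedding choice would need tuning.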

Publisher

Research Square Platform LLC

