Machine learning for identifying resistance features of Klebsiella pneumoniae using whole-genome sequence single nucleotide polymorphisms

Authors:

Liu Wenjia (1), Ying Nanjiao (1, 2), Mo Qiusi (1), Li Shanshan (1), Shao Mengjie (1), Sun Lingli (3, 4), Zhu Lei (2, 1)

Affiliations:

1. College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China

2. Institute of Biomedical Engineering, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China

3. NMPA Key Laboratory for Testing and Risk Warning of Pharmaceutical Microbiology, Hangzhou, Zhejiang, 310012, PR China

4. Key Laboratory of Microorganism Technology and Bioinformatics Research of Zhejiang Province, Hangzhou, Zhejiang, 310012, PR China

Abstract

Introduction. Klebsiella pneumoniae, a Gram-negative bacterium, is a common cause of nosocomial infection. Its drug-resistance rate is increasing year by year, posing a severe threat to public health worldwide, and it has been listed as one of the pathogens driving the global crisis of antimicrobial resistance in nosocomial infections. The drug resistance of K. pneumoniae therefore needs to be characterized for clinical diagnosis. Single nucleotide polymorphisms (SNPs) occur at high density in whole-genome sequencing (WGS) data and carry rich genetic information; they can affect the structure or expression of proteins and can be used to explore mutation sites associated with bacterial resistance.

Hypothesis/Gap Statement. Machine learning methods can detect genetic features associated with the drug resistance of K. pneumoniae from whole-genome SNP data.

Aims. This work used two machine learning methods, Fast Feature Selection (FFS) and Codon Mutation Detection (CMD), to detect genetic features related to the drug resistance of K. pneumoniae from whole-genome SNP data.

Methods. WGS data on the resistance of K. pneumoniae strains to four antibiotics (tetracycline, gentamicin, imipenem and amikacin) were downloaded from the European Nucleotide Archive (ENA). Sequences were aligned with MUMmer 3 for SNP calling, using the K. pneumoniae HS11286 chromosome as the reference genome. The FFS algorithm was applied for feature selection on the SNP dataset. The training set was constructed from mutation sites with a mutation frequency >0.995. From the original SNP training set, 70% of SNPs were randomly selected from each dataset as the test set to verify the accuracy of the training results. Finally, the resistance genes were obtained with the CMD algorithm and Venny.

Results. The numbers of strains resistant to tetracycline, gentamicin, imipenem and amikacin were 931, 1048, 789 and 203, respectively. Machine learning algorithms were applied to the SNP training set and test set, predicting 28 and 23 resistance genes, respectively. The 28 resistance genes from the training set included 22 of the genes from the test set, which verified the accuracy of gene prediction. Some of these genes (KPHS_35310, KPHS_18220, KPHS_35880, etc.) corresponded to known resistance genes (Eef2, lpxK, MdtC, etc.). Logistic regression classifiers were built from the SNPs identified in the training set. The areas under the curve (AUCs) for the four antibiotics were 0.939, 0.950, 0.912 and 0.935, showing a strong ability to predict bacterial resistance.

Conclusion. Machine learning methods can be used effectively to predict resistance genes and associated SNPs. The FFS and CMD algorithms are widely applicable: they can be used for the drug-resistance analysis of any microorganism for which genomic variation and phenotypic data are available. This work lays a foundation for resistance research in clinical applications.
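Two of the evaluation steps named in the abstract, the AUC of a classifier and the overlap between the gene lists predicted from the training and test sets, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (`roc_auc`, `gene_overlap`), the toy labels and scores, and the placeholder gene names are all assumptions made for illustration, and the AUC is computed with the standard pairwise (Mann-Whitney) definition rather than any particular library.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition:
    the probability that a randomly chosen resistant strain (label 1) gets a
    higher score than a randomly chosen susceptible strain (label 0).
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one strain of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gene_overlap(train_genes, test_genes):
    """Return the genes predicted from both sets and the fraction of
    test-set genes that were also predicted from the training set."""
    train, test = set(train_genes), set(test_genes)
    if not test:
        raise ValueError("test gene list is empty")
    shared = train & test
    return shared, len(shared) / len(test)


# Toy data (hypothetical, not taken from the study).
labels = [1, 1, 1, 0, 0, 0]               # 1 = resistant, 0 = susceptible
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]   # classifier scores per strain
auc = roc_auc(labels, scores)

train_genes = ["geneA", "geneB", "geneC", "geneD"]  # placeholder names
test_genes = ["geneB", "geneC", "geneE"]            # placeholder names
shared, fraction = gene_overlap(train_genes, test_genes)

print(f"AUC = {auc:.3f}")
print(f"shared genes = {sorted(shared)}, fraction of test genes = {fraction:.3f}")
```

Applied to the counts reported in the abstract, the same overlap calculation gives 22 of the 23 test-set genes (about 96%) recovered among the 28 training-set genes.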

Publisher

Microbiology Society

Subject

Microbiology (medical), General Medicine, Microbiology

Cited by 2 articles.
