SSG-LUGIA: Single Sequence based Genome Level Unsupervised Genomic Island Prediction Algorithm

Author:

Ibtehaz Nabil1,Ahmed Ishtiaque2,Ahmed Md Sabbir2,Rahman M Sohel2,Azad Rajeev K34,Bayzid Md Shamsuzzoha2

Affiliation:

1. Samsung R&D Institute, Bangladesh

2. Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh

3. Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX, USA

4. Department of Mathematics, University of North Texas, Denton, TX, USA

Abstract

Abstract Background Genomic Islands (GIs) are clusters of genes that are mobilized through horizontal gene transfer. GIs play a pivotal role in bacterial evolution as a mechanism of diversification and adaptation to different niches. Therefore, identification and characterization of GIs in bacterial genomes is important for understanding bacterial evolution. However, quantifying GIs is inherently difficult, and the existing methods suffer from low prediction accuracy and precision–recall trade-off. Moreover, several of them are supervised in nature, and thus, their applications to newly sequenced genomes are riddled with their dependency on the functional annotation of existing genomes. Results We present SSG-LUGIA, a completely automated and unsupervised approach for identifying GIs and horizontally transferred genes. SSG-LUGIA is a novel method based on unsupervised anomaly detection technique, accompanied by further refinement using cues from signal processing literature. SSG-LUGIA leverages the atypical compositional biases of the alien genes to localize GIs in prokaryotic genomes. SSG-LUGIA was assessed on a large benchmark dataset `IslandPick’ and on a set of 15 well-studied genomes in the literature and followed by a thorough analysis on the well-understood Salmonella typhi CT18 genome. Furthermore, the efficacy of SSG-LUGIA in identifying horizontally transferred genes was evaluated on two additional bacterial genomes, namely, those of Corynebacterium diphtheria NCTC13129 and Pseudomonas aeruginosa LESB58. SSG-LUGIA was examined on draft genomes and was demonstrated to be efficient as an ensemble method. Conclusions Our results indicate that SSG-LUGIA achieved superior performance in comparison to frequently used existing methods. Importantly, it yielded a better trade-off between precision and recall than the existing methods. Its nondependency on the functional annotation of genomes makes it suitable for analyzing newly sequenced, yet uncharacterized genomes. Thus, our study is a significant advance in identification of GIs and horizontally transferred genes. SSG-LUGIA is available as an open source software at https://nibtehaz.github.io/SSG-LUGIA/.

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology,Information Systems

Cited by 3 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3