Author:
Yang Ziyi,Liao Benben,Hsieh Changyu,Han Chao,Fang Liang,Zhang Shengyu
Abstract
Natural products produced by microorganisms constitute an important source of essential pharmaceuticals, including antimicrobial and anti-tumor drugs. These bioactive molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The rapid increase of microbial genomics resources, due to the availability of high-throughput sequencing technologies, has spurred the development of computational methods for microbial genome mining for BGC discovery. Current machine learning methods, however, have limited successes in uncovering novel BGCs due to an excessive number of false positives in their predictions. To this end, we propose Deep-BGCpred, a framework that effectively addresses the aforementioned issue by improving a deep learning model termed DeepBGC. The new model embeds multi-source protein family domains and employs a stacked Bidirectional Long Short-Term Memory model to boost accuracy for BGC identifications. In particular, it integrates two customized strategies, sliding window strategy and dual-model serial screening, to improve the model’s performance stability and reduce the number of false positive in BGC predictions. We compare the proposed model against other well-established methods on common benchmarks and achieve new state-of-the-art results with convincing evidences. We expect that researchers working on genome mining for natural products may be greatly benefited from our newly proposed method, Deep-BGCpred.
Publisher
Cold Spring Harbor Laboratory
Cited by
3 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献