Improving candidate Biosynthetic Gene Clusters in fungi through reinforcement learning

Author:

Almeida Hayda (1, 2, 3), Tsang Adrian (1, 2), Diallo Abdoulaye Baniré (1, 3, 4)

Affiliation:

1. Département d’Informatique, UQAM, Montréal, QC H2X 3Y7, Canada

2. Centre for Structural and Functional Genomics, Concordia University, Montréal, QC H4B 1R6, Canada

3. Laboratoire d’Algèbre, de Combinatoire, et d’Informatique Mathématique (LACIM), UQAM, Montréal, QC H2X 3Y, Canada

4. Centre of Excellence in Research on Orphan Diseases - Courtois Foundation (CERMO-FC), UQAM, Montréal, QC H2X 3Y7, Canada

Abstract

Motivation: Precise identification of Biosynthetic Gene Clusters (BGCs) is a challenging task. The performance of BGC discovery tools is limited by their capacity to accurately predict the components belonging to candidate BGCs, and they often overestimate cluster boundaries. To support optimizing the composition and boundaries of candidate BGCs, we propose a reinforcement learning approach relying on protein domains and functional annotations from expert-curated BGCs.

Results: The proposed reinforcement learning method aims to improve candidate BGCs obtained with state-of-the-art tools. It was evaluated on candidate BGCs obtained for two fungal genomes, Aspergillus niger and Aspergillus nidulans. The results show an improvement in gene precision of more than 15% for TOUCAN, fungiSMASH and DeepBGC, and in cluster precision of more than 25% for fungiSMASH and DeepBGC, allowing these tools to reach almost perfect precision in cluster prediction. This can pave the way for optimizing current predictions of candidate BGCs in fungi while minimizing the curation effort required from domain experts.

Availability and implementation: https://github.com/bioinfoUQAM/RL-bgc-components.

Supplementary information: Supplementary data are available at Bioinformatics online.

Funder

Natural Sciences and Engineering Research Council (NSERC) and the Fonds de recherche du Québec—Nature et technologies

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Reference: 31 articles.

1. The gold-standard genome of Aspergillus niger NRRL 3 enables a detailed view of the diversity of sugar catabolism in fungi;Aguilar-Pontes;Stud. Mycol,2018

2. TOUCAN: a framework for fungal biosynthetic gene cluster discovery;Almeida;NAR Genom. Bioinform,2020

3. antiSMASH 6.0: improving cluster detection and comparison capabilities;Blin;Nucleic Acids Res,2021

Cited by 2 articles.
