Large-scale comparison of machine learning algorithms for target prediction of natural products

Author:

Liang Lu (1), Liu Ye (1), Kang Bo (2), Wang Ru (1), Sun Meng-Yu (1), Wu Qi (2), Meng Xiang-Fei (2), Lin Jian-Ping (1, 3, 4)

Affiliation:

1. State Key Laboratory of Medicinal Chemical Biology, College of Pharmacy and Tianjin Key Laboratory of Molecular Drug Research, Nankai University, Haihe Education Park, 38 Tongyan Road, Tianjin 300353, China

2. National Supercomputer Center in Tianjin, 10 Xinhuanxi Road, Tianjin Binhai New Area, Tianjin 300457, China

3. Biodesign Center, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 32 West 7th Avenue, Tianjin Airport Economic Area, Tianjin 300308, China

4. Platform of Pharmaceutical Intelligence, Tianjin International Joint Academy of Biomedicine, Tianjin 300457, China

Abstract

Natural products (NPs) and their derivatives are important resources for drug discovery. Many in silico target prediction methods have been reported; however, very few of them distinguish NPs from synthetic molecules. Because NPs and synthetic molecules differ in many characteristics, it is necessary to build target prediction models specific to NPs. We therefore collected activity data for NPs and their derivatives from public databases and constructed four datasets: the NP dataset, the dataset of NPs and their first-class derivatives, the dataset of NPs and all their derivatives, and the ChEMBL26 compounds dataset. Conditions including activity thresholds and input features were explored to assess the performance of eight machine learning methods for target prediction of NPs: support vector machines (SVM), extreme gradient boosting, random forests, K-nearest neighbor, naive Bayes, feedforward neural networks (FNN), convolutional neural networks and recurrent neural networks. As a result, the datasets of NPs and all their derivatives were selected to build the best NP-specific models. Consensus models and voting models were additionally applied to improve prediction performance. Further evaluation on the external validation set demonstrated that (1) the NP-specific model performed better on the target prediction of NPs than the traditional models trained on all ChEMBL26 compounds, and (2) the consensus model of FNN + SVM had the best overall performance, and the voting model can significantly improve recall and specificity.
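The abstract does not define how the activity thresholds, consensus models or voting models were implemented, so the following is only a minimal sketch under stated assumptions. It assumes that an activity threshold binarizes a measured value into active/inactive, that a consensus model averages the predicted probabilities of its member models, and that a voting model calls a compound-target pair active when enough models agree. The function names, the 10 µM threshold, the 0.5 cutoff and the example probabilities are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def label_active(activity_um, threshold_um=10.0):
    """Binarize a measured activity (assumed to be in µM; lower is more potent).

    The 10 µM default is an illustrative assumption, not the paper's value.
    """
    return activity_um <= threshold_um


def consensus_probability(model_probs):
    """Assumed consensus rule: average the per-model predicted probabilities."""
    return mean(model_probs.values())


def vote(model_probs, cutoff=0.5, min_votes=2):
    """Assumed voting rule: active if at least `min_votes` models reach `cutoff`."""
    votes = sum(p >= cutoff for p in model_probs.values())
    return votes >= min_votes


# Hypothetical predicted probabilities for one compound-target pair.
probs = {"FNN": 0.82, "SVM": 0.64, "RF": 0.41}

# Consensus over the FNN + SVM pair named in the abstract.
consensus = consensus_probability({name: probs[name] for name in ("FNN", "SVM")})
voted_active = vote(probs)

print(f"consensus probability: {consensus:.2f}")
print(f"voted active: {voted_active}")
```

Averaging tends to smooth out the errors of individual models, while a vote count trades sensitivity against specificity through `min_votes`; the rules actually used in the paper may differ from both.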

Funder

National Key R&D Program of China

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems

Cited by 4 articles.