Ensemble modeling with machine learning and deep learning to provide interpretable generalized rules for classifying CNS drugs with high prediction power

Published:2021-09-17 Issue:1 Volume:23 Page:
ISSN:1467-5463
Container-title:Briefings in Bioinformatics
language:en

Author:

Yu Tzu-Hui¹, Su Bo-Han², Battalora Leo Chander³, Liu Sin⁴, Tseng Yufeng Jane⁵

Affiliation:

1. National Taiwan University in Bio-Industry Communication and Development, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106

2. Department of Computer Science and Information Engineering of National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106

3. Temple University in Philadelphia, USA

4. Graduate Institute of Biomedical Electronics and Bioinformatics of National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106

5. Graduate Institute of Biomedical Electronics and Bioinformatics, Department of Computer Science and Information Engineering and School of Pharmacy at National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106

Abstract

The trade-off between the predictive power of a machine learning (ML) or deep learning (DL) model and its interpretability has been a rising concern in central nervous system-related quantitative structure–activity relationship (CNS-QSAR) analysis. Many state-of-the-art predictive models fail to provide structural insights because of their black-box nature. This lack of interpretability, and the resulting difficulty of deriving simple rules, remains a challenge for CNS-QSAR models. To address these issues, we developed a protocol that combines the power of ML and DL to generate a set of simple, easily interpreted rules with high prediction power. A data set of 940 marketed drugs (315 CNS-active, 625 CNS-inactive) was used with support vector machine and graph convolutional network algorithms. Individual ML/DL models were also constructed for comparison. The performance of these models was evaluated using an additional external data set of 117 marketed drugs (42 CNS-active, 75 CNS-inactive). Fingerprint-split validation was adopted to ensure model stringency and generalizability. The resulting novel hybrid ensemble model outperformed its constituent traditional QSAR models, with an accuracy of 0.96 and an F1 score of 0.95. Using the interpretability provided by this protocol, our model yielded a set of simple physicochemical rules, based on six sub-structural features, for determining whether a compound can be a CNS drug. These rules displayed higher classification ability than classical guidelines, with higher specificity and mechanistic insight that extends beyond blood–brain barrier permeability. This hybrid protocol can potentially be used for other drug property predictions.
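As a reading aid for the accuracy and F1 figures quoted in the abstract, the following is a minimal sketch of how two classifiers' outputs might be combined and scored. It assumes simple soft voting (averaging predicted probabilities), which may differ from the authors' hybrid protocol; the function names `soft_vote` and `accuracy_and_f1` and all numeric values are hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from typing import Sequence


def soft_vote(p_a: Sequence[float], p_b: Sequence[float],
              threshold: float = 0.5) -> list[int]:
    """Average two models' CNS-active probabilities and threshold the mean.

    Assumption: plain soft voting, used here only for illustration.
    """
    if len(p_a) != len(p_b):
        raise ValueError("probability lists must have the same length")
    return [1 if (a + b) / 2 >= threshold else 0 for a, b in zip(p_a, p_b)]


def accuracy_and_f1(y_true: Sequence[int],
                    y_pred: Sequence[int]) -> tuple[float, float]:
    """Return (accuracy, F1), treating label 1 (CNS-active) as positive."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical toy values, not data from the paper.
y_true = [1, 1, 0, 0, 1, 0]               # 1 = CNS-active, 0 = CNS-inactive
p_svm = [0.9, 0.4, 0.2, 0.6, 0.7, 0.1]    # stand-in SVM probabilities
p_gcn = [0.8, 0.7, 0.3, 0.3, 0.2, 0.2]    # stand-in GCN probabilities

y_pred = soft_vote(p_svm, p_gcn)
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

Because the data set is imbalanced (315 active versus 625 inactive compounds), F1 on the active class is a useful complement to accuracy, which is one plausible reason the abstract reports both.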

Funder

Ministry of Science and Technology

National Taiwan University

Ministry of Health and Welfare

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology,Information Systems

Link

https://academic.oup.com/bib/article-pdf/23/1/bbab377/42229852/bbab377.pdf
