Semi-Supervised Learning to Boost Cardiotoxicity Prediction by Mining a Large Unlabeled Small Molecule Dataset

Authors:

Issar Arab, Kris Laukens, Wout Bittremieux

Abstract

Predicting drug toxicity is a critical aspect of ensuring patient safety during the drug design process. Although conventional machine learning techniques have shown some success in this field, the scarcity of annotated toxicity data poses a significant challenge to improving model performance. In this study, we explore the potential of leveraging large unlabeled datasets using semi-supervised learning to improve predictive performance for cardiotoxicity across three targets: the voltage-gated potassium channel (hERG), the voltage-gated calcium channel (Cav1.2), and the voltage-gated sodium channel (Nav1.5). We extensively mined the ChEMBL database, which comprises approximately 2 million small molecules, and then employed semi-supervised learning to construct robust classification models for this purpose. We achieved a performance boost on highly diverse (i.e., structurally dissimilar) test datasets across all three targets. Using our models, we screened the whole ChEMBL database and a large set of FDA-approved drugs, identifying several compounds with potential cardiac channel activity. To ensure broad accessibility and usability for both technical and non-technical users, we developed a cross-platform graphical user interface that allows users to make predictions and gain insights into the cardiotoxicity of drugs and other small molecules. The software is available as open source under the permissive MIT license at https://github.com/issararab/CToxPred2.
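The abstract does not specify which semi-supervised technique the models use, so the following is only a generic illustration of one common family, self-training (pseudo-labeling): a classifier trained on the scarce labeled data labels the unlabeled pool, and only confident predictions are added back as training data. It assumes a toy nearest-centroid classifier on numeric descriptor vectors; the function names (`fit_centroids`, `predict_with_margin`, `self_train`) and the `threshold` margin are hypothetical and are not taken from CToxPred2.

```python
from math import dist


def fit_centroids(X, y):
    """Hypothetical toy classifier: one mean feature vector per class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_with_margin(centroids, x):
    """Return (label, margin), where margin is the gap between the two
    closest centroid distances; a larger margin means a more confident call."""
    ranked = sorted((dist(x, c), label) for label, c in centroids.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin


def self_train(X_lab, y_lab, X_unlab, threshold=1.0, max_rounds=10):
    """Generic self-training loop (assumed, not the paper's exact method):
    repeatedly pseudo-label confident unlabeled points and refit."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        centroids = fit_centroids(X_lab, y_lab)
        confident, remaining = [], []
        for x in pool:
            label, margin = predict_with_margin(centroids, x)
            (confident if margin >= threshold else remaining).append((x, label))
        if not confident:
            break  # nothing new passes the threshold; stop early
        for x, label in confident:
            X_lab.append(x)
            y_lab.append(label)
        pool = [x for x, _ in remaining]
    # Final model is refit on labeled + pseudo-labeled data; also return
    # the points that were never confidently labeled.
    return fit_centroids(X_lab, y_lab), pool


# Toy usage: 0 = inactive, 1 = active (illustrative labels only).
X_lab = [[0, 0], [0, 1], [5, 5], [5, 6]]
y_lab = [0, 0, 1, 1]
X_unlab = [[0.5, 0.5], [5.5, 5.5], [2.5, 2.9]]
model, leftover = self_train(X_lab, y_lab, X_unlab)
```

In this toy run the two unlabeled points near a class centroid are absorbed as pseudo-labels, while the ambiguous point midway between the classes stays unlabeled. The confidence threshold is the key design choice: setting it too low lets mislabeled compounds reinforce the classifier's own errors.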

Publisher

Cold Spring Harbor Laboratory

