Abstract
Predicting drug toxicity is a critical aspect of ensuring patient safety during the drug design process. Although conventional machine learning techniques have shown some success in this field, the scarcity of annotated toxicity data remains a significant obstacle to improving model performance. In this study, we explore the potential of leveraging large unlabeled datasets through semi-supervised learning to improve predictive performance for cardiotoxicity across three targets: the voltage-gated potassium channel (hERG), the voltage-gated calcium channel (Cav1.2), and the voltage-gated sodium channel (Nav1.5). We extensively mined the ChEMBL database, which comprises approximately 2 million small molecules, and then employed semi-supervised learning to construct robust classification models for these targets. We achieved a performance boost on highly diverse (i.e., structurally dissimilar) test datasets across all three targets. Using the resulting models, we screened the whole ChEMBL database and a large set of FDA-approved drugs, identifying several compounds with potential cardiac channel activity. To ensure broad accessibility and usability for both technical and non-technical users, we developed a cross-platform graphical user interface that allows users to make predictions and gain insights into the cardiotoxicity of drugs and other small molecules. The software is available as open source under the permissive MIT license at https://github.com/issararab/CToxPred2.
Publisher
Cold Spring Harbor Laboratory