Author:
Sharma Bhanushee,Chenthamarakshan Vijil,Dhurandhar Amit,Pereira Shiranee,Hendler James A.,Dordick Jonathan S.,Das Payel
Abstract
Explainable machine learning for molecular toxicity prediction is a promising approach for efficient drug development and chemical safety. A predictive ML model of toxicity can reduce experimental cost and time while mitigating ethical concerns by significantly reducing animal and clinical testing. Herein, we use a deep learning framework for simultaneously modeling in vitro, in vivo, and clinical toxicity data. Two different molecular input representations are used: Morgan fingerprints and pre-trained SMILES embeddings. A multi-task deep learning model accurately predicts toxicity for all endpoints, including clinical, as indicated by the area under the Receiver Operating Characteristic curve and balanced accuracy. In particular, pre-trained molecular SMILES embeddings as input to the multi-task model improved clinical toxicity predictions compared to existing models in the MoleculeNet benchmark. Additionally, our multi-task approach is comprehensive in the sense that it is comparable to state-of-the-art approaches for specific endpoints in in vitro, in vivo, and clinical platforms. Using both the multi-task model and transfer learning, we show that in vivo data are minimally needed for clinical toxicity predictions. To provide confidence and explain the model’s predictions, we adapt a post-hoc contrastive explanation method that returns pertinent positive and negative features, which correspond well to known mutagenic and reactive toxicophores, such as unsubstituted bonded heteroatoms, aromatic amines, and Michael acceptors. Furthermore, toxicophore recovery by pertinent feature analysis captures more of the in vitro (53%) and in vivo (56%) endpoints than of the clinical (8%) endpoints, and indeed uncovers a bias in known toxicophore data towards in vitro and in vivo experimental data.
To our knowledge, this is the first contrastive explanation, using both present and absent substructures, for predictions of clinical and in vivo molecular toxicity.
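The abstract reports performance as area under the ROC curve and balanced accuracy. As a minimal sketch of what those two metrics measure (not the authors' evaluation code), the pure-Python functions below compute both for a single binary endpoint. The labels, scores, and the 0.5 decision threshold are hypothetical toy values chosen only for illustration.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on toxic) and specificity (recall on non-toxic)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def roc_auc(y_true, scores):
    """AUC-ROC as the fraction of (toxic, non-toxic) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy endpoint: 2 toxic (1) and 6 non-toxic (0) molecules.
y_true = [1, 0, 0, 0, 0, 0, 0, 1]
scores = [0.9, 0.1, 0.2, 0.3, 0.4, 0.6, 0.05, 0.35]
# Assumed decision threshold of 0.5 for turning scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 6 of 8 correct
print(balanced_accuracy(y_true, y_pred))  # averages per-class recall
print(roc_auc(y_true, scores))            # threshold-free ranking quality
```

On this toy data plain accuracy is 0.75, while balanced accuracy is lower (about 0.67) because one of the two toxic molecules is missed. That gap is why balanced accuracy is a common choice when toxic compounds are the minority class.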
Publisher
Springer Science and Business Media LLC
Cited by
21 articles.