Determining the best set of molecular descriptors for a Toxicity classification problem

Published:2021-09 Issue:5 Volume:55 Page:2769-2783
ISSN:0399-0559
Container-title:RAIRO - Operations Research
Short-container-title:RAIRO-Oper. Res.

Author:

Toppur Badri, Jaims K.J.

Abstract

Safety norms for drug design are very strict, with at least three stages of trials. One test, early in the trials, concerns the cardiotoxicity of the molecules, that is, whether the compound blocks any heart channel. Chemical libraries contain millions of compounds. Accurate a priori, in silico classification of non-blocking molecules can reduce the screening for an effective drug by half. The compound has to be checked for other risk factors alongside its therapeutic effect; these tests can also be done on a computer. Actual screening in a research laboratory is very expensive and time-consuming. To enable computer modelling, the molecules are provided in Simplified Molecular Input Line Entry System (SMILES) format. In this study, they have been decoded using the Chemistry Development Kit (CDK), written in the Java language. The kit is accessed in the R statistical software environment through the rJava package, which is further wrapped in the rcdk package. The strings representing the molecular structure are parsed by the rcdk functions to provide structure-activity descriptors that are known to be good predictors of biological activity. These descriptors, along with the known blocking behaviour of the molecule, constitute the input to the Decision Tree, Random Forest, Gradient Boosting, Support Vector Machine, Logistic Regression, and Artificial Neural Network algorithms. This paper reports the results of a data analysis project with shareware tools to determine the best subset of molecular descriptors from the large set that is available.
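The idea of searching for a best descriptor subset can be illustrated with a minimal sketch. The abstract does not state which selection procedure the authors used, so everything here is an assumption: greedy forward selection, a nearest-centroid classifier standing in for the models listed above, leave-one-out accuracy as the score, Python rather than the R/rcdk toolchain of the study, and a tiny synthetic table with hypothetical descriptor names (`desc_a`, `desc_b`, `noise`). The values are made up for illustration and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def centroid(rows: Sequence[Sequence[float]]) -> List[float]:
    """Column-wise mean of a non-empty list of rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def predict(train_X, train_y, x) -> int:
    """Nearest-centroid classifier: return the label whose centroid is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        members = [r for r, lab in zip(train_X, train_y) if lab == label]
        c = centroid(members)
        dist = sum((a - b) ** 2 for a, b in zip(x, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, cols: List[int]) -> float:
    """Leave-one-out accuracy using only the descriptor columns in `cols`."""
    correct = 0
    for i in range(len(X)):
        train_X = [[row[c] for c in cols] for j, row in enumerate(X) if j != i]
        train_y = [lab for j, lab in enumerate(y) if j != i]
        x = [X[i][c] for c in cols]
        if predict(train_X, train_y, x) == y[i]:
            correct += 1
    return correct / len(X)


def forward_select(X, y, names: List[str]) -> Tuple[List[str], float]:
    """Greedily add the descriptor that most improves leave-one-out accuracy.

    Stops as soon as no remaining descriptor gives a strict improvement.
    """
    chosen: List[int] = []
    best = 0.0
    remaining = list(range(len(names)))
    while remaining:
        scored = [(loo_accuracy(X, y, chosen + [c]), c) for c in remaining]
        top_score = max(s for s, _ in scored)
        if top_score <= best:
            break
        # On ties, keep the first descriptor in the original column order.
        top_col = next(c for s, c in scored if s == top_score)
        chosen.append(top_col)
        remaining.remove(top_col)
        best = top_score
    return [names[c] for c in chosen], best


# Synthetic, illustrative data only (NOT from the paper).
# Label 1 = "blocker", label 0 = "non-blocker"; column names are hypothetical.
names = ["desc_a", "desc_b", "noise"]
X = [
    [1.0, 2.0, 5.0], [1.2, 4.0, 1.0], [0.9, 2.5, 5.0], [1.1, 3.5, 1.0],
    [3.0, 3.0, 5.0], [3.2, 4.5, 1.0], [2.9, 3.8, 5.0], [3.1, 5.0, 1.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected, accuracy = forward_select(X, y, names)
print(selected, accuracy)
```

Greedy search is used here only because it is short to write; an exhaustive or model-specific search (for example, importance scores from a tree ensemble) would be equally plausible readings of "best subset".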

Publisher

EDP Sciences

Subject

Management Science and Operations Research,Computer Science Applications,Theoretical Computer Science

Link

https://www.rairo-ro.org/10.1051/ro/2021134/pdf

Reference: 27 articles.

1. “NanoBRIDGES” software: Open access tools to perform QSAR and nano-QSAR modeling

2. Anderson E., Veith G.D. and Weininger D., SMILES: a line notation and computerized interpreter for chemical structures. Report No. EPA/600/M-87/021. U.S. Environmental Protection Agency, Environmental Research Laboratory-Duluth, Duluth, MN 55804 (1987).

3. Natural allosteric modulators and their biological targets: molecular signatures and mechanisms

4. Rcpi: R/Bioconductor package to generate various descriptors of proteins, compounds and their interactions

Cited by 7 articles.

1. A machine learning-based QSAR model reveals important molecular features for understanding the potential inhibition mechanism of ionic liquids to acetylcholinesterase;Science of The Total Environment;2024-03

2. Analysis of important features to identify potential compound as Antibiotic Growth Promoter (AGP) using C5.0;IOP Conference Series: Earth and Environmental Science;2023-12-01

3. Explainable AI in drug discovery: self-interpretable graph neural network for molecular property prediction using concept whitening;Machine Learning;2023-10-31

4. QSAR analysis for pyrimidine and pyridine derivatives as RIPK2 (receptor interacting protein kinase 2) inhibitors;Journal of the Indian Chemical Society;2023-08

5. 2D‐Quantitative structure–activity relationship modeling for risk assessment of pharmacotherapy applied during pregnancy;Journal of Applied Toxicology;2023-05-02