The problem of selecting relevant descriptors in predicting the toxicity of chemicals

Author:

Guseva Ekaterina A.¹

Affiliation:

1. Federal State Autonomous Educational Institution of Higher Education I.M. Sechenov First Moscow State Medical University of the Ministry of Health of the Russian Federation (Sechenov University)

Abstract

Introduction. Mathematical models are widely used in toxicological studies and can fill data gaps that arise in chemical safety assessment. Most attention goes to the algorithms used to build models rather than to approaches for choosing the most informative features. The purpose of this study is to highlight aspects of the problem of selecting useful variables in mathematical modeling. Material and methods. SMILES strings and molecular descriptors for organothiophosphates were generated in the interactive Google Colaboratory environment with program code using the RDKit and Mordred software. Features were selected by filtering and by recursive feature elimination using the tools of the scikit-learn library, version 1.2.2. Acute oral toxicity values were taken from official information sources on chemicals. The resulting models underwent internal validation to evaluate their performance. Results. Models built with recursive feature elimination performed better than models based on descriptors selected by the filtering method. In particular, the acute toxicity prediction model for organothiophosphates based on the decision tree method with recursive feature elimination has a high coefficient of determination (R² = 0.91713), a relatively small root-mean-square error (RMSE = 0.35099), and a high leave-one-out cross-validated coefficient of determination (Q²LOO = 0.79756). Limitations. The results can be used only to predict the toxicity of the specified group of chemicals, which share a similar mechanism of action. Conclusion. Mathematical modeling is a promising tool for assessing the toxicity of chemicals, and it has two sides: on the one hand, it is a quick and convenient resource for screening the toxicity of substances; on the other hand, the model must be trained on reliable research data, and the features that contribute significantly to the prognostic model must be selected carefully.
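The comparison described in the abstract, filter-based selection versus recursive feature exclusion (scikit-learn's `RFE`) feeding a decision tree, can be illustrated with a minimal sketch. It is not the author's code. The descriptor matrix is synthetic (`make_regression` stands in for RDKit/Mordred descriptors), and the choice of `f_regression` as the filter score, the number of retained features, and the tree depth are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE, SelectKBest, f_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for a descriptor table (rows: compounds, columns: descriptors).
# Real work would use descriptors computed from SMILES; these values are made up.
X, y = make_regression(n_samples=40, n_features=30, n_informative=5,
                       noise=0.5, random_state=0)


def evaluate(selector):
    """Return (R2, RMSE, Q2_LOO) for a decision tree fed by the given selector."""
    model = make_pipeline(selector,
                          DecisionTreeRegressor(max_depth=3, random_state=0))
    model.fit(X, y)
    fitted = model.predict(X)
    r2 = float(r2_score(y, fitted))
    rmse = float(np.sqrt(mean_squared_error(y, fitted)))
    # Leave-one-out predictions: the whole pipeline (selection + tree) is refit
    # in every fold, so the held-out compound never influences feature selection.
    loo_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    # Q2 computed against the overall mean of y (a common simplification).
    q2 = 1.0 - np.sum((y - loo_pred) ** 2) / np.sum((y - y.mean()) ** 2)
    return r2, rmse, float(q2)


# Filter method: rank descriptors by a univariate score, keep the top k (assumed k=5).
filter_selector = SelectKBest(score_func=f_regression, k=5)

# Recursive elimination: repeatedly drop the least important descriptor
# according to a decision tree's feature importances until 5 remain.
rfe_selector = RFE(DecisionTreeRegressor(max_depth=3, random_state=0),
                   n_features_to_select=5, step=1)

results = {"filter": evaluate(filter_selector), "rfe": evaluate(rfe_selector)}
for name, (r2, rmse, q2) in results.items():
    print(f"{name}: R2={r2:.3f} RMSE={rmse:.3f} Q2_LOO={q2:.3f}")
```

Placing the selector inside the pipeline is a deliberate choice in this sketch: selecting features once on the full data set and cross-validating only the tree would tend to inflate Q²LOO. Which of the two selectors scores better on synthetic data says nothing about the organothiophosphate results reported in the abstract.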

Publisher

Federal Scientific Center for Hygiene F.F.Erisman
