Abstract
Abstract
Background
The logarithmic acid dissociation constant pKa reflects the ionization of a chemical, which affects lipophilicity, solubility, protein binding, and ability to pass through the plasma membrane. Thus, pKa affects chemical absorption, distribution, metabolism, excretion, and toxicity properties. Multiple proprietary software packages exist for the prediction of pKa, but to the best of our knowledge no free and open-source programs exist for this purpose. Using a freely available data set and three machine learning approaches, we developed open-source models for pKa prediction.
Methods
The experimental strongest acidic and strongest basic pKa values in water for 7912 chemicals were obtained from DataWarrior, a freely available software package. Chemical structures were curated and standardized for quantitative structure–activity relationship (QSAR) modeling using KNIME, and a subset comprising 79% of the initial set was used for modeling. To evaluate different approaches to modeling, several datasets were constructed based on different processing of chemical structures with acidic and/or basic pKas. Continuous molecular descriptors, binary fingerprints, and fragment counts were generated using PaDEL, and pKa prediction models were created using three machine learning methods, (1) support vector machines (SVM) combined with k-nearest neighbors (kNN), (2) extreme gradient boosting (XGB) and (3) deep neural networks (DNN).
Results
The three methods delivered comparable performances on the training and test sets with a root-mean-squared error (RMSE) around 1.5 and a coefficient of determination (R2) around 0.80. Two commercial pKa predictors from ACD/Labs and ChemAxon were used to benchmark the three best models developed in this work, and performance of our models compared favorably to the commercial products.
Conclusions
This work provides multiple QSAR models to predict the strongest acidic and strongest basic pKas of chemicals, built using publicly available data, and provided as free and open-source software on GitHub.
Publisher
Springer Science and Business Media LLC
Subject
Library and Information Sciences,Computer Graphics and Computer-Aided Design,Physical and Theoretical Chemistry,Computer Science Applications
Reference70 articles.
1. Wikipedia (2019) Acid dissociation constant.
https://en.wikipedia.org/w/index.php?title=Acid_dissociation_constant&oldid=897688731
. Accessed 21 May 2019
2. US EPA-OCSPP (2015) Guidance for reporting on the environmental fate and transport of the stressors of concern in problem formulations. In: US EPA.
https://www.epa.gov/pesticide-science-and-assessing-pesticide-risks/guidance-reporting-environmental-fate-and-transport
. Accessed 21 May 2019
3. Klöpffer W, Rippen G, Frische R (1982) Physicochemical properties as useful tools for predicting the environmental fate of organic chemicals. Ecotoxicol Environ Saf 6:294–301.
https://doi.org/10.1016/0147-6513(82)90019-7
4. Linde CD (1994) Physico-chemical properties and environmental fate of pesticides. In: Environmental hazards assessment program, state of California EPA.
http://agris.fao.org/agris-search/search.do?recordID=US201300074742
. Accessed 21 May 2019
5. National Research Council (2014) A framework to guide selection of chemical alternatives. The National Academies Press, Washington, D.C.
https://doi.org/10.17226/18872
Cited by
90 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献