Abstract
Aqueous solubility is an important property of a compound when conducting chemical reactions. In this research, we develop several machine learning models for predicting the aqueous solubility of molecules. The open public dataset AqSolDB, which contains 9,982 solubility data points, was used for model development. Several tree-based machine learning regression models were trained on the dataset, and their performance was evaluated using mean absolute error. The results showed that the best model for solubility prediction was the Categorical Boosting (CatBoost) regressor, which achieved a mean absolute error of 0.854. The importance of each feature affecting solubility can also be obtained from the trained models; the variable MolLogP was found to be strongly correlated with solubility. To further improve the model, we selected a subset of features using a genetic algorithm and retrained several tree-based machine learning models on the selected features. The lowest mean absolute error, 0.771, was again obtained with the Categorical Boosting regressor, an improvement over the previous result without feature selection.
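As a reading aid for the metric used above, the following is a minimal sketch of how mean absolute error is computed and how the two reported scores compare. The helper names `mean_absolute_error` and `relative_improvement` and the toy solubility values are assumptions made for illustration; they are not taken from the paper's code or data. Only the scores 0.854 and 0.771 come from the abstract.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(baseline, improved):
    """Fractional reduction in error relative to the baseline score."""
    return (baseline - improved) / baseline


# Toy values (hypothetical, not from AqSolDB) to show the calculation.
y_true = [-2.0, -3.5, -1.0, -4.0]
y_pred = [-2.5, -3.0, -1.0, -5.0]
toy_mae = mean_absolute_error(y_true, y_pred)  # (0.5 + 0.5 + 0.0 + 1.0) / 4

# Scores reported in the abstract: without and with feature selection.
reported_gain = relative_improvement(0.854, 0.771)

print(f"toy MAE: {toy_mae:.3f}")
print(f"relative MAE reduction from feature selection: {reported_gain:.1%}")
```

Under these definitions, the drop from 0.854 to 0.771 corresponds to a reduction in mean absolute error of roughly 10 percent.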
Subject
General Physics and Astronomy
Cited by
1 article.