Author:
Xue Nian,Zhang Yuzhu,Liu Sensen
Abstract
AbstractDetermining the aqueous solubility of the chemical compound is of great importancein-silicodrug discovery. However, correctly and rapidly predicting the aqueous solubility remains a challenging task. This paper explores and evaluates the predictability of multiple machine learning models in the aqueous solubility of compounds. Specifically, we apply a series of machine learning algorithms, including Random Forest, XG-Boost, LightGBM, and CatBoost, on a well-established aqueous solubility dataset (i. e., the Huuskonen dataset) of over 1200 compounds. Experimental results show that even traditional machine learning algorithms can achieve satisfactory performance with high accuracy. In addition, our investigation goes beyond mere prediction accuracy, delving into the interpretability of models to identify key features and understand the molecular properties that influence the predicted outcomes. This study sheds light on the ability to use machine learning approaches to predict compound solubility, significantly shortening the time that researchers spend on new drug discovery.
Publisher
Cold Spring Harbor Laboratory