Affiliation:
1. Jiangsu Province Hi‐Tech Key Laboratory for Biomedical Research, School of Chemistry and Chemical Engineering, Southeast University, Nanjing, China
Abstract
With the large‐scale development of drugs, understanding drug phase behavior in complex systems is becoming increasingly important. In particular, the solubility of drugs in biorelevant media urgently needs to be understood. To address this challenge, a new strategy based on machine learning models is proposed. First, five machine learning models (extra trees [ET], gradient boosting [GB], k‐nearest neighbors [k‐NN], random forest [RF], and extreme gradient boosting [XGBoost]) are trained on 15 molecular descriptors of the drug molecules. Of the five, the XGBoost model was identified as the best model for predicting drug solubility in various solvents. Next, the input feature vectors were expanded with the MACCS chemical fingerprint and used with the XGBoost model. Coupling the MACCS fingerprint with XGBoost significantly improved the accuracy of the drug solubility predictions. This finding demonstrates that the proposed strategy can predict solubility and is expected to provide useful information for drug development and drug solvent screening.
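The feature expansion step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "expanding" means concatenating the 15 descriptor values with a 166‐key MACCS bit vector; the abstract does not state how the two were combined, so this is an assumption. The function names (`maccs_bits_to_vector`, `expand_features`) are hypothetical, and the numbers are placeholders rather than data from the study. In practice the set keys would come from a cheminformatics toolkit; here they are passed in directly as 1‐based key indices.

```python
from typing import Iterable, List

# The public MACCS key set defines 166 structural keys.
N_MACCS_KEYS = 166


def maccs_bits_to_vector(on_keys: Iterable[int],
                         n_keys: int = N_MACCS_KEYS) -> List[float]:
    """Turn 1-based indices of the set keys into a fixed-length 0/1 vector."""
    vector = [0.0] * n_keys
    for key in on_keys:
        if not 1 <= key <= n_keys:
            raise ValueError(f"key {key} is outside 1..{n_keys}")
        vector[key - 1] = 1.0
    return vector


def expand_features(descriptors: Iterable[float],
                    on_keys: Iterable[int]) -> List[float]:
    """Concatenate descriptor values with fingerprint bits (assumed scheme)."""
    return list(descriptors) + maccs_bits_to_vector(on_keys)


# Placeholder inputs only, not values from the study.
toy_descriptors = [0.0] * 15      # stands in for the 15 molecular descriptors
toy_on_keys = [42, 160, 165]      # hypothetical set of MACCS keys that are "on"

features = expand_features(toy_descriptors, toy_on_keys)
print(len(features))  # 15 descriptors + 166 fingerprint bits
```

Each row built this way could then be passed to a regressor such as XGBoost in place of the descriptor‐only row, which is what allows the two feature sets to be compared on equal terms.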
Funder
National Natural Science Foundation of China