Affiliation:
1. College of Pharmaceutical Sciences, Soochow University, Suzhou, China
2. College of Chemistry and Life Science, Beijing University of Technology, Beijing, China
3. Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Soochow University, Suzhou, China
Abstract
Developing new drugs is widely acknowledged to be time‐intensive and to require substantial financial investment. Despite ongoing efforts to reduce the time and expense of drug development, ensuring medication safety remains an urgent problem. One of the major safety problems in drug development is hepatotoxicity, specifically drug‐induced liver injury (DILI). The hepatotoxicity of new drugs often poses a significant barrier during development and frequently leads to their recall after launch. In silico methods have many advantages over traditional in vivo and in vitro assays. Establishing a more precise and reliable prediction model requires an extensive, high‐quality database of drug molecule properties and structural patterns, as well as carefully selected molecular descriptors that accurately depict compound characteristics. The aim of this study was to conduct a comprehensive investigation into the prediction of DILI. First, we compared the physicochemical properties of extensively curated DILI‐positive and DILI‐negative compounds. Then, we used classic substructure dissection methods to identify differences in structural patterns between these two classes of molecules. The results of both analyses indicate that it is not feasible to establish property‐based or substructure‐based rules for distinguishing DILI‐positive from DILI‐negative compounds. Finally, we developed quantitative classification models for predicting DILI using two machine learning techniques, the naïve Bayes classifier (NBC) and recursive partitioning (RP). The optimal DILI prediction model was obtained with NBC by combining 21 physicochemical properties, the VolSurf descriptors, and the LCFP_10 fingerprint set. This model achieved a global accuracy (GA) of 0.855 and an area under the curve (AUC) of 0.704 for the training set; the corresponding values for the test set were 0.619 and 0.674, respectively. Moreover, indicative substructural fragments favorable or unfavorable for DILI were identified from the best naïve Bayesian classification model. These findings may help prioritize lead compounds in the early stages of drug development pipelines.
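The two metrics reported above, GA and AUC, can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration. GA is taken here as the fraction of correctly classified compounds, and AUC as the probability that a randomly chosen DILI‐positive compound receives a higher score than a randomly chosen DILI‐negative one.

```python
def global_accuracy(y_true, y_pred):
    """Fraction of compounds whose predicted class matches the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores.

    Counts the share of (positive, negative) pairs in which the positive
    compound has the higher score; tied pairs count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = DILI-positive, 0 = DILI-negative.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]                    # assumed classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

ga = global_accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(ga, auc)
```

Because GA depends on the chosen threshold while AUC depends only on the ranking of the scores, the two values can differ noticeably for the same model.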
Funder
Priority Academic Program Development of Jiangsu Higher Education Institutions