Affiliation:
1. College of Material and Chemical Engineering, Tongren University, Tongren 554300, PR China
2. School of Physics and Optoelectronic Engineering, Yangtze University, Jingzhou 434023, PR China
Abstract
Chemists have long sought general mathematical laws to explain and predict molecular properties. However, most traditional quantitative structure-activity relationship (QSAR) models have limited application domains; for example, they tend to generalize poorly to molecules whose parent structures differ from those of the training molecules. This paper attempts to develop a new QSAR method that could, in principle, predict various properties of molecules with diverse structures. The proposed deep electron cloud-activity relationships (DECAR) and deep field-activity relationships (DFAR) methods rest on three essentials: (1) a large number of molecular entities with activity data as training objects and responses; (2) three-dimensional electron cloud density (ECD) or related field data, computed by accurate density functional theory methods, as input descriptors; and (3) a deep learning model that is sufficiently flexible and powerful to learn from such large data. DECAR and DFAR are used to distinguish 977 sweet from 1965 non-sweet molecules (with 6-fold data augmentation), and their classification performance is shown to be significantly better than that of traditional least squares support vector machine (LS-SVM) models using traditional descriptors. DECAR and DFAR provide a feasible and promising way to establish a widely applicable, cumulative, and shareable artificial intelligence-driven QSAR system. They should promote the development of an interactive platform for collecting and sharing accurate ECD and field data of millions of molecules with annotated activities. With enough input data, we envision the emergence of hundreds of deep networks trained for various molecular activities. Ultimately, we anticipate that a single DECAR or DFAR network could learn and infer various properties of interest for chemical molecules, becoming an open and shared learning and inference tool for chemists.
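The abstract does not state how the 6-fold data augmentation is carried out. As an illustration only, the sketch below assumes that each molecule's ECD is sampled on a cubic voxel grid and that the six copies are the six proper rotations that turn one grid face toward each axis direction. The function name `six_orientations` and the choice of rotations are hypothetical, not the authors' procedure. Proper rotations are used rather than mirror flips so that molecular chirality is preserved.

```python
import numpy as np


def six_orientations(grid):
    """Return six rotated copies of a 3D voxel grid.

    Assumption (not from the paper): the 6-fold augmentation is modelled
    as six proper 90-degree rotations, one per face direction of the cube.
    """
    grid = np.asarray(grid)
    if grid.ndim != 3:
        raise ValueError("expected a 3D array of density values")
    return [
        grid,                              # original orientation
        np.rot90(grid, k=1, axes=(0, 1)),  # 90 degrees in the (0, 1) plane
        np.rot90(grid, k=2, axes=(0, 1)),  # 180 degrees in the (0, 1) plane
        np.rot90(grid, k=3, axes=(0, 1)),  # 270 degrees in the (0, 1) plane
        np.rot90(grid, k=1, axes=(0, 2)),  # 90 degrees in the (0, 2) plane
        np.rot90(grid, k=3, axes=(0, 2)),  # 270 degrees in the (0, 2) plane
    ]


# Toy stand-in for a DFT-computed density: random values on a 4x4x4 grid.
toy_grid = np.random.default_rng(0).random((4, 4, 4))
augmented = six_orientations(toy_grid)
```

Under this assumption, augmenting all 977 + 1965 = 2942 molecules would yield 17,652 input grids. Each rotated copy keeps the same voxel values, so the integrated density is unchanged; only the orientation presented to the network differs.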
Publisher
Research Square Platform LLC