Affiliations:
1. College of Material and Chemical Engineering, Tongren University, Tongren 554300, PR China
2. School of Physics and Optoelectronic Engineering, Yangtze University, Jingzhou 434023, PR China
Abstract
Chemists have long sought general mathematical laws to explain and predict
molecular properties. However, most traditional quantitative
structure-activity relationship (QSAR) models have limited application domains; for
example, they tend to generalize poorly to molecules whose parent structures
differ from those of the training molecules. This paper attempts to
develop a new QSAR method that could theoretically predict various properties of
molecules with diverse structures. The proposed deep electron cloud-activity
relationships (DECAR) and deep field-activity relationships (DFAR) methods consist
of three essentials: (1) a large number of molecular entities with activity data as
training objects and responses; (2) three-dimensional electron cloud density (ECD) or
related field data, computed by accurate density functional theory (DFT) methods, as
input descriptors; (3) a deep learning model that is sufficiently flexible and powerful
to learn from the large data set described above. DECAR and DFAR are used to distinguish 977
sweet and 1965 non-sweet molecules (with 6-fold data augmentation), and their
classification performance is shown to be significantly better than that of
least squares support vector machine (LS-SVM) models using traditional
descriptors. DECAR and DFAR provide a feasible and promising way to
establish a widely applicable, cumulative, and shareable artificial intelligence-driven
QSAR system. They will promote the development of an interactive platform to
collect and share accurate ECD and field data for millions of molecules with
annotated activities. With enough input data, we envision the emergence of hundreds
of deep networks trained for various molecular activities. Ultimately, we anticipate
a single DECAR or DFAR network that learns and infers various properties of interest for
chemical molecules, serving as an open and shared learning and inference
tool for chemists.
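
The abstract does not say how the 6-fold data augmentation is performed. As a purely illustrative sketch, the Python code below assumes one hypothetical scheme: treating each molecule's ECD as a voxel grid and generating the six permutations of its three axes. The function names (`permute_axes`, `augment_sixfold`), the nested-list grid representation, and the assumption that every molecule is augmented equally are ours, not the authors'.

```python
from itertools import permutations

# Molecule counts taken from the abstract.
N_SWEET = 977
N_NON_SWEET = 1965
AUGMENTATION_FACTOR = 6  # "6-fold data augmentation"


def permute_axes(grid, order):
    """Return a copy of a 3D nested-list grid with its axes reordered.

    New axis p takes the role of old axis order[p].
    """
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    new_dims = tuple(dims[a] for a in order)
    out = [[[0.0] * new_dims[2] for _ in range(new_dims[1])]
           for _ in range(new_dims[0])]
    for i in range(dims[0]):
        for j in range(dims[1]):
            for k in range(dims[2]):
                idx = (i, j, k)
                ni, nj, nk = (idx[a] for a in order)
                out[ni][nj][nk] = grid[i][j][k]
    return out


def augment_sixfold(grid):
    """ASSUMPTION: 6-fold augmentation as the 3! = 6 axis permutations.

    The first returned grid is the identity permutation (the original).
    """
    return [permute_axes(grid, order) for order in permutations(range(3))]


# Toy 2 x 3 x 4 "density" grid standing in for a real DFT-computed ECD.
toy_grid = [[[100 * i + 10 * j + k for k in range(4)]
             for j in range(3)]
            for i in range(2)]
augmented = augment_sixfold(toy_grid)

# Data set size, assuming every molecule is augmented equally.
n_molecules = N_SWEET + N_NON_SWEET                   # 2942 molecules
n_samples = n_molecules * AUGMENTATION_FACTOR         # 17652 samples
```

Under this assumption the 2942 molecules would yield 17652 training samples. Other schemes, such as rotations of the molecule before the density is evaluated, would give the same count but different grids.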
Publisher
Research Square Platform LLC