Abstract
An important consideration when developing a deep neural network (DNN) to predict molecular properties is how chemical space is represented. Herein we explore how the choice of representation affects the performance of our DNN, which is engineered to predict Fe K-edge X-ray absorption near-edge structure (XANES) spectra, and address the question: how important is the choice of representation for the local environment around an arbitrary Fe absorption site? We compare two popular representations of chemical space, the Coulomb matrix (CM) and the pair-distribution/radial distribution curve (RDC), and investigate how each affects the performance of our DNN. While both CM and RDC featurisation are demonstrably robust descriptors, RDC featurisation makes it possible to obtain a smaller mean squared error (MSE) between the target and estimated XANES spectra, and to converge to this state a) faster and b) using fewer data samples. This is advantageous for future extension of our DNN to other X-ray absorption edges, and for reoptimisation of our DNN to reproduce results from higher levels of theory; in the latter case, dataset sizes will be limited more strongly by the resource-intensive nature of the underlying theoretical calculations.
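To make the two representations concrete, the sketch below computes a Coulomb matrix and an RDC for a toy local Fe environment. It assumes the commonly used forms: CM entries of 0.5 Z_i^2.4 on the diagonal and Z_i Z_j / r_ij off it, and an RDC of the form sum over unique pairs of Z_i Z_j exp(-alpha (r - r_ij)^2) sampled on a fixed radial grid. The function names, the 6.0 Å cutoff, the Gaussian width `alpha`, the radial grid and the toy FeO6 geometry are all hypothetical choices for illustration, not the settings used in this work.

```python
import numpy as np


def local_environment(Z, R, absorber=0, r_cut=6.0):
    """Return the absorber plus all atoms within r_cut (Å) of it, nearest first.

    Hypothetical helper; r_cut = 6.0 Å is an illustrative choice only.
    """
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    d = np.linalg.norm(R - R[absorber], axis=1)
    keep = np.flatnonzero(d <= r_cut)
    # Sort by distance from the absorber so it always sits in row 0.
    keep = keep[np.argsort(d[keep], kind="stable")]
    return Z[keep], R[keep]


def coulomb_matrix(Z, R):
    """Coulomb matrix: 0.5 * Z_i**2.4 on the diagonal, Z_i Z_j / r_ij off it."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    r = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):  # r_ii = 0 on the diagonal
        M = np.outer(Z, Z) / r
    np.fill_diagonal(M, 0.5 * Z ** 2.4)
    return M


def radial_distribution_curve(Z, R, r_grid, alpha=10.0):
    """RDC(r) = sum_{i<j} Z_i Z_j exp(-alpha (r - r_ij)**2) on a fixed r grid.

    alpha (Gaussian sharpness, 1/Å^2) is a hypothetical default.
    """
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    i, j = np.triu_indices(len(Z), k=1)  # each unique atom pair once
    r_ij = np.linalg.norm(R[i] - R[j], axis=1)
    w = Z[i] * Z[j]
    r_grid = np.asarray(r_grid, dtype=float)
    gauss = np.exp(-alpha * (r_grid[:, None] - r_ij[None, :]) ** 2)
    return gauss @ w  # one value per grid point


# Toy FeO6 octahedron (Fe-O = 2.0 Å) plus one distant O that the cutoff drops.
Z_all = [26, 8, 8, 8, 8, 8, 8, 8]
R_all = [[0, 0, 0],
         [2, 0, 0], [-2, 0, 0], [0, 2, 0], [0, -2, 0], [0, 0, 2], [0, 0, -2],
         [10, 0, 0]]
Z, R = local_environment(Z_all, R_all, absorber=0, r_cut=6.0)

M = coulomb_matrix(Z, R)
r_grid = np.linspace(0.0, 6.0, 601)
rdc = radial_distribution_curve(Z, R, r_grid)

print(M.shape, rdc.shape)      # (7, 7) (601,)
print(M[0, 1])                 # 104.0 = 26 * 8 / 2.0
print(r_grid[np.argmax(rdc)])  # first-shell Fe-O peak near 2.0 Å
```

One practical difference shows up even in this toy case: the RDC always has one entry per grid point and does not depend on the order in which atoms are listed, whereas the CM grows with the number of atoms and changes when atoms are reordered, so it has to be sorted and padded to a fixed size before it can serve as DNN input.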
Funder
Engineering and Physical Sciences Research Council
Subject
Chemistry (miscellaneous), Analytical Chemistry, Organic Chemistry, Physical and Theoretical Chemistry, Molecular Medicine, Drug Discovery, Pharmaceutical Science
Cited by
17 articles.