Revving up 13C NMR shielding predictions across chemical space: benchmarks for atoms-in-molecules kernel machine learning with new data for 134 kilo molecules-Reference-Cited by-同舟云学术

Revving up 13C NMR shielding predictions across chemical space: benchmarks for atoms-in-molecules kernel machine learning with new data for 134 kilo molecules

Published:2021-05-12 Issue:3 Volume:2 Page:035010
ISSN:2632-2153
Container-title:Machine Learning: Science and Technology
language:
Short-container-title:Mach. Learn.: Sci. Technol.

Author:

Gupta Amit,Chakraborty Sabyasachi,Ramakrishnan Raghunathan^ORCID

Abstract

Abstract The requirement for accelerated and quantitatively accurate screening of nuclear magnetic resonance spectra across the small molecules chemical compound space is two-fold: (1) a robust ‘local’ machine learning (ML) strategy capturing the effect of the neighborhood on an atom’s ‘near-sighted’ property—chemical shielding; (2) an accurate reference dataset generated with a state-of-the-art first-principles method for training. Herein we report the QM9-NMR dataset comprising isotropic shielding of over 0.8 million C atoms in 134k molecules of the QM9 dataset in gas and five common solvent phases. Using these data for training, we present benchmark results for the prediction transferability of kernel-ridge regression models with popular local descriptors. Our best model, trained on 100k samples, accurately predicts isotropic shielding of 50k ‘hold-out’ atoms with a mean error of less than 1.9 ppm. For the rapid prediction of new query molecules, the models were trained on geometries from an inexpensive theory. Furthermore, by using a Δ-ML strategy, we quench the error below 1.4 ppm. Finally, we test the transferability on non-trivial benchmark sets that include benchmark molecules comprising 10–17 heavy atoms and drugs.

Funder

Tata Institute of Fundamental Research

Publisher

IOP Publishing

Subject

Artificial Intelligence,Human-Computer Interaction,Software

Link

https://iopscience.iop.org/article/10.1088/2632-2153/abe347/pdf

Reference99 articles.

1. Ab Initio Methods for the Calculation of NMR Shielding and Indirect Spin−Spin Coupling Constants

2. NMR chemical shift data and ab initio shielding calculations: emerging tools for protein structure determination

3. Predicting 13C NMR Spectra by DFT Calculations