The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules

Published:2020-05-01 Issue:1 Volume:7 Page:
ISSN:2052-4463
Container-title:Scientific Data
language:en
Short-container-title:Sci Data

Author:

Smith Justin S., Zubatyuk Roman, Nebgen Benjamin, Lubbers Nicholas, Barros Kipton, Roitberg Adrian E., Isayev Olexandr, Tretiak Sergei

Abstract

Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example quantum mechanics (QM) calculated potential energy surfaces and atomic charge models. The ANI-1x and ANI-1ccx ML-based general-purpose potentials for organic molecules were developed through active learning, an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data diversity, we visualize it with a dimensionality reduction scheme, and contrast against existing data sets. The ANI-1x data set contains multiple QM properties from 5 M density functional theory calculations, while the ANI-1ccx data set contains 500 k data points obtained with an accurate CCSD(T)/CBS extrapolation. Approximately 14 million CPU core-hours were expended to generate this data. Multiple QM calculated properties for the chemical elements C, H, N, and O are provided: energies, atomic forces, multipole moments, atomic charges, etc. We provide this data to the community to aid research and development of ML models for chemistry.

Funder

DOE | LDRD | Los Alamos National Laboratory

United States Department of Defense | United States Navy | Office of Naval Research

National Science Foundation

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences; Statistics, Probability and Uncertainty; Computer Science Applications; Education; Information Systems; Statistics and Probability

Link

http://www.nature.com/articles/s41597-020-0473-z.pdf

Reference: 79 articles.

1. Gandhi, D., Pinto, L. & Gupta, A. Learning to fly by crashing. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 3948–3955 (IEEE, 2017).

2. Settles, B. Active learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 18, 1–111 (2012).

3. Reker, D. & Schneider, G. Active-learning strategies in computer-assisted drug discovery, vol. 20 (Elsevier Current Trends, 2015).

4. Podryabinkin, E. V. & Shapeev, A. V. Active learning of linearly parametrized interatomic potentials. Computational Materials Science 140, 171–180 (2017).

5. Smith, J. S., Nebgen, B., Lubbers, N., Isayev, O. & Roitberg, A. E. Less is more: sampling chemical space with active learning. The Journal of Chemical Physics 148, 241733 (2018).

Cited by 105 articles.

1. The emergence of machine learning force fields in drug design;Medicinal Research Reviews;2024-01-03

2. Towards physics-informed explainable machine learning and causal models for materials research;Computational Materials Science;2024-01

3. TrIP: Transformer Interatomic Potential Predicts Realistic Energy Surface Using Physical Bias;Journal of Chemical Theory and Computation;2023-12-27

4. Transferable Machine Learning Interatomic Potential for Bond Dissociation Energy Prediction of Drug-like Molecules;Journal of Chemical Theory and Computation;2023-12-18

5. Minimal Peptoid Dynamics Inform Self-Assembly Propensity;The Journal of Physical Chemistry B;2023-12-01