Abstract
AbstractData science and machine learning in materials science require large datasets of technologically relevant molecules or materials. Currently, publicly available molecular datasets with realistic molecular geometries and spectral properties are rare. We here supply a diverse benchmark spectroscopy dataset of 61,489 molecules extracted from organic crystals in the Cambridge Structural Database (CSD), denoted OE62. Molecular equilibrium geometries are reported at the Perdew-Burke-Ernzerhof (PBE) level of density functional theory (DFT) including van der Waals corrections for all 62 k molecules. For these geometries, OE62 supplies total energies and orbital eigenvalues at the PBE and the PBE hybrid (PBE0) functional level of DFT for all 62 k molecules in vacuum as well as at the PBE0 level for a subset of 30,876 molecules in (implicit) water. For 5,239 molecules in vacuum, the dataset provides quasiparticle energies computed with many-body perturbation theory in the G0W0 approximation with a PBE0 starting point (denoted GW5000 in analogy to the GW100 benchmark set (M. van Setten et al. J. Chem. Theory Comput. 12, 5076 (2016))).
Funder
Magnus Ehrnroothin Säätiö
Suomen Kulttuurirahasto
Deutsche Forschungsgemeinschaft
Academy of Finland
EC | Horizon 2020 Framework Programme
Aalto Science-IT project CSC Grand Challenge project Artificial Intelligence in Physical Sciences and Engineering scheme
Solar Technologies go Hybrid
Leibniz Supercomputer Centre
Publisher
Springer Science and Business Media LLC
Subject
Library and Information Sciences,Statistics, Probability and Uncertainty,Computer Science Applications,Education,Information Systems,Statistics and Probability
Cited by
49 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献