ColabFit exchange: Open-access datasets for data-driven interatomic potentials

Author:

Vita Joshua A. (1), Fuemmeler Eric G. (2), Gupta Amit (2), Wolfe Gregory P. (3), Tao Alexander Quanming (2), Elliott Ryan S. (2), Martiniani Stefano (3, 4, 5), Tadmor Ellad B. (2)

Affiliation:

1. Department of Materials Science and Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA

2. Department of Aerospace Engineering and Mechanics, University of Minnesota, Minneapolis, Minnesota 55455, USA

3. Center for Soft Matter Research, Department of Physics, New York University, New York, New York 10012, USA

4. Simons Center for Computational Physical Chemistry, Department of Chemistry, New York University, New York, New York 10012, USA

5. Courant Institute of Mathematical Sciences, New York University, New York, New York 10112, USA

Abstract

Data-driven interatomic potentials (IPs) trained on large collections of first principles calculations are rapidly becoming essential tools in the fields of computational materials science and chemistry for performing atomic-scale simulations. Despite this, apart from a few notable exceptions, there is a distinct lack of well-organized, public datasets in common formats available for use with IP development. This deficiency precludes the research community from implementing widespread benchmarking, which is essential for gaining insight into model performance and transferability, and also limits the development of more general, or even universal, IPs. To address this issue, we introduce the ColabFit Exchange, the first database providing open access to a large collection of systematically organized datasets from multiple domains that is especially designed for IP development. The ColabFit Exchange is publicly available at https://colabfit.org, providing a web-based interface for exploring, downloading, and contributing datasets. Composed of data collected from the literature or provided by community researchers, the ColabFit Exchange currently (September 2023) consists of 139 datasets spanning nearly 70 000 unique chemistries, and is intended to continuously grow. In addition to outlining the software framework used for constructing and accessing the ColabFit Exchange, we also provide analyses of the data, quantifying the diversity of the database and proposing metrics for assessing the relative diversity of multiple datasets. Finally, we demonstrate an end-to-end IP development pipeline, utilizing datasets from the ColabFit Exchange, fitting tools from the KLIFF software package, and validation tests provided by the OpenKIM framework.
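As a rough illustration of how a figure such as "unique chemistries" could be tallied, the sketch below counts distinct element combinations across a list of chemical formulas. It assumes that a "chemistry" means the set of elements present in a configuration, which is only one possible reading and is not taken from the ColabFit Exchange itself. The function names `chemical_system` and `count_chemistries` and the sample formulas are hypothetical, and the parser handles only simple formulas without parentheses or charges.

```python
import re
from collections import Counter


def chemical_system(formula: str) -> str:
    """Return the sorted, dash-joined element symbols found in a simple formula.

    Assumption: a "chemistry" is identified by which elements are present,
    regardless of stoichiometry or the order the elements are written in.
    """
    # An element symbol is one uppercase letter optionally followed by a lowercase one.
    symbols = re.findall(r"[A-Z][a-z]?", formula)
    return "-".join(sorted(set(symbols)))


def count_chemistries(formulas):
    """Count how many formulas fall into each chemical system."""
    return Counter(chemical_system(f) for f in formulas)


# Hypothetical sample data, not drawn from any real dataset.
formulas = ["SiO2", "Si2O4", "H2O", "Si", "OH2"]
counts = count_chemistries(formulas)
n_unique = len(counts)

for system, n in sorted(counts.items()):
    print(f"{system}: {n} configuration(s)")
print(f"unique chemistries: {n_unique}")
```

Here `SiO2` and `Si2O4` collapse into the same system because only the element set is compared. A definition based on reduced stoichiometry would also merge them but would separate, for example, `SiO` from `SiO2`.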

Funder

National Science Foundation

Simons Center for Computational Physical Chemistry

Minnesota Supercomputing Institute, University of Minnesota

NYU IT High Performance Computing

Publisher

AIP Publishing

Subject

Physical and Theoretical Chemistry, General Physics and Astronomy

Cited by 3 articles.
