Porting fragmentation methods to GPUs using an OpenMP API: Offloading the resolution-of-the-identity second-order Møller–Plesset perturbation method

Author:

Pham Buu Q.1ORCID,Carrington Laura2,Tiwari Ananta2,Leang Sarom S.2,Alkan Melisa1,Bertoni Colleen3ORCID,Datta Dipayan1ORCID,Sattasathuchana Tosaporn1ORCID,Xu Peng1ORCID,Gordon Mark S.1ORCID

Affiliation:

1. Department of Chemistry and Ames Laboratory, Iowa State University 1 , Ames, Iowa 50011, USA

2. EP Analytics 2 , San Diego, California 92131, USA

3. Leadership Computing Facility, Argonne National Laboratory 3 , Lemont, Illinois 60439, USA

Abstract

Using an OpenMP Application Programming Interface, the resolution-of-the-identity second-order Møller–Plesset perturbation (RI-MP2) method has been off-loaded onto graphical processing units (GPUs), both as a standalone method in the GAMESS electronic structure program and as an electron correlation energy component in the effective fragment molecular orbital (EFMO) framework. First, a new scheme has been proposed to maximize data digestion on GPUs that subsequently linearizes data transfer from central processing units (CPUs) to GPUs. Second, the GAMESS Fortran code has been interfaced with GPU numerical libraries (e.g., NVIDIA cuBLAS and cuSOLVER) for efficient matrix operations (e.g., matrix multiplication, matrix decomposition, and matrix inversion). The standalone GPU RI-MP2 code shows an increasing speedup of up to 7.5× using one NVIDIA V100 GPU with one IBM 42-core P9 CPU for calculations on fullerenes of increasing size from 40 to 260 carbon atoms using the 6-31G(d)/cc-pVDZ-RI basis sets. A single Summit node with six V100s can compute the RI-MP2 correlation energy of a cluster of 175 water molecules using the correlation consistent basis sets cc-pVDZ/cc-pVDZ-RI containing 4375 atomic orbitals and 14 700 auxiliary basis functions in ∼0.85 h. In the EFMO framework, the GPU RI-MP2 component shows near linear scaling for a large number of V100s when computing the energy of an 1800-atom mesoporous silica nanoparticle in a bath of 4000 water molecules. The parallel efficiencies of the GPU RI-MP2 component with 2304 and 4608 V100s are 98.0% and 96.1%, respectively.

Funder

Department of Energy

Publisher

AIP Publishing

Subject

Physical and Theoretical Chemistry,General Physics and Astronomy

Cited by 8 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3