Affiliation:
1. AICES, RWTH Aachen, Aachen, Germany
2. ICES, UT Austin, Austin, TX
Abstract
In addition to tensor contractions, one of the most pronounced computational bottlenecks in the nonorthogonally spin-adapted forms of the quantum chemistry methods CCSDT and CCSDTQ, and their approximate forms—including CCSD(T) and CCSDT(Q)—are spin summations. At a first sight, spin summations are operations similar to tensor transpositions, but a closer look reveals additional challenges to high-performance calculations, including temporal locality and scattered memory accesses. This article explores a sequence of algorithmic solutions for spin summations, each exploiting individual properties of either the underlying hardware (e.g., caches, vectorization) or the problem itself (e.g., factorizability). The final algorithm combines the advantages of all the solutions while avoiding their drawbacks; this algorithm achieves high performance through parallelization and vectorization, and by exploiting the temporal locality inherent to spin summations. Combined, these optimizations result in speedups between 2.4× and 5.5× over the NCC quantum chemistry software package. In addition to such a performance boost, our algorithm can perform the spin summations
in-place
, thus reducing the memory footprint by 2× over an
out-of-place
variant.
Funder
Intel Corporation through Parallel Computing Center grants to RWTH Aachen and UT Austin
Arnold and Mabel Beckman Foundation
Deutsche Forschungsgemeinschaft
Publisher
Association for Computing Machinery (ACM)
Subject
Applied Mathematics,Software
Cited by
3 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献