Affiliation:
1. Argonne National Laboratory, Lemont, IL
2. Los Alamos National Laboratory, Los Alamos, NM
3. Lawrence Berkeley National Laboratory, Berkeley, CA
Abstract
Supercomputing is evolving toward hybrid and accelerator-based architectures with millions of cores. The Hardware/Hybrid Accelerated Cosmology Code (HACC) framework exploits this diverse landscape at the largest scales of problem size, obtaining high scalability and sustained performance. Developed to satisfy the science requirements of cosmological surveys, HACC melds particle and grid methods using a novel algorithmic structure that flexibly maps across architectures, including CPU/GPU, multi/many-core, and Blue Gene systems. In this Research Highlight, we demonstrate the success of HACC on two very different machines, the CPU/GPU system Titan and the BG/Q systems Sequoia and Mira, attaining very high levels of scalable performance. We demonstrate strong and weak scaling on Titan, obtaining up to 99.2% parallel efficiency, evolving 1.1 trillion particles. On Sequoia, we reach 13.94 PFlops (69.2% of peak) and 90% parallel efficiency on 1,572,864 cores, with 3.6 trillion particles, the largest cosmological benchmark yet performed. HACC design concepts are applicable to several other supercomputer applications.
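The headline figures above can be unpacked with a few lines of arithmetic. The sketch below is illustrative only: the abstract does not say how parallel efficiency was computed, so the two efficiency functions assume the standard textbook definitions, and all function names are hypothetical. Only the Sequoia numbers (13.94 PFlops, 69.2% of peak, 1,572,864 cores, 3.6 trillion particles) come from the abstract.

```python
# Illustrative arithmetic only; function names are hypothetical and the
# efficiency formulas are the standard definitions, assumed rather than
# taken from the paper.

def implied_peak(sustained, fraction_of_peak):
    """Peak rate implied by a sustained rate and its stated fraction of peak."""
    return sustained / fraction_of_peak

def weak_scaling_efficiency(t_base, t_scaled):
    """Weak scaling: work per core is fixed, so ideal runtime stays constant."""
    return t_base / t_scaled

def strong_scaling_efficiency(t_base, n_base, t_scaled, n_scaled):
    """Strong scaling: total work is fixed, so ideal runtime falls as 1/cores."""
    return (t_base * n_base) / (t_scaled * n_scaled)

# Figures quoted in the abstract for Sequoia.
sequoia_peak_pflops = implied_peak(13.94, 0.692)   # roughly 20.1 PFlops
particles_per_core = 3.6e12 / 1_572_864            # roughly 2.3 million

print(f"{sequoia_peak_pflops:.2f} PFlops implied peak")
print(f"{particles_per_core:,.0f} particles per core")
```

Under these assumed definitions, a weak-scaling run whose time per step rose from 100 s to 125 s after the core count and problem size grew together would have an efficiency of `weak_scaling_efficiency(100.0, 125.0)`, or 80%; those timings are made up for illustration.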
Publisher
Association for Computing Machinery (ACM)
Cited by
52 articles.
1. At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads;ACM Transactions on Architecture and Code Optimization;2023-12-14
2. A Performance-Portable SYCL Implementation of CRK-HACC for Exascale;Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis;2023-11-12
3. Frontier: Exploring Exascale;Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis;2023-11-11
4. Experiences readying applications for Exascale;Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis;2023-11-11
5. FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Computing Applications on GPUs;Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing;2023-08-07