A performance model for the communication in fast multipole methods on high-performance computing platforms

Published: 2016-07-27 Issue: 4 Volume: 30 Page: 423-437
ISSN: 1094-3420
Container-title: The International Journal of High Performance Computing Applications
Language: en
Short-container-title: The International Journal of High Performance Computing Applications

Author:

Ibeid Huda¹, Yokota Rio¹, Keyes David¹

Affiliation:

1. Division of Computer, Electrical and Mathematical Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

Abstract

Exascale systems are predicted to have approximately 1 billion cores, assuming gigahertz cores. Limitations on affordable network topologies for distributed memory systems of such massive scale bring new challenges to the currently dominant parallel programming model. Currently, there are many efforts to evaluate the hardware and software bottlenecks of exascale designs. It is therefore of interest to model application performance and to understand what changes need to be made to ensure extrapolated scalability. The fast multipole method (FMM) was originally developed for accelerating N-body problems in astrophysics and molecular dynamics but has recently been extended to a wider range of problems. Its high arithmetic intensity, combined with its linear complexity and asynchronous communication patterns, makes it a promising algorithm for exascale systems. In this paper, we discuss the challenges for FMM on current parallel computers and future exascale architectures, with a focus on internode communication. We focus on the communication part only; the efficiency of the computational kernels is beyond the scope of the present study. We develop a performance model that considers the communication patterns of the FMM and observe a good match between our model and the actual communication time on four high-performance computing (HPC) systems, when latency, bandwidth, network topology, and multicore penalties are all taken into account. To our knowledge, this is the first formal characterization of internode communication in FMM that validates the model against actual measurements of communication time. The ultimate communication model is predictive in an absolute sense; however, on complex systems, this objective is often out of reach or of a difficulty out of proportion to its benefit when there exists a simpler model that is inexpensive and sufficient to guide coding decisions leading to improved scaling. The current model provides such guidance.

Publisher

SAGE Publications

Subject

Hardware and Architecture, Theoretical Computer Science, Software

Link

http://journals.sagepub.com/doi/pdf/10.1177/1094342016634819

References: 32 articles.

1. A hierarchical O(N log N) force-calculation algorithm

2. Diagnosis, Tuning, and Redesign for Multicore Performance: A Case Study of the Fast Multipole Method

3. Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures

4. A Fast Adaptive Multipole Algorithm in Three Dimensions

Cited by 7 articles.

1. Calculating molecular interactions;Molecular Simulation of Fluids;2024

2. FFT, FMM, and multigrid on the road to exascale: Performance challenges and opportunities;Journal of Parallel and Distributed Computing;2020-02

3. Learning with Analytical Models;2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW);2019-05

4. Low communication FMM-accelerated FFT on GPUs;Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis;2017-11-12

5. Fast multipole preconditioners for sparse matrices arising from elliptic equations;Computing and Visualization in Science;2017-11-09