Identifying the Root Causes of Wait States in Large-Scale Parallel Applications

Published: 2016-08-08 | Volume: 3 | Issue: 2 | Pages: 1-24
ISSN: 2329-4949
Journal: ACM Transactions on Parallel Computing
Journal abbreviation: ACM Trans. Parallel Comput.
Language: English

Authors

David Böhme¹, Markus Geimer², Lukas Arnold², Felix Voigtlaender³, Felix Wolf⁴

Affiliations

1. Lawrence Livermore National Laboratory, USA

2. Jülich Supercomputing Centre, Germany

3. RWTH Aachen University, Germany

4. Technische Universität Darmstadt, Germany

Abstract

Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira, Jr., et al., we present a scalable approach that identifies program wait states and attributes their costs in terms of resource waste to their original cause. By replaying event traces in parallel both forward and backward, we can identify the processes and call paths responsible for the most severe imbalances, even for runs with hundreds of thousands of processes.
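
The cause-effect reasoning described above can be illustrated with a deliberately tiny model. The following Python sketch is not the authors' method, which replays real event traces in parallel both forward and backward; it assumes a hypothetical chain in which process k computes for work[k] time units, blocks in a receive from process k-1, and then immediately sends to process k+1, with messages costing no time. The function names replay_forward and attribute_costs, and the cost-splitting rule, are illustrative assumptions: the part of a late-sender wait explained by the sender's excess computation is charged to the sender as a short-term (direct) cost, and the remainder, which reached the sender as its own wait state, is charged proportionally to that wait state's root causes as a long-term (propagated) cost.

```python
"""Toy root-cause attribution of late-sender wait states along a message chain.

Hypothetical model (an assumption, not the paper's trace format): process k
computes for work[k] time units, then blocks in a receive from process k-1
(for k > 0), then immediately sends to process k+1. Messages cost no time.
"""
from collections import defaultdict


def replay_forward(work):
    """Return the late-sender wait time of each process in the chain."""
    wait = [0.0] * len(work)
    send_time = [0.0] * len(work)
    for k, w in enumerate(work):
        ready = w  # time at which process k finishes computing
        if k > 0:
            # Late sender: the receiver blocks until its predecessor sends.
            wait[k] = max(0.0, send_time[k - 1] - ready)
        send_time[k] = ready + wait[k]  # send right after the receive completes
    return wait


def attribute_costs(work, wait):
    """Charge every wait state to the process(es) whose excess work caused it.

    Simplified rule (an assumption): the part of wait[k] explained by the
    sender's excess computation, max(0, work[k-1] - work[k]), is a short-term
    cost of process k-1. The remainder reached the sender as its own wait
    state and is charged, proportionally, to that wait state's root causes
    as a long-term cost.
    """
    short = defaultdict(float)
    long_ = defaultdict(float)
    roots = [{} for _ in work]  # roots[k]: root process -> share of wait[k]
    for k in range(1, len(work)):
        if wait[k] <= 0:
            continue
        direct = min(wait[k], max(0.0, work[k - 1] - work[k]))
        propagated = wait[k] - direct
        if direct > 0:
            short[k - 1] += direct
            roots[k][k - 1] = direct
        if propagated > 0:
            # In this model, propagated > 0 implies wait[k - 1] > 0.
            scale = propagated / wait[k - 1]
            for root, share in roots[k - 1].items():
                long_[root] += share * scale
                roots[k][root] = roots[k].get(root, 0.0) + share * scale
    return dict(short), dict(long_)


if __name__ == "__main__":
    work = [10.0, 4.0, 7.0, 5.0]
    wait = replay_forward(work)
    short, long_ = attribute_costs(work, wait)
    print(wait)   # [0.0, 6.0, 3.0, 5.0]
    print(short)  # {0: 6.0, 2: 2.0}
    print(long_)  # {0: 6.0}
```

For work = [10, 4, 7, 5], process 0's excess computation directly causes 6 time units of waiting in process 1 and, through propagation, 6 more units further down the chain, while process 2's own excess over process 3 accounts for 2 units; together these cover all 14 units of waiting. A sequential walk along the chain is enough here only because the toy model has a single, ordered communication path.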

Funders

G8 Research Councils Initiative on Multilateral Research

Deutsche Forschungsgemeinschaft

U.S. Department of Energy (via Lawrence Livermore National Laboratory)

Interdisciplinary Program on Application Software towards Exascale Computing for Global Scale Issues

Helmholtz Association of German Research Centers

Publisher

Association for Computing Machinery (ACM)

Subject

Computational Theory and Mathematics, Computer Science Applications, Hardware and Architecture, Modeling and Simulation, Software

Link

https://dl.acm.org/doi/pdf/10.1145/2934661

Cited by 15 articles

1. Making applications faster by asynchronous execution: Slowing down processes or relaxing MPI collectives. Future Generation Computer Systems, 2023-11.

2. LatenSeer. Proceedings of the 2023 ACM Symposium on Cloud Computing, 2023-10-30.

3. The Role of Idle Waves, Desynchronization, and Bottleneck Evasion in the Performance of Parallel Programs. IEEE Transactions on Parallel and Distributed Systems, 2023-02-01.

4. Federated-ANN based Critical Path Analysis and Health Recommendations for MapReduce Workflows in Consumer Electronics Applications. IEEE Transactions on Consumer Electronics, 2023.

5. Locating and categorizing inefficient communication patterns in HPC systems using inter-process communication traces. Journal of Systems and Software, 2022-12.