Scalable critical-path analysis and optimization guidance for hybrid MPI-CUDA applications

Published: 2016-08-01 Issue: 6 Volume: 31 Page: 485-498
ISSN: 1094-3420
Container-title: The International Journal of High Performance Computing Applications
Language: en
Short-container-title: The International Journal of High Performance Computing Applications

Author:

Schmitt Felix¹, Dietrich Robert¹, Juckeland Guido¹

Affiliation:

1. Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Germany

Abstract

The use of accelerators in heterogeneous systems is an established approach in designing petascale applications. Today, Compute Unified Device Architecture (CUDA) offers a rich programming interface for GPU accelerators but requires developers to incorporate several layers of parallelism on both the CPU and the GPU. From this increasing program complexity emerges the need for sophisticated performance tools. This work contributes by analyzing hybrid MPI-CUDA programs for properties based on wait states, such as the critical path, a metric proven to identify application bottlenecks effectively. We developed a tool to construct a dependency graph based on an execution trace and the inherent dependencies of the programming models CUDA and Message Passing Interface (MPI). Thereafter, it detects wait states and attributes blame to responsible activities. Together with the property of being on the critical path, we can identify activities that are most viable for optimization. To evaluate the global impact of optimizations to critical activities, we predict the program execution using a graph-based performance projection. The developed approach has been demonstrated with suitable examples to be both scalable and correct. Furthermore, we establish a new categorization of CUDA inefficiency patterns ensuing from the dependencies between CUDA activities.
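The critical-path idea in the abstract can be illustrated with a small sketch. It is not the authors' tool: the activity names, durations, and dependency edges below are hypothetical, and the graph is written by hand rather than built from an execution trace. It only shows how the longest duration-weighted path through a dependency graph picks out the activities that bound total runtime.

```python
from collections import defaultdict


def critical_path(durations, edges):
    """Return (length, path) of the longest duration-weighted path in a DAG.

    durations: dict mapping activity name -> duration
    edges: iterable of (src, dst) pairs meaning dst cannot start before src ends
    """
    succs = defaultdict(list)
    indegree = {node: 0 for node in durations}
    for src, dst in edges:
        succs[src].append(dst)
        indegree[dst] += 1

    # Process activities in topological order (Kahn's algorithm).
    ready = [node for node, deg in indegree.items() if deg == 0]
    finish = {node: durations[node] for node in ready}  # earliest finish time
    best_pred = {node: None for node in durations}
    processed = 0
    while ready:
        node = ready.pop()
        processed += 1
        for nxt in succs[node]:
            candidate = finish[node] + durations[nxt]
            if candidate > finish.get(nxt, float("-inf")):
                finish[nxt] = candidate
                best_pred[nxt] = node
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if processed != len(durations):
        raise ValueError("dependency graph contains a cycle")

    # Walk back from the activity that finishes last.
    node = max(finish, key=finish.get)
    path = []
    while node is not None:
        path.append(node)
        node = best_pred[node]
    path.reverse()
    return finish[path[-1]], path


# Hypothetical trace: two MPI ranks each launch a CUDA kernel, synchronize
# with the device, then meet in a collective operation.
durations = {
    "r0:launch": 1, "r0:kernel": 8, "r0:sync": 1,
    "r1:launch": 1, "r1:kernel": 3, "r1:sync": 1,
    "allreduce": 2,
}
edges = [
    ("r0:launch", "r0:kernel"), ("r0:kernel", "r0:sync"), ("r0:sync", "allreduce"),
    ("r1:launch", "r1:kernel"), ("r1:kernel", "r1:sync"), ("r1:sync", "allreduce"),
]
length, path = critical_path(durations, edges)
print(length, path)
```

In this toy graph the slower kernel on rank 0 lies on the critical path, so rank 1 would wait at the collective. Shortening `r1:kernel` would not change the overall length, which is the kind of distinction the abstract describes.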

Publisher

SAGE Publications

Subject

Hardware and Architecture, Theoretical Computer Science, Software

Link

http://journals.sagepub.com/doi/pdf/10.1177/1094342016661865

References: 18 articles.

1. Characterizing Load and Communication Imbalance in Large-Scale Parallel Applications

2. Non-intrusive Performance Analysis of Parallel Hardware Accelerated Applications on Hybrid Architectures

3. Accelerating linpack with CUDA on heterogenous clusters

Cited by 9 articles.

1. Graph-Centric Performance Analysis for Large-Scale Parallel Applications;IEEE Transactions on Parallel and Distributed Systems;2024-07

2. An Empirical Study of High Performance Computing (HPC) Performance Bugs;2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR);2023-05

3. Domain-Specific Framework for Performance Analysis;Performance Analysis of Parallel Applications for HPC;2023

4. Visualization of profiling and tracing in CPU‐GPU programs;Concurrency and Computation: Practice and Experience;2022-07-19

5. PerFlow;Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming;2022-03-28