Affiliation:
1. Lawrence Livermore National Laboratory, Livermore, CA
Abstract
Performance analysis of communication activity for a terascale application with traditional message tracing can be overwhelming in terms of overhead, perturbation, and storage. We propose a novel alternative that enables dynamic statistical profiling of an application's communication activity using message sampling. We have implemented an operational prototype, named PHOTON, and our evidence shows that this new approach can provide an accurate, low-overhead, tractable alternative for performance analysis of communication activity. PHOTON consists of two components: a Message Passing Interface (MPI) profiling layer that implements sampling and analysis, and a modified MPI runtime that appends a small but necessary amount of information to individual messages. More importantly, this alternative enables an assortment of runtime analysis techniques so that, in contrast to post-mortem, trace-based techniques, the raw performance data can be jettisoned immediately after analysis. Our investigation shows that message sampling can reduce overhead to imperceptible levels for many applications. Experiments on several applications demonstrate the viability of this approach. For example, with one application, our technique reduced the analysis overhead from 154% for traditional tracing to 6% for statistical profiling. We also evaluate different sampling techniques in this framework. The coverage of the sample space provided by purely random sampling is superior to counter- and timer-based sampling. Also, PHOTON's design reveals that frugal modifications to the MPI runtime system could facilitate such techniques on production computing systems, and it suggests that this sampling technique could execute continuously for long-running applications.
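The coverage difference between counter-based and random sampling can be illustrated with a small simulation. The sketch below is not PHOTON's implementation: the message kinds, the stream, the sampling rate, and the helper names (`counter_sample`, `random_sample`, `coverage`) are all hypothetical and chosen only to show how a fixed sampling period can alias with a periodic communication pattern, while independent random selection does not.

```python
import random


def counter_sample(messages, period):
    """Counter-based sampling: keep every `period`-th message."""
    return [m for i, m in enumerate(messages, start=1) if i % period == 0]


def random_sample(messages, probability, rng):
    """Random sampling: keep each message independently with `probability`."""
    return [m for m in messages if rng.random() < probability]


def coverage(sample, universe):
    """Fraction of the distinct message kinds in `universe` seen in `sample`."""
    return len(set(sample)) / len(set(universe))


# Hypothetical stream: four message kinds repeating in a fixed order,
# as an iterative application with a regular exchange pattern might emit.
kinds = ["halo_north", "halo_south", "halo_east", "reduce"]
stream = [kinds[i % len(kinds)] for i in range(10_000)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Both samplers select roughly one message in four.
counter_cov = coverage(counter_sample(stream, 4), stream)
random_cov = coverage(random_sample(stream, 0.25, rng), stream)

print(f"counter-based coverage: {counter_cov:.2f}")
print(f"random coverage:        {random_cov:.2f}")
```

In this contrived stream the counter period equals the pattern length, so the counter-based sampler only ever sees one message kind, whereas the random sampler sees all four at about the same sampling rate. Real applications are less regular, so the gap would usually be smaller than in this worst case.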
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications, Hardware and Architecture, Software
Cited by
15 articles.
1. Graph-Centric Performance Analysis for Large-Scale Parallel Applications;IEEE Transactions on Parallel and Distributed Systems;2024-07
2. Production-Run Noise Detection;Performance Analysis of Parallel Applications for HPC;2023
3. Graph Analysis for Scalability Analysis;Performance Analysis of Parallel Applications for HPC;2023
4. Informed Memory Access Monitoring;Performance Analysis of Parallel Applications for HPC;2023
5. Locating and categorizing inefficient communication patterns in HPC systems using inter-process communication traces;Journal of Systems and Software;2022-12