Affiliation:
1. University of Tennessee, Knoxville, TN, USA
2. Karlsruhe Institute of Technology, Karlsruhe, Germany
3. Oak Ridge National Laboratory, Oak Ridge, TN, USA
4. University of Manchester, Manchester, UK
Abstract
The methodology and standardization layer provided by the Performance Application Programming Interface (PAPI) has played a vital role in application profiling for almost two decades. It has enabled designers of sophisticated performance analysis tools and performance-conscious scientists to gain insights into their applications by simply instrumenting their code with a handful of PAPI functions that “just work” across different hardware components. In the past, PAPI development focused primarily on hardware-specific performance metrics. However, the rapidly increasing complexity of software infrastructure poses new measurement and analysis challenges for the developers of large-scale applications. In particular, acquiring information about the behavior of the libraries and runtimes used by scientific applications requires either low-level binary instrumentation or APIs specific to each library and runtime. No uniform API for monitoring events that originate from inside the software stack has emerged. In this article, we present our efforts to extend PAPI’s role so that it becomes the de facto standard for exposing performance-critical events, which we refer to as software-defined events (SDEs), from different software layers. Upgrading PAPI with SDEs enables monitoring of both types of performance events, hardware-related and software-related, in a uniform way, through the same consistent PAPI. The goal of this article is threefold. First, we motivate the need for SDEs and describe our design decisions regarding the functionality we offer through PAPI’s new SDE interface. Second, we illustrate how SDEs can be utilized by different software packages, specifically by showcasing their use in the numerical linear algebra library MAGMA-Sparse, the tensor algebra library TAMM that is part of the NWChem suite, and the compiler-based performance analysis tool Byfl. Third, we provide a performance analysis of the overhead that results from monitoring SDEs and discuss the trade-offs between overhead and functionality.
Funder
U.S. Department of Energy
Subject
Hardware and Architecture, Theoretical Computer Science, Software
Cited by
12 articles.
1. Ginkgo - A math library designed to accelerate Exascale Computing Project science applications;The International Journal of High Performance Computing Applications;2024-08-20
2. Automated Data Analysis for Defining Performance Metrics from Raw Hardware Events;2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW);2024-05-27
3. Supporting RISC-V Performance Counters Through Linux Performance Analysis Tools;2023 IEEE 34th International Conference on Application-specific Systems, Architectures and Processors (ASAP);2023-07
4. BiRFIA: Selective Binary Rewriting for Function Interception on ARM;Proceedings of the 37th International Conference on Supercomputing;2023-06-21
5. VClinic: A Portable and Efficient Framework for Fine-Grained Value Profilers;Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2;2023-01-27