Affiliation:
1. University of Wisconsin, Madison, Madison, WI, USA
2. Carnegie Mellon University, Pittsburgh, PA, USA
Abstract
We consider the problem of fine-grained hardware profiling, i.e., profiling the hardware only while a desired section of the program is executing. Although this requirement is frequently encountered in practice, its importance has not been emphasized in the literature so far. In this work, we compare and validate three tools for performing fine-grained profiling on Linux platforms: perf, PAPI, and a homegrown tool, PMU-metrics. perf has been used in the past for fine-grained profiling in an erroneous manner, producing inaccurate metrics as a result. In contrast, PAPI and PMU-metrics produce accurate metrics for profiling at the ms-scale, and PMU-metrics additionally enables profiling at the µs-scale. We hope that our analysis will help systems practitioners choose the right tool for performing fine-grained profiling at different time scales.
Publisher
Association for Computing Machinery (ACM)