Affiliation:
1. Department of Computer Science and Engineering, University of Washington, Seattle WA
Abstract
Initial implementations of parallel programs typically yield disappointing performance. Tuning to improve performance is thus a significant part of the parallel programming process. The effort required to tune a parallel program, and the level of performance that eventually is achieved, both depend heavily on the quality of the instrumentation that is available to the programmer.
This paper describes Quartz, a new tool for tuning parallel program performance on shared memory multiprocessors. The philosophy underlying Quartz was inspired by that of the sequential UNIX tool gprof: to appropriately direct the attention of the programmer by efficiently measuring just those factors that are most responsible for performance and by relating these metrics to one another and to the structure of the program. This philosophy is even more important in the parallel domain than in the sequential domain, because of the dramatically greater number of possible metrics and the dramatically increased complexity of program structures.
The principal metric of Quartz is normalized processor time: the total processor time spent in each section of code divided by the number of other processors that are concurrently busy when that section of code is being executed. Tied to the logical structure of the program, this metric provides a “smoking gun” pointing towards those areas of the program most responsible for poor performance. This information can be acquired efficiently by checkpointing to memory the number of busy processors and the state of each processor, and then statistically sampling these using a dedicated processor.
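As a rough illustration of the metric described above, the following Python sketch computes a normalized processor time per code section from a list of sampled snapshots. It is not Quartz's implementation. The snapshot layout, the function name normalized_processor_time, and the sample_interval parameter are all assumptions made for this example. The divisor used here is the number of busy processors in a snapshot, counting the sampled processor itself, which is one possible reading of the definition and avoids division by zero.

```python
from collections import defaultdict


def normalized_processor_time(samples, sample_interval=1.0):
    """Accumulate normalized processor time per code section.

    samples: iterable of snapshots. Each snapshot is a list with one
        entry per processor: the name of the section that processor
        is executing, or None if the processor is idle.
        (Hypothetical layout, chosen for this sketch.)
    sample_interval: time represented by one snapshot.
    """
    totals = defaultdict(float)
    for snapshot in samples:
        # Assumption: the divisor counts every busy processor in the
        # snapshot, including the one being charged.
        busy = sum(1 for section in snapshot if section is not None)
        for section in snapshot:
            if section is not None:
                # Time spent while few processors are busy is weighted
                # more heavily than time spent at full parallelism.
                totals[section] += sample_interval / busy
    return dict(totals)


# Three snapshots of a 4-processor machine (made-up data).
samples = [
    ["init", None, None, None],        # serial phase
    ["work", "work", "work", "work"],  # fully parallel phase
    ["work", "work", None, None],      # half the processors idle
]
result = normalized_processor_time(samples)
```

With this made-up data, "init" accounts for only one of the seven busy processor samples, yet it receives a third of the normalized total, which is the kind of emphasis on poorly parallelized sections that the metric is meant to provide.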
In addition to describing the design rationale, functionality, and implementation of Quartz, the paper examines how Quartz would be used to solve a number of performance problems that have been reported as being frequently encountered, and describes a case study in which Quartz was used to significantly improve the performance of a CAD circuit verifier.
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications, Hardware and Architecture, Software
Cited by
31 articles.
1. Distributed Latency Profiling through Critical Path Tracing;Communications of the ACM;2022-12-20
2. LIBNVCD: An Extendable and User-friendly Multi-GPU Performance Measurement Tool;2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC);2022-06
3. Assessment of effective patching material for concrete bridge deck -A review;Construction and Building Materials;2021-07
4. Diogenes;Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis;2019-11-17
5. DProf: distributed profiler with strong guarantees;Proceedings of the ACM on Programming Languages;2019-10-10