MemSpy: analyzing memory system bottlenecks in programs

Published: 1992-06 Issue: 1 Volume: 20 Page: 1-12
ISSN: 0163-5999
Container-title: ACM SIGMETRICS Performance Evaluation Review
Language: en
Short-container-title: SIGMETRICS Perform. Eval. Rev.

Author:

Martonosi Margaret¹, Gupta Anoop¹, Anderson Thomas²

Affiliation:

1. Computer Systems Laboratory, Stanford University, CA

2. Computer Science Division, Univ. of California, Berkeley, CA

Abstract

To cope with the increasing difference between processor and main memory speeds, modern computer systems use deep memory hierarchies. In the presence of such hierarchies, the performance attained by an application is largely determined by its memory reference behavior—if most references hit in the cache, the performance is significantly higher than if most references have to go to main memory. Frequently, it is possible for the programmer to restructure the data or code to achieve better memory reference behavior. Unfortunately, most existing performance debugging tools do not assist the programmer in this component of the overall performance tuning task. This paper describes MemSpy, a prototype tool that helps programmers identify and fix memory bottlenecks in both sequential and parallel programs. A key aspect of MemSpy is that it introduces the notion of data oriented, in addition to code oriented, performance tuning. Thus, for both source level code objects and data objects, MemSpy provides information such as cache miss rates, causes of cache misses, and in multiprocessors, information on cache invalidations and local versus remote memory misses. MemSpy also introduces a concise matrix presentation to allow programmers to view both code and data oriented statistics at the same time. This paper presents design and implementation issues for MemSpy, and gives a detailed case study using MemSpy to tune a parallel sparse matrix application. It shows how MemSpy helps pinpoint memory system bottlenecks, such as poor spatial locality and interference among data structures, and suggests paths for improvement.

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture, Software

Link

https://dl.acm.org/doi/pdf/10.1145/149439.133079

References: 15 articles.

1. Memory-reference characteristics of multiprocessor applications under MACH

2. Quartz: a tool for tuning parallel program performance

3. Non-intrusive and interactive profiling in parasight

4. A tool to aid in the design, implementation, and understanding of matrix algorithms for parallel processors

Cited by 28 articles.

1. VClinic: A Portable and Efficient Framework for Fine-Grained Value Profilers;Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2;2023-01-27

2. Using Analog Side Channels for Hardware Event Profiling;Understanding Analog Side Channels Using Cryptography Algorithms;2023

3. DRCCTPROF: A Fine-Grained Call Path Profiler for ARM-Based Clusters;SC20: International Conference for High Performance Computing, Networking, Storage and Analysis;2020-11

4. Understanding memory access patterns using the BSC performance tools;Parallel Computing;2018-10

5. Profile-guided scope-based data allocation method;Proceedings of the International Symposium on Memory Systems;2018-10