Affiliation:
1. Computer Systems Laboratory, Stanford University, Stanford, CA
Abstract
A fundamental problem that any scalable multiprocessor must address is tolerating high-latency memory operations. This paper explores the extent to which multiple hardware contexts per processor can help mitigate the negative effects of high latency. In particular, we evaluate the performance of a directory-based cache-coherent multiprocessor using memory reference traces obtained from three parallel applications. We explore the case where there is a small, fixed number (2-4) of hardware contexts per processor and the context-switch overhead is low. In contrast to previously proposed approaches, we also use a very simple context-switch criterion, namely a cache miss or a write hit to shared data. Our results show that the effectiveness of multiple contexts depends on the nature of the applications, the context-switch overhead, and the inherent latency of the machine architecture. Given reasonably low-overhead hardware context switches, we show that two or four contexts can achieve substantial performance gains over a single context. For one application, processor utilization increased by about 46% with two contexts and by about 80% with four contexts.
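The latency-tolerance trade-off the abstract describes can be approximated with a simple back-of-the-envelope model (this is not the paper's trace-driven evaluation): a context runs for roughly R useful cycles before blocking on an operation of latency L, and switching to another ready context costs C cycles. The sketch below uses this standard saturation-style model with illustrative parameter values chosen here as assumptions, only to show qualitatively how utilization grows with the number of hardware contexts.

```python
# Minimal sketch of processor utilization with k hardware contexts.
# All parameter values are illustrative assumptions, not figures from the paper:
#   run_len  -- average useful cycles between long-latency operations (R)
#   latency  -- cycles a context stays blocked on a remote miss (L)
#   overhead -- cycles needed to switch to another ready context (C)

def utilization(contexts: int, run_len: float, latency: float, overhead: float) -> float:
    """Estimate the fraction of cycles spent doing useful work.

    If the other contexts collectively cover the full miss latency, the
    processor is limited only by switch overhead; otherwise it still idles
    for part of each miss.
    """
    if contexts * (run_len + overhead) >= run_len + latency:
        return run_len / (run_len + overhead)         # latency fully hidden
    return contexts * run_len / (run_len + latency)   # latency only partly hidden

if __name__ == "__main__":
    for k in (1, 2, 4):
        u = utilization(k, run_len=20, latency=60, overhead=4)
        print(f"{k} context(s): utilization = {u:.2f}")
```

With these assumed parameters the model moves from the latency-bound regime (utilization scales almost linearly with the number of contexts) into the overhead-bound regime around four contexts, which is the qualitative behavior the paper's results illustrate; the actual gains reported depend on the applications and the machine's inherent latency.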
Publisher
Association for Computing Machinery (ACM)
Cited by
8 articles.
1. Multithreading Architecture. Synthesis Lectures on Computer Architecture, 2013-01-15.
2. Performance Modeling of Multithreaded Distributed Memory Architectures. Hardware Design and Petri Nets, 2000.
3. Exploring cache performance in multithreaded processors. Microprocessors and Microsystems, 1997-07.
4. The M-machine multicomputer. International Journal of Parallel Programming, 1997-06.
5. MSparc: A multithreaded Sparc. Lecture Notes in Computer Science, 1996.