Efficient address remapping in distributed shared-memory systems
-
Published:2006-06
Issue:2
Volume:3
Page:209-229
-
ISSN:1544-3566
-
Container-title:ACM Transactions on Architecture and Code Optimization
-
language:en
-
Short-container-title:ACM Trans. Archit. Code Optim.
Author:
Zhang Lixin1,
Parker Mike2,
Carter John3
Affiliation:
1. IBM Austin Research Lab, Austin, TX
2. Cray Inc.
3. University of Utah, Salt Lake City, UT
Abstract
As processor performance continues to improve at a rate much higher than DRAM and network performance, we are approaching a time when large-scale distributed shared memory systems will have remote memory latencies measured in tens of thousands of processor cycles. The Impulse memory system architecture adds an optional level of address indirection at the memory controller. Applications can use this level of indirection to control how data is accessed and cached and thereby improve cache and bus utilization and reduce the number of memory accesses required. Previous Impulse work focuses on uniprocessor systems and relies on software to flush processor caches when necessary to ensure data coherence. In this paper, we investigate an extension of Impulse to multiprocessor systems that extends the coherence protocol to maintain data coherence without requiring software-directed cache flushing. Specifically, the multiprocessor Impulse controller can gather/scatter data across the network while its coherence protocol guarantees that each gather request gets coherent data and each scatter request updates every coherent replica in the system. Our simulation results demonstrate that the proposed system can significantly outperform conventional systems, achieving an average speedup of 9X on four memory-bound benchmarks on a 32-processor system.
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture,Information Systems,Software
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献