Affiliation:
1. The University of Texas at Austin
2. University of Wisconsin-Madison
Abstract
This paper explores a new technique called coherence decoupling, which breaks a traditional cache coherence protocol into two protocols: a Speculative Cache Lookup (SCL) protocol and a safe, backing coherence protocol. The SCL protocol produces a speculative load value, typically from an invalid cache line, permitting the processor to compute with incoherent data. In parallel, the coherence protocol obtains the necessary coherence permissions and the correct value. Eventually, the speculative use of the incoherent data can be verified against the coherent data. Thus, coherence decoupling can greatly reduce, if not eliminate, the effects of false sharing. Furthermore, coherence decoupling can also reduce latencies incurred by true sharing. SCL protocols reduce those latencies by speculatively writing updates into invalid lines, thereby increasing the accuracy of speculation, without complicating the simple, underlying coherence protocol that guarantees correctness.
The performance benefits of coherence decoupling are evaluated using a full-system simulator and a mix of commercial and scientific benchmarks. Our results show that 40% to 90% of all coherence misses can be speculated correctly, and therefore their latencies partially or fully hidden. This capability results in performance improvements ranging from 3% to over 16% in most cases where the latencies of coherence misses have an effect on performance.
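As a rough illustration of the idea described in the abstract, the following toy Python sketch models a load that computes on the stale contents of an invalid line and later checks that value against the coherent one. It is not the paper's protocol or hardware design: the names `CacheLine`, `speculative_load`, `coherent_load`, and `load_with_decoupling` are hypothetical, the steps run sequentially rather than in parallel, and the "backing protocol" is reduced to copying from a list that stands in for the coherent copy of the line.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    words: list   # last values seen locally (possibly stale)
    valid: bool   # False once a remote write has invalidated the line


def speculative_load(line, offset):
    """SCL-style step (assumed): read the word even if the line is invalid."""
    return line.words[offset]


def coherent_load(coherent_words, line, offset):
    """Backing-protocol step (assumed): fetch the current line and revalidate."""
    line.words = list(coherent_words)
    line.valid = True
    return line.words[offset]


def load_with_decoupling(coherent_words, line, offset, compute):
    """Compute on the speculative value, then verify it against the coherent one."""
    if line.valid:
        return compute(line.words[offset]), "hit"
    guess = speculative_load(line, offset)
    speculative_result = compute(guess)  # in hardware this would overlap the miss
    actual = coherent_load(coherent_words, line, offset)
    if guess == actual:
        return speculative_result, "speculation correct"
    return compute(actual), "misspeculation, replayed"


# False sharing: a remote writer changed word 1, the local load reads word 0.
false_sharing = load_with_decoupling(
    [10, 99], CacheLine(words=[10, 20], valid=False), 0, lambda v: v + 1
)

# True sharing: the local load reads the word that was actually changed.
true_sharing = load_with_decoupling(
    [10, 99], CacheLine(words=[10, 20], valid=False), 1, lambda v: v + 1
)
```

In this sketch the false-sharing load verifies successfully because the word it read was untouched, while the true-sharing load fails verification and is recomputed with the coherent value. The abstract's speculative updates into invalid lines are not modeled here.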
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
6 articles.
1. Ghostwriter: A Cache Coherence Protocol for Error-Tolerant Applications;50th International Conference on Parallel Processing Workshop;2021-08-09
2. TC-Release++: An Efficient Timestamp-Based Coherence Protocol for Many-Core Architectures;IEEE Transactions on Parallel and Distributed Systems;2017-11-01
3. Exploiting Staleness for Approximating Loads on CMPs;2015 International Conference on Parallel Architecture and Compilation (PACT);2015-10
4. DeFT;ACM Transactions on Architecture and Code Optimization;2011-07
5. Evaluation of the implementation cost of cache coherence protocols using omniscient actions;Design Automation for Embedded Systems;2010-01-29