Affiliation:
1. University of Missouri-Kansas City
Abstract
A level-one cache normally resides on a processor's critical path, which determines the clock frequency. Direct-mapped caches exhibit fast access times but poor hit rates compared with same-sized set-associative caches because of nonuniform accesses to the cache sets: some sets suffer many conflict misses while other sets are underutilized. We propose a technique to reduce the miss rate of direct-mapped caches by balancing the accesses to cache sets. We increase the decoder length and thereby reduce the accesses to heavily used sets without dynamically detecting cache-set usage information, and we introduce a replacement policy to the direct-mapped cache design, increasing the accesses to underutilized cache sets with the help of programmable decoders. On average, the proposed balanced cache, or B-Cache, achieves 64.5% and 37.8% miss-rate reductions on all 26 SPEC2K benchmarks for the instruction and data caches, respectively. This translates into an average IPC improvement of 5.9%. The B-Cache consumes 10.5% more power per access but exhibits a 2% saving in total memory-access-related energy, owing to the miss-rate reductions and the resulting reduction in applications' execution time. Compared with previous techniques that aim to reduce the miss rate of direct-mapped caches, our technique requires only one cycle for all cache hits and has the same access time as a direct-mapped cache.
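To illustrate the problem the abstract describes, the toy simulator below models a cache with one block per set. With a single fixed index per block it behaves as a plain direct-mapped cache, so two blocks that map to the same set thrash each other; allowing each block a small set of candidate indices and placing it in an underused one balances set usage and removes the conflict misses. This is only a sketch of the general idea: the candidate-index hash (`(block + 7*k) % num_sets`) and the place-in-empty-set policy here are hypothetical stand-ins, not the paper's actual programmable-decoder or replacement-policy design.

```python
def simulate(trace, num_sets, candidates_per_block=1):
    """Count misses for a toy cache with one block per set.

    trace: sequence of block numbers being accessed.
    candidates_per_block=1 models a plain direct-mapped cache;
    values > 1 let each block live in any of several candidate sets
    (a crude stand-in for a longer, programmable decoder index).
    """
    tags = [None] * num_sets   # tag (block number) stored in each set
    misses = 0
    for block in trace:
        # Candidate sets for this block; the +7*k hash is an arbitrary
        # illustrative choice, not the B-Cache mapping.
        cands = []
        for k in range(candidates_per_block):
            c = (block + 7 * k) % num_sets
            if c not in cands:
                cands.append(c)
        if any(tags[c] == block for c in cands):
            continue            # hit in one of the candidate sets
        misses += 1
        for c in cands:         # balance usage: prefer an empty set
            if tags[c] is None:
                tags[c] = block
                break
        else:                   # all candidates occupied: evict primary
            tags[cands[0]] = block
    return misses
```

For example, with 8 sets, alternating accesses to blocks 0 and 8 (which share the primary index 0) miss on every access in the direct-mapped configuration, whereas two candidate sets per block reduce the misses to the two compulsory ones: `simulate([0, 8] * 50, 8, 1)` returns 100, while `simulate([0, 8] * 50, 8, 2)` returns 2.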
Publisher
Association for Computing Machinery (ACM)
Cited by
18 articles.