Affiliation:
1. Chalmers University of Technology, Gothenburg, Sweden
Abstract
Low utilization of on-chip cache capacity limits performance and wastes energy because of the long latency, limited bandwidth, and energy consumption associated with off-chip memory accesses. Value replication is an important source of low capacity utilization. While prior cache-compression techniques manage to encode frequent values densely, they sacrifice compression ratio for low decompression latency, thus missing opportunities to utilize capacity more effectively.
This paper presents, for the first time, a detailed design-space exploration of caches that utilize statistical compression. We show that more aggressive approaches such as Huffman coding, which have been neglected in the past due to the high processing overhead of (de)compression, are suitable techniques for caches and memory. Based on our key observation that value locality varies little over time and across applications, we first demonstrate that the overhead of statistics acquisition for code generation is low because new encodings are rarely needed, making it possible to off-load code generation to software routines. We then show that the high compression ratio obtained by Huffman coding makes it possible to realize the performance benefits of 4X larger last-level caches with about 50% lower power consumption than such larger caches.
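The core idea behind statistical compression, as the abstract describes it, is to assign short codes to frequent values and longer codes to rare ones. A minimal software sketch of Huffman code construction (illustrative only; the paper's hardware design and value distributions are not reproduced here, and the example frequencies are hypothetical) can show why a skewed value distribution compresses well:

```python
import heapq
from collections import Counter

def huffman_code_lengths(freqs):
    """Build Huffman code lengths from a {symbol: frequency} map.

    Returns {symbol: code_length_in_bits}. Frequent symbols receive
    shorter codes, which is what makes statistical compression of
    skewed cache-value distributions effective.
    """
    # Heap entries: (weight, tiebreak, {symbol: depth_so_far}).
    # The tiebreak integer prevents Python from comparing the dicts.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one
        # level deeper, i.e. adds one bit to its code.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# A hypothetical skewed distribution of 32-bit words in cache lines:
# zero dominates, a few other values recur.
values = Counter({0x0: 60, 0x1: 15, 0xFF: 15, 0xDEADBEEF: 10})
lengths = huffman_code_lengths(values)
# The all-zero word gets a 1-bit code; the rarest value gets 3 bits,
# versus a fixed 32 bits uncompressed.
```

With these (assumed) frequencies, the expected code length is 0.60·1 + 0.15·2 + 0.15·3 + 0.10·3 = 1.65 bits per word, illustrating the high compression ratios the abstract attributes to Huffman coding on value-skewed data.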
Funder
Vetenskapsrådet
Swedish Foundation for Strategic Research
Publisher
Association for Computing Machinery (ACM)
Cited by
21 articles.
1. Selective Memory Compression for GPU Memory Oversubscription Management;Proceedings of the 53rd International Conference on Parallel Processing;2024-08-12
2. Dictionary Based Cache Line Compression;Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems;2024-07-08
3. Beyond Compression Ratio: A Throughput Analysis of Memory Compression Techniques for GPUs;2023 IEEE 41st International Conference on Computer Design (ICCD);2023-11-06
4. DaeMon: Architectural Support for Efficient Data Movement in Fully Disaggregated Systems;Proceedings of the ACM on Measurement and Analysis of Computing Systems;2023-02-27
5. Multi-Prediction Compression: An Efficient and Scalable Memory Compression Framework for GP-GPU;IEEE Computer Architecture Letters;2022-07-01