Affiliation:
1. Sharif University of Technology
2. Sharif University of Technology and Institute for Research in Fundamental Sciences (IPM)
Abstract
The energy consumed processing modern workloads is a major challenge in data centers. Because cloud workloads operate on large datasets, the miss rate of the L1 data cache is high, and such misses are costly in energy terms for memory instructions, since lower levels of the memory hierarchy consume more energy per access than the L1. Moreover, unlike for traditional scientific workloads, large last-level caches are not effective for performance. This article proposes a large L1 data cache, called Domino, that reduces the number of accesses to lower levels in order to improve energy efficiency. In designing Domino, we target two components that occupy on-chip area yet are not energy efficient, making them good candidates for reclaiming their area to enlarge the L1 data cache. Domino is a highly associative cache that extends the conventional cache by borrowing the prefetcher and last-level-cache storage budget and using it as additional ways for the data cache. These additional ways are kept separate from the conventional cache ways, so the critical path of the first access is unchanged. On a miss in the conventional part, Domino searches the added ways in a mixed parallel-sequential fashion to balance latency and energy consumption. Results on the CloudSuite benchmark suite show that read and write misses are reduced by 30%, along with a 28% reduction in snoop messages. As a result of filtering accesses to the lower levels, the overall energy consumption per access is reduced by 20% on average (up to 38%).
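The two-phase lookup described in the abstract can be sketched as follows. This is a minimal illustrative model, not the paper's implementation: the way counts, the group size, and the function names are assumptions chosen only to show how probing all conventional ways at once, then probing the borrowed ways group by group, trades latency against energy.

```python
# Hypothetical sketch of Domino's lookup. All parameters below are
# assumptions for illustration, not values from the paper.
CONVENTIONAL_WAYS = 8   # assumed conventional associativity
GROUP_SIZE = 4          # assumed number of added ways probed per step

def domino_lookup(set_ways, tag):
    """Return (hit, probes): whether tag was found in this cache set
    and how many tag comparisons (energy proxy) were performed."""
    conventional = set_ways[:CONVENTIONAL_WAYS]
    additional = set_ways[CONVENTIONAL_WAYS:]

    # Phase 1: all conventional ways are checked in parallel, so the
    # critical path of a first-access hit is unchanged.
    probes = len(conventional)
    if tag in conventional:
        return True, probes

    # Phase 2: on a miss, probe the added ways one group at a time
    # (parallel within a group, sequential across groups), stopping
    # at the first hit to save energy relative to a fully parallel probe.
    for i in range(0, len(additional), GROUP_SIZE):
        group = additional[i:i + GROUP_SIZE]
        probes += len(group)
        if tag in group:
            return True, probes
    return False, probes
```

For a set with 8 conventional and 24 additional ways, a conventional hit costs 8 probes, a hit in the first added group costs 12, and a full miss costs 32; a fully parallel design would pay 32 probes on every access, while a fully sequential one would stretch the hit latency.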
Publisher
Association for Computing Machinery (ACM)
Subject
Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Computer Science Applications