Affiliation:
1. Micron Memory Japan, Inc., 7-10, Yoshikawa Kogyo Danchi, Higashi Hiroshima City, 739-0198, Japan
Abstract
In recent years, the increasing disparity between the data access speed of cache and the processing speed of processors has become a major bottleneck in achieving high-performance 2-dimensional (2D) data processing, such as that in scientific computing and image processing. To solve this problem, this paper proposes a new dual unit tile/line access cache memory based on a hierarchical hybrid Z-ordering data layout and a multibank cache organization supporting skewed storage schemes. The proposed layout improves 2D data locality and efficiently reduces L1 cache misses and Translation Lookaside Buffer (TLB) misses. It is obtained from the conventional raster layout by a simple hardware-based address translation unit. In addition, we propose an aligned tile set replacement algorithm (ATSRA) to reduce the hardware overhead in the tag memory of the proposed cache. Simulation results for matrix multiplication (MM) show that the proposed cache with parallel unit tile/line accessibility considerably reduces both L1 cache misses and TLB misses compared with the conventional raster layout and the Z-Morton order layout. The number of parallel load instructions for parallel unit tile/line access was reduced to only about one-fourth of the number of conventional load instructions. The execution time for parallel load instructions was reduced to about one-third of that required for conventional load instructions. Using 40 nm Complementary Metal-Oxide-Semiconductor (CMOS) technology, we combined the proposed cache with a SIMD-based data path and designed a 5 × 5 mm² Large-Scale Integration (LSI) chip. With the ATSRA method, the total hardware overhead of the proposed ATSRA-cache was held to only 105% of that required for a conventional cache.
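The abstract compares a raster layout with Z-Morton ordering and describes a hybrid layout derived from raster addresses. As a rough illustration only, the sketch below computes a plain Z-Morton index by bit interleaving and a simple tiled variant (tiles placed in Z order, raster order inside each tile). The function names, the 4 × 4 tile size, and the tiled scheme itself are assumptions made for this example; the paper's actual hierarchical hybrid layout and its hardware address translation unit are not specified in the abstract and may differ.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Z-Morton index: bits of x go to even positions, bits of y to odd positions."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def raster_index(x: int, y: int, width: int) -> int:
    """Conventional row-major (raster) index of element (x, y)."""
    return y * width + x


def tiled_z_index(x: int, y: int, tile: int = 4) -> int:
    """Illustrative hybrid index (an assumption, not the paper's scheme):
    tiles are ordered along a Z curve, elements inside a tile are row-major."""
    tile_x, tile_y = x // tile, y // tile   # which tile the element falls in
    off_x, off_y = x % tile, y % tile       # position inside that tile
    return interleave_bits(tile_x, tile_y) * tile * tile + off_y * tile + off_x


if __name__ == "__main__":
    # In an 8-wide raster array, vertically adjacent elements are 8 apart;
    # in the tiled layout they stay within the same 16-element tile.
    for y in range(2):
        print([tiled_z_index(x, y) for x in range(8)])
```

In this sketch, all 16 elements of a 4 × 4 tile occupy one contiguous index range, which is the kind of 2D locality the abstract attributes to tile-based layouts.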
Funder
Grant-in-Aid for Scientific Research
Subject
General Engineering, General Mathematics