An Analytical Cache Performance Evaluation Framework for Embedded Out-of-Order Processors Using Software Characteristics

Published:2018-08-29 Issue:4 Volume:17 Page:1-25
ISSN:1539-9087
Container-title:ACM Transactions on Embedded Computing Systems
language:en
Short-container-title:ACM Trans. Embed. Comput. Syst.

Author:

Ji Kecheng¹, Ling Ming¹, Shi Longxing², Pan Jianping³

Affiliation:

1. Southeast University, Nanjing, China

2. Southeast University, Nanjing, China

3. University of Victoria, Canada

Abstract

Using analytical models to evaluate proposals or guide high-level architecture decisions is becoming increasingly attractive. Several methods for modeling cache behavior and deriving quantitative insights have emerged in the last decade, such as stack distance theory and memory-level parallelism (MLP) estimation. However, prior research has typically oversimplified the factors that must be considered in out-of-order processors, such as the effects of reordered memory instructions, multiple dependences among memory instructions, and accesses merged into the same MSHR entry. Ignoring these effects leads to low and unstable accuracy in recent analytical models. By quantifying these effects, this article proposes a cache performance evaluation framework equipped with three analytical models, which more accurately predict cache misses, MLP, and the average cache miss service time, respectively. As in prior studies, these analytical models are all driven by profiled software characteristics, so the architecture evaluation process is significantly faster than cycle-accurate simulation. We evaluate the accuracy of the proposed models against gem5 cycle-accurate simulations using 16 benchmarks chosen from Mobybench Suite 2.0, Mibench 1.0, and Mediabench II. The average root mean square errors for predicting cache misses, MLP, and the average cache miss service time are around 4%, 5%, and 8%, respectively. Meanwhile, the average error of our framework in predicting the stall time due to cache misses is as low as 8%. The whole cache performance estimation is about 15 times faster than gem5 cycle-accurate simulation and 4 times faster than recent studies. Furthermore, we use our models to study how different performance metrics relate to the reorder buffer size. As an application case, we also demonstrate how to combine our framework with McPAT to find Pareto-optimal configurations in cache design space exploration.
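For readers unfamiliar with the stack distance theory mentioned in the abstract, the following is a minimal sketch of the textbook idea only, not the models proposed in the article. It assumes a fully associative LRU cache and an in-order address trace; the function names, the 64-byte line size, and the sample trace are hypothetical and chosen purely for illustration. The article's framework additionally accounts for reordered memory instructions, dependences, and MSHR merging, none of which is modeled here.

```python
from collections import OrderedDict


def stack_distances(trace, line_size=64):
    """Return the LRU stack distance of each access (None for a cold access).

    The stack distance is the number of distinct cache lines touched
    since the previous access to the same line.
    """
    stack = OrderedDict()  # keys are line numbers, most recently used last
    distances = []
    for addr in trace:
        line = addr // line_size
        if line in stack:
            keys = list(stack.keys())
            # Lines positioned after `line` were used more recently.
            distances.append(len(keys) - 1 - keys.index(line))
            stack.move_to_end(line)
        else:
            distances.append(None)  # first touch: compulsory miss
            stack[line] = True
    return distances


def predicted_misses(distances, num_lines):
    """Count misses for a fully associative LRU cache with num_lines lines.

    An access hits only if its stack distance is smaller than the capacity.
    """
    return sum(1 for d in distances if d is None or d >= num_lines)


# Hypothetical byte-address trace touching lines 0, 1, 2, 0, 1, 0.
trace = [0, 64, 128, 0, 64, 0]
dists = stack_distances(trace)
print(dists)
print(predicted_misses(dists, num_lines=2), predicted_misses(dists, num_lines=4))
```

One profiling pass yields the distance list, and miss counts for any cache capacity then follow without re-running the trace, which is the general reason such profile-driven models can be faster than simulating each configuration.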

Funder

Chinese National Mega Project of Scientific Research

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3233182

References: 45 articles.

1. Out-of-order processor with a memory subsystem which handles speculatively dispatched load operations (May 12, 1998); Abramson Jeffrey M.; US Patent, 1998

2. Instruction-Cache Locking for Improving Embedded Systems Performance

3. A statistical multiprocessor cache model

Cited by 5 articles.

1. Load Balanced Content Prefetching Model for MANET-CLOUD Environment;Lecture Notes in Electrical Engineering;2022

2. The Predictable Execution Model in Practice;ACM Transactions on Embedded Computing Systems;2021-07

3. A Locality Optimizer for Loop-dominated Applications Based on Reuse Distance Analysis;ACM Transactions on Design Automation of Electronic Systems;2020-10-12

4. Fast modeling L2 cache reuse distance histograms using combined locality information from software traces;Journal of Systems Architecture;2020-09

5. A Gaussian Set Sampling Model for Efficient Shared Cache Profiling on Multi-Cores;IEEE Access;2019