Efficient execution of memory access phases using dataflow specialization

Published: 2016-01-04 Issue: 3S Volume: 43 Page: 118-130
ISSN: 0163-5964
Container-title: ACM SIGARCH Computer Architecture News
Language: en
Short-container-title: SIGARCH Comput. Archit. News

Author:

Ho Chen-Han¹, Kim Sung Jin¹, Sankaralingam Karthikeyan¹

Affiliation:

1. University of Wisconsin-Madison

Abstract

This paper identifies a new opportunity for improving the efficiency of a processor core: memory access phases of programs. These are dynamic regions of programs where most of the instructions are devoted to memory access or address computation. They occur naturally in programs because of workload properties; they also arise when employing an in-core accelerator, where we get induced phases in which the code executing on the core is access code. We observe that such code requires an out-of-order (OOO) core's dataflow and dynamism to run fast and does not execute well on an in-order processor. However, an OOO core consumes much power, effectively increasing energy consumption and reducing the energy efficiency of in-core accelerators. We develop an execution model called memory access dataflow (MAD) that encodes dataflow computation, event-condition-action rules, and explicit actions. Using it, we build a specialized engine that provides an OOO core's performance at a fraction of the power. Such an engine can serve as a general way for any accelerator to execute its respective induced phase, thus providing a common interface and implementation for current and future accelerators. We have designed and implemented MAD in RTL, and we demonstrate its generality and flexibility by integrating it with four diverse accelerators (SSE, DySER, NPU, and C-Cores). Our quantitative results show that, relative to in-order, 2-wide OOO, and 4-wide OOO cores, MAD provides 2.4×, 1.4×, and equivalent performance, respectively. It provides 0.8×, 0.6× and 0.4× lower energy.

Funder

National Science Foundation

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/2872887.2750390

References: 72 articles.

1. "Intel's Sandy Bridge Microarchitecture," http://www.realworldtech.com/sandy-bridge/, accessed: 2014-08-14.

2. Parboil Benchmark Suite. http://impact.crhc.illinois.edu/parboil.php.

3. "Silvermont, Intel's Low Power Architecture," http://www.realworldtech.com/silvermont/, accessed: 2014-08-14.

4. M. Annavaram, J. M. Patel, and E. S. Davidson, "Data prefetching by dependence graph precomputation," in ISCA '01. 10.1145/379240.379251

Cited by 2 articles.

1. Optimizing Data Availability and Utilization in Deep Learning Accelerator SoCs; 2023 30th IEEE International Conference on Electronics, Circuits and Systems (ICECS); 2023-12-04

2. Programming Model; Software Defined Chips; 2022-11-15