Comparing memory systems for chip multiprocessors

Author:

Leverich Jacob¹, Arakida Hideho¹, Solomatnikov Alex¹, Firoozshahian Amin¹, Horowitz Mark¹, Kozyrakis Christos¹

Affiliation:

1. Stanford University, Stanford, CA

Abstract

There are two basic models for the on-chip memory in CMP systems: hardware-managed coherent caches and software-managed streaming memory. This paper performs a direct comparison of the two models under the same set of assumptions about technology, area, and computational capabilities. The goal is to quantify how and when they differ in terms of performance, energy consumption, bandwidth requirements, and latency tolerance for general-purpose CMPs. We demonstrate that for data-parallel applications, the cache-based and streaming models perform and scale equally well. For certain applications with little data reuse, streaming scales better due to better bandwidth use and macroscopic software prefetching. However, the introduction of techniques such as hardware prefetching and non-allocating stores to the cache-based model eliminates the streaming advantage. Overall, our results indicate that there is not sufficient advantage in building streaming memory systems where all on-chip memory structures are explicitly managed. On the other hand, we show that streaming at the programming model level is particularly beneficial, even with the cache-based model, as it enhances locality and creates opportunities for bandwidth optimizations. Moreover, we observe that stream programming is actually easier with the cache-based model because the hardware guarantees correct, best-effort execution even when the programmer cannot fully regularize an application's code.

Publisher

Association for Computing Machinery (ACM)

Cited by 30 articles.

1. Partitioning and Data Mapping in Reconfigurable Cache and Scratchpad Memory-Based Architectures;ACM Transactions on Design Automation of Electronic Systems;2016-12-28

2. Runtime-Guided Management of Scratchpad Memories in Multicore Architectures;2015 International Conference on Parallel Architecture and Compilation (PACT);2015-10

3. Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures;Proceedings of the 42nd Annual International Symposium on Computer Architecture;2015-06-13

4. Single-Instruction Multiple-Data Execution;Synthesis Lectures on Computer Architecture;2015-05-27

5. Specific read only data management for memory hierarchy optimization;ACM SIGBED Review;2015-01-22
