On-chip traffic regulation to reduce coherence protocol cost on a microthreaded many-core architecture with distributed caches

Published:2014-03 Issue:3s Volume:13 Page:1-21
ISSN:1539-9087
Container-title:ACM Transactions on Embedded Computing Systems
language:en
Short-container-title:ACM Trans. Embed. Comput. Syst.

Author:

Yang Qiang¹,Fu Jian¹,Poss Raphael¹,Jesshope Chris¹

Affiliation:

1. University of Amsterdam, Amsterdam, Netherlands

Abstract

When hardware cache coherence scales to many cores on chip, oversaturated traffic in the shared memory system may offset the benefit of massive hardware concurrency. In this article, we investigate the cost of a write-update protocol in terms of on-chip memory network traffic, and its adverse effects on system performance, on a multithreaded many-core architecture with distributed caches. We discuss possible software and hardware solutions to alleviate the network pressure. We find that, in the context of massive concurrency, introducing a write-merging buffer with 0.46% area overhead to each core improves the performance of applications with good locality and concurrency by 18.74% on average. Other applications also benefit from this addition, achieving a throughput increase of 5.93%. In addition, this improvement indicates that higher levels of concurrency per core can be exploited without impacting performance, thus tolerating latency better and giving higher processor efficiency than other solutions.

Funder

China Scholarship Council

Seventh Framework Programme

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture,Software

Link

https://dl.acm.org/doi/pdf/10.1145/2567931

References: 37 articles.

1. Throughput-Effective On-Chip Networks for Manycore Accelerators

2. R. Bianchini, T. J. Leblanc, and J. Veenstra. 1994. Eliminating useless messages in write-update protocols on scalable multiprocessors. Tech. rep. University of Rochester.

3. Instruction Level Parallelism through Microthreading—A Scalable Approach to Chip Multiprocessors

Cited by 2 articles.

1. DiSquawk;Proceedings of the 13th International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools;2016-08-29

2. Building a Java™ Virtual Machine for Non-Cache-Coherent Many-core Architectures;Proceedings of the 14th International Workshop on Java Technologies for Real-Time and Embedded Systems - JTRES '16;2016