Feedback-driven threading

Published:2008-03-25 Issue:1 Volume:36 Page:277-286
ISSN:0163-5964
Container-title:ACM SIGARCH Computer Architecture News
language:en
Short-container-title:SIGARCH Comput. Archit. News

Author:

Suleman M. Aater¹, Qureshi Moinuddin K.², Patt Yale N.¹

Affiliation:

1. The University of Texas at Austin, Austin, TX

2. T. J. Watson Research Center, Yorktown Heights, NY

Abstract

Extracting high performance from the emerging Chip Multiprocessors (CMPs) requires that the application be divided into multiple threads. Each thread executes on a separate core, thereby increasing concurrency and improving performance. As the number of cores on a CMP continues to increase, the performance of some multi-threaded applications will benefit from the increased number of threads, whereas the performance of other multi-threaded applications will become limited by data-synchronization and off-chip bandwidth. For applications that are limited by data-synchronization, increasing the number of threads significantly degrades performance and increases on-chip power. Similarly, for applications that are limited by off-chip bandwidth, increasing the number of threads increases on-chip power without providing any performance improvement. Furthermore, whether an application is limited by data-synchronization, by bandwidth, or by neither depends not only on the application but also on the input set and the machine configuration. Therefore, controlling the number of threads based on the run-time behavior of the application can significantly improve performance and reduce power. This paper proposes Feedback-Driven Threading (FDT), a framework to dynamically control the number of threads using run-time information. FDT can be used to implement Synchronization-Aware Threading (SAT), which predicts the optimal number of threads depending on the amount of data-synchronization. Our evaluation shows that SAT can reduce execution time and power by up to 66% and 78%, respectively. Similarly, FDT can be used to implement Bandwidth-Aware Threading (BAT), which predicts the minimum number of threads required to saturate the off-chip bus. Our evaluation shows that BAT reduces on-chip power by up to 78%. When SAT and BAT are combined, the average execution time reduces by 17% and power reduces by 59%. The proposed techniques leverage existing performance counters and require minimal support from the threading library.
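The abstract does not give the prediction formulas, so the following Python sketch only illustrates the general idea under simplifying assumptions that are not taken from this record: execution time is modeled as `t_parallel / P + P * t_critical` (parallel work shrinks with the thread count `P`, serialized critical-section time grows with it), and bus utilization is assumed to scale linearly with the number of threads. All function and parameter names are hypothetical.

```python
import math


def sat_threads(t_parallel, t_critical, num_cores):
    """Thread count minimizing the assumed model T(P) = t_parallel/P + P*t_critical."""
    if t_critical <= 0:
        # No measurable synchronization cost: use every core.
        return num_cores
    p = math.sqrt(t_parallel / t_critical)  # real-valued minimizer of T(P)
    # Compare the two integer neighbours of p; prefer fewer threads on a tie.
    candidates = {max(1, math.floor(p)), max(1, math.ceil(p))}
    best = min(candidates, key=lambda n: (t_parallel / n + n * t_critical, n))
    return min(best, num_cores)


def bat_threads(bus_util_one_thread, num_cores):
    """Fewest threads expected to saturate the bus, assuming linear scaling.

    bus_util_one_thread is the fraction (0..1] of off-chip bandwidth
    consumed by a single thread during a sampling phase.
    """
    if bus_util_one_thread <= 0:
        return num_cores
    return min(num_cores, max(1, math.ceil(1.0 / bus_util_one_thread)))


def combined_threads(t_parallel, t_critical, bus_util_one_thread, num_cores):
    """Take the more restrictive of the two predictions (an assumed policy)."""
    return min(
        sat_threads(t_parallel, t_critical, num_cores),
        bat_threads(bus_util_one_thread, num_cores),
    )
```

In a real system the inputs would come from a short sampling phase that reads hardware performance counters, after which the threading library would be asked to use the predicted thread count for the rest of the parallel region.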

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/1353534.1346317

References: 31 articles.

1. A Note on the Generation of Random Normal Deviates

2. Using parallel program characteristics in dynamic processor allocation policies

Cited by 23 articles.

1. An ANN-Guided Multi-Objective Framework for Power-Performance Balancing in HPC Systems;Proceedings of the 21st ACM International Conference on Computing Frontiers;2024-05-07

2. A neural network framework for optimizing parallel computing in cloud servers;Journal of Systems Architecture;2024-05

3. Synergistically Rebalancing the EDP of Container-Based Parallel Applications;IEEE Transactions on Parallel and Distributed Systems;2024-03

4. NeurOPar, A Neural Network-Driven EDP Optimization Strategy for Parallel Workloads;2023 IEEE 35th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD);2023-10-17

5. On the benefits of Collaborative Thread Throttling and HLS-Versioning in CPU-FPGA Environments;2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI);2022-08-22