Feedback-driven threading

Author:

Suleman M. Aater1,Qureshi Moinuddin K.2,Patt Yale N.1

Affiliation:

1. The University of Texas at Austin, Austin, TX

2. T. J. Watson Research Center, Yorktown Hieghts, NY

Abstract

Extracting high-performance from the emerging Chip Multiprocessors (CMPs) requires that the application be divided into multiple threads. Each thread executes on a separate core thereby increasing concurrency and improving performance. As the number of cores on a CMP continues to increase, the performance of some multi-threaded applications will benefit from the increased number of threads, whereas, the performance of other multi-threaded applications will become limited by data-synchronization and off-chip bandwidth. For applications that get limited by data-synchronization, increasing the number of threads significantly degrades performance and increases on-chip power. Similarly, for applications that get limited by off-chip bandwidth, increasing the number of threads increases on-chip power without providing any performance improvement. Furthermore, whether an application gets limited by data-synchronization, or bandwidth, or neither depends not only on the application but also on the input set and the machine configuration. Therefore, controlling the number of threads based on the run-time behavior of the application can significantly improve performance and reduce power. This paper proposes Feedback-Driven Threading (FDT) , a framework to dynamically control the number of threads using run-time information. FDT can be used to implement Synchronization-Aware Threading (SAT) , which predicts the optimal number of threads depending on the amount of data-synchronization. Our evaluation shows that SAT can reduce both execution time and power by up to 66% and 78% respectively. Similarly, FDT can be used to implement Bandwidth-Aware Threading (BAT) , which predicts the minimum number of threads required to saturate the off-chip bus. Our evaluation shows that BAT reduces on-chip power by up to 78%. When SAT and BAT are combined, the average execution time reduces by 17% and power reduces by 59%. The proposed techniques leverage existing performance counters and require minimal support from the threading library.

Publisher

Association for Computing Machinery (ACM)

Cited by 23 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. An ANN-Guided Multi-Objective Framework for Power-Performance Balancing in HPC Systems;Proceedings of the 21st ACM International Conference on Computing Frontiers;2024-05-07

2. A neural network framework for optimizing parallel computing in cloud servers;Journal of Systems Architecture;2024-05

3. Synergistically Rebalancing the EDP of Container-Based Parallel Applications;IEEE Transactions on Parallel and Distributed Systems;2024-03

4. NeurOPar, A Neural Network-Driven EDP Optimization Strategy for Parallel Workloads;2023 IEEE 35th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD);2023-10-17

5. On the benefits of Collaborative Thread Throttling and HLS-Versioning in CPU-FPGA Environments;2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI);2022-08-22

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3