Affiliation:
1. The University of Texas at Austin, Austin, TX
2. T. J. Watson Research Center, Yorktown Heights, NY
Abstract
Extracting high performance from emerging Chip Multiprocessors (CMPs) requires that the application be divided into multiple threads. Each thread executes on a separate core, thereby increasing concurrency and improving performance. As the number of cores on a CMP continues to increase, the performance of some multi-threaded applications will benefit from the increased number of threads, whereas the performance of others will become limited by data-synchronization and off-chip bandwidth. For applications limited by data-synchronization, increasing the number of threads significantly degrades performance and increases on-chip power. Similarly, for applications limited by off-chip bandwidth, increasing the number of threads increases on-chip power without providing any performance improvement. Furthermore, whether an application is limited by data-synchronization, bandwidth, or neither depends not only on the application but also on the input set and the machine configuration. Therefore, controlling the number of threads based on the run-time behavior of the application can significantly improve performance and reduce power.
This paper proposes Feedback-Driven Threading (FDT), a framework to dynamically control the number of threads using run-time information. FDT can be used to implement Synchronization-Aware Threading (SAT), which predicts the optimal number of threads depending on the amount of data-synchronization. Our evaluation shows that SAT can reduce both execution time and power by up to 66% and 78% respectively. Similarly, FDT can be used to implement Bandwidth-Aware Threading (BAT), which predicts the minimum number of threads required to saturate the off-chip bus. Our evaluation shows that BAT reduces on-chip power by up to 78%. When SAT and BAT are combined, the average execution time reduces by 17% and power reduces by 59%. The proposed techniques leverage existing performance counters and require minimal support from the threading library.
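As a rough illustration of how a thread count might be chosen from run-time measurements, the sketch below combines a synchronization-based estimate with a bandwidth-based one. The model is an assumption made here for illustration and is not stated in the abstract: execution time with P threads is taken to be t_parallel / P + P * t_critical (parallel work divides across threads while serialized critical-section time grows with them), and bus utilization is assumed to scale linearly with the thread count until the bus saturates. The function names `sat_threads`, `bat_threads`, and `choose_threads` are hypothetical.

```python
import math


def sat_threads(t_parallel: float, t_critical: float, max_threads: int) -> int:
    """Thread count minimizing t_parallel / p + p * t_critical (assumed model)."""
    if t_critical <= 0:
        return max_threads  # no synchronization cost measured: use every core
    ideal = math.sqrt(t_parallel / t_critical)  # continuous minimizer of the model
    # Compare the two integers around the continuous optimum.
    candidates = {max(1, math.floor(ideal)), max(1, math.ceil(ideal))}
    best = min(candidates, key=lambda p: (t_parallel / p + p * t_critical, p))
    return min(best, max_threads)


def bat_threads(bus_util_one_thread: float, max_threads: int) -> int:
    """Fewest threads that saturate the bus, assuming linear utilization scaling.

    bus_util_one_thread is the fraction (0..1] of bus bandwidth one thread uses.
    """
    if bus_util_one_thread <= 0:
        return max_threads  # no measurable bus traffic: bandwidth is not a limit
    needed = math.ceil(1.0 / bus_util_one_thread)
    return max(1, min(needed, max_threads))


def choose_threads(t_parallel: float, t_critical: float,
                   bus_util_one_thread: float, max_threads: int) -> int:
    """Combine both estimates by taking the more restrictive one."""
    return min(sat_threads(t_parallel, t_critical, max_threads),
               bat_threads(bus_util_one_thread, max_threads))
```

In this sketch the inputs would come from a short training phase that samples timing and bus-utilization counters; how those samples are collected is left out because it depends on the platform and threading library.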
Publisher
Association for Computing Machinery (ACM)
Cited by
23 articles.
1. An ANN-Guided Multi-Objective Framework for Power-Performance Balancing in HPC Systems;Proceedings of the 21st ACM International Conference on Computing Frontiers;2024-05-07
2. A neural network framework for optimizing parallel computing in cloud servers;Journal of Systems Architecture;2024-05
3. Synergistically Rebalancing the EDP of Container-Based Parallel Applications;IEEE Transactions on Parallel and Distributed Systems;2024-03
4. NeurOPar, A Neural Network-Driven EDP Optimization Strategy for Parallel Workloads;2023 IEEE 35th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD);2023-10-17
5. On the benefits of Collaborative Thread Throttling and HLS-Versioning in CPU-FPGA Environments;2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI);2022-08-22