Value-based clock gating and operation packing

Published: 2000-05; Volume: 18; Issue: 2; Pages: 89-126
ISSN: 0734-2071
Journal: ACM Transactions on Computer Systems
Language: en
Journal abbreviation: ACM Trans. Comput. Syst.

Authors:

David Brooks¹, Margaret Martonosi¹

Affiliation:

1. Princeton Univ., Princeton, NJ

Abstract

The large address-space needs of many current applications have pushed processor designs toward 64-bit word widths. Although full 64-bit addresses and operations are sometimes needed, arithmetic operations on much smaller quantities are still more common. In fact, another instruction-set trend has been the introduction of instructions geared toward subword operations on 16-bit quantities. For example, most major processors now include instruction-set support for multimedia operations, allowing several subword operations to execute in parallel in the same ALU. This article presents our observations demonstrating that operations on "narrow-width" quantities are common not only in multimedia codes but also in more general workloads. Across the SPECint95 benchmarks, over half of the integer operations executed require 16 bits or less. Based on this data, we propose two hardware mechanisms that dynamically recognize and capitalize on these narrow-width operations. The first, a power-oriented optimization, reduces processor power consumption by using operand-value-based clock gating to turn off the portions of arithmetic units that narrow-width operations leave unused. This optimization reduces the integer unit's power consumption by 45%-60% for the SPECint95 and MediaBench benchmark suites. Applied to the SPECfp95 benchmarks, it yields slightly smaller power reductions but still seems warranted. These reductions in integer-unit power consumption equate to a 5%-10% full-chip power savings. The second, a performance-oriented optimization, improves processor performance by packing narrow-width operations together so that they share a single arithmetic unit. Conceptually similar to a dynamic form of MMX, this optimization offers speedups of 4.3%-6.2% for SPECint95 and 8.0%-10.4% for MediaBench.
Overall, these optimizations highlight an increasing opportunity for value-based optimizations to improve both power and performance in current microprocessors.
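The abstract describes both mechanisms only at a high level. As a rough illustration, the following Python sketch models them in software; it is a hypothetical model, not the authors' hardware design. Assumptions not taken from the article: a 64-bit operand counts as narrow when it is the sign extension of a 16-bit value (the 16-bit threshold is the one the abstract reports); a gated add whose result would overflow 16 bits falls back to the full-width adder; and packing uses four 16-bit lanes with the carry chain cut at each lane boundary, modeled with the standard SWAR (SIMD-within-a-register) add. All names (`is_narrow`, `gated_add`, `pack`, `packed_add`, `unpack`) are illustrative.

```python
# Hypothetical software model of value-based clock gating and operation packing.
# Assumption: "narrow" means the 64-bit value sign-extends a 16-bit value.

MASK64 = (1 << 64) - 1
NARROW_BITS = 16
LANE_MASK = (1 << NARROW_BITS) - 1
LANES = 64 // NARROW_BITS                # four 16-bit lanes in one 64-bit adder
LANE_TOP = 0x8000_8000_8000_8000         # top bit of each 16-bit lane
LANE_REST = MASK64 ^ LANE_TOP            # the remaining 15 bits of each lane


def sign_extend(x: int, bits: int) -> int:
    """Interpret the low `bits` bits of x as a two's-complement number."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def is_narrow(x: int) -> bool:
    """True if the 64-bit pattern x is the sign extension of a 16-bit value."""
    v = sign_extend(x, 64)
    return -(1 << (NARROW_BITS - 1)) <= v < (1 << (NARROW_BITS - 1))


def gated_add(a: int, b: int) -> tuple[int, int]:
    """Add two 64-bit values, clocking only the low slice when both are narrow.

    Returns (result as a 64-bit pattern, number of adder bits clocked).
    """
    a, b = a & MASK64, b & MASK64
    if is_narrow(a) and is_narrow(b):
        low = (a + b) & LANE_MASK        # models the low 16-bit adder slice
        sa, sb, sr = (a >> 15) & 1, (b >> 15) & 1, (low >> 15) & 1
        if not (sa == sb and sr != sa):  # no signed overflow out of the slice
            # The upper 48 bits come from sign extension, not from the adder.
            return sign_extend(low, NARROW_BITS) & MASK64, NARROW_BITS
    return (a + b) & MASK64, 64          # fall back to the full-width adder


def pack(values: list[int]) -> int:
    """Place up to four narrow values into the 16-bit lanes of one word."""
    if len(values) > LANES or not all(is_narrow(v) for v in values):
        raise ValueError("need at most four narrow (16-bit) values")
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (NARROW_BITS * i)
    return word


def packed_add(x: int, y: int) -> int:
    """Lane-wise add with no carry between lanes (SWAR formulation).

    Adding only the low 15 bits of each lane cannot carry past the lane;
    the lane's top bit is then fixed up with XOR.
    """
    partial = (x & LANE_REST) + (y & LANE_REST)
    return (partial ^ ((x ^ y) & LANE_TOP)) & MASK64


def unpack(word: int, count: int) -> list[int]:
    """Recover `count` signed 16-bit lane results from a packed word."""
    return [sign_extend(word >> (NARROW_BITS * i), NARROW_BITS) for i in range(count)]


if __name__ == "__main__":
    print(gated_add(100, 23))     # (123, 16): upper 48 bits gated off
    print(gated_add(0x7FFF, 1))   # (32768, 64): overflow, full-width add
    a_ops, b_ops = [100, -5, 3000, 7], [23, 10, -4000, 8]
    print(unpack(packed_add(pack(a_ops), pack(b_ops)), 4))  # [123, 5, -1000, 15]
```

Here the bit count returned by `gated_add` is only a proxy for how much of the adder switches, not a power estimate. `packed_add` wraps each lane modulo 2^16 (0x7FFF + 1 becomes -32768), so a real design would also need to detect lane overflow and decide which operations to pack together; the abstract does not say how the authors handle either.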

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/350853.350856

References: 38 articles

1. Precomputation-based sequential logic optimization for low power

2. Low power data processing by elimination of redundant computations

3. Bhandarkar, D. P. 1996. Alpha Implementations and Architecture: Complete Reference and Guide. Digital Press, Newton, MA.

Cited by 31 articles.

1. Datawidth-Aware Energy-Efficient Multipliers: A Case for Going Sign Magnitude; 2018 21st Euromicro Conference on Digital System Design (DSD); 2018-08

2. References; Modeling and Optimization of Parallel and Distributed Embedded Systems; 2016-01-08

3. Masking Soft Errors with Static Bitwise Analysis; Asia-Pacific Software Engineering Conference; 2016

4. An Overview of Architecture-Level Power- and Energy-Efficient Design Techniques; Advances in Computers; 2015

5. Characterizing and Exploiting Small-Value Memory Instructions; IEEE Transactions on Computers; 2014-07