The WaveScalar architecture

Published:2007-05 Issue:2 Volume:25 Page:1-54
ISSN:0734-2071
Container-title:ACM Transactions on Computer Systems
Language:en
Short-container-title:ACM Trans. Comput. Syst.

Author:

Swanson Steven¹,Schwerin Andrew¹,Mercaldi Martha¹,Petersen Andrew¹,Putnam Andrew¹,Michelson Ken¹,Oskin Mark¹,Eggers Susan J.¹

Affiliation:

1. University of Washington, Seattle, WA

Abstract

Silicon technology will continue to provide an exponential increase in the availability of raw transistors. Effectively translating this resource into application performance, however, is an open challenge that conventional superscalar designs will not be able to meet. We present WaveScalar as a scalable alternative to conventional designs. WaveScalar is a dataflow instruction set and execution model designed for scalable, low-complexity/high-performance processors. Unlike previous dataflow machines, WaveScalar can efficiently provide the sequential memory semantics that imperative languages require. To allow programmers to easily express parallelism, WaveScalar supports pthread-style, coarse-grain multithreading and dataflow-style, fine-grain threading. In addition, it permits blending the two styles within an application, or even a single function. To execute WaveScalar programs, we have designed a scalable, tile-based processor architecture called the WaveCache. As a program executes, the WaveCache maps the program's instructions onto its array of processing elements (PEs). The instructions remain at their processing elements for many invocations, and as the working set of instructions changes, the WaveCache removes unused instructions and maps new ones in their place. The instructions communicate directly with one another over a scalable, hierarchical on-chip interconnect, obviating the need for long wires and broadcast communication. This article presents the WaveScalar instruction set and evaluates a simulated implementation based on current technology. For single-threaded applications, the WaveCache achieves performance on par with conventional processors, but in less area. For coarse-grain threaded applications the WaveCache achieves nearly linear speedup with up to 64 threads and can sustain 7 to 14 multiply-accumulates per cycle on fine-grain threaded versions of well-known kernels. Finally, we apply both styles of threading to equake from Spec2000 and speed it up by 9x compared to the serial version.
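
Illustrative note: the abstract describes a dataflow execution model in which instructions communicate values directly to one another. The sketch below is a generic toy interpreter for the classic dataflow firing rule (an instruction executes once all of its operands have arrived), applied to a single multiply-accumulate. It is not the WaveScalar instruction set or the WaveCache; the graph, the node names (`mul`, `add`, `out`), and the `run` function are all assumptions made for illustration.

```python
from collections import deque
import operator

# Hypothetical toy graph: node -> (operation, list of (consumer, input port)).
# These names are illustrative only; they are not WaveScalar instructions.
GRAPH = {
    "mul": (operator.mul, [("add", 0)]),
    "add": (operator.add, [("out", 0)]),
}
ARITY = {"mul": 2, "add": 2}


def run(initial_tokens):
    """Fire each node as soon as every one of its input ports holds a value."""
    pending = {name: {} for name in GRAPH}  # operands waiting at each node
    results = []
    queue = deque(initial_tokens)           # each token is (node, port, value)
    while queue:
        node, port, value = queue.popleft()
        if node == "out":                   # sink: collect the final value
            results.append(value)
            continue
        pending[node][port] = value
        if len(pending[node]) == ARITY[node]:
            op, consumers = GRAPH[node]
            args = [pending[node][p] for p in range(ARITY[node])]
            pending[node] = {}              # consume the operands
            out = op(*args)
            # Send the result directly to each consumer instruction.
            for dest, dest_port in consumers:
                queue.append((dest, dest_port, out))
    return results


# Multiply-accumulate: acc + a * b with a=3, b=4, acc=10
mac = run([("mul", 0, 3), ("mul", 1, 4), ("add", 1, 10)])
print(mac)  # [22]
```

Because execution is driven only by operand arrival, the tokens can be supplied in any order and the result is unchanged, which is the property that lets a dataflow machine expose fine-grain parallelism without a program counter.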

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/1233307.1233308

References: 54 articles.

1. Shared memory consistency models: a tutorial

2. Clock rate versus IPC

3. I-structures: data structures for parallel computing

4. Piranha

Cited by 81 articles.

1. DAP: A 507-GMACs/J 256-Core Domain Adaptive Processor for Wireless Communication and Linear Algebra Kernels in 12-nm FINFET;IEEE Journal of Solid-State Circuits;2024

2. Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane;56th Annual IEEE/ACM International Symposium on Microarchitecture;2023-10-28

3. Accelerating RTL Simulation with Hardware-Software Co-Design;56th Annual IEEE/ACM International Symposium on Microarchitecture;2023-10-28

4. Consistency Constraints for Mapping Dataflow Graphs to Hybrid Dataflow/von Neumann Architectures;ACM Transactions on Embedded Computing Systems;2023-09-26

5. Allocation and Scheduling of Dataflow Graphs on Hybrid Dataflow/von Neumann Architectures;Proceedings of the 21st ACM-IEEE International Conference on Formal Methods and Models for System Design;2023-09-21