Affiliation:
1. Computer Science Department, Kent State University, 233 Math and Computer Science Bldg., Kent, Ohio 44242, United States
Abstract
The latency of broadcast/reduction operations has a significant impact on the performance of SIMD processors. This is especially true for associative programs, which make extensive use of global search operations. Previously, we developed a prototype associative SIMD processor that uses hardware multithreading to overcome the broadcast/reduction latency. In this paper we show, through simulations of the processor running an associative program, that hardware multithreading improves performance by increasing system utilization, even for processors with hundreds or thousands of processing elements. However, the thread scheduling policy used by the hardware is critical in determining the utilization actually achieved. We consider three thread scheduling policies and show that a scheduler that avoids issuing threads that would stall on pipeline dependencies or thread synchronization operations maintains system utilization independent of the number of threads.
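The effect of the scheduling policy on utilization can be illustrated with a toy model. The sketch below is not the paper's simulator and does not reproduce its three policies. It assumes a single issue slot per cycle, a fixed latency (standing in for the broadcast/reduction latency) before a thread may issue again, and two hypothetical round-robin policies: one that issues blindly and wastes the cycle when the selected thread is stalled, and one that skips stalled threads. The function name `simulate` and all parameters are illustrative assumptions.

```python
def simulate(num_threads, latency, cycles, skip_stalled):
    """Return the fraction of cycles in which an instruction was issued.

    Toy model (assumption): after a thread issues at cycle c, it cannot
    issue again until cycle c + latency.
    """
    ready_at = [0] * num_threads  # first cycle at which each thread may issue
    pointer = 0                   # round-robin pointer
    issued = 0
    for cycle in range(cycles):
        chosen = None
        if skip_stalled:
            # Stall-avoiding policy: take the first ready thread in rotation.
            for k in range(num_threads):
                t = (pointer + k) % num_threads
                if ready_at[t] <= cycle:
                    chosen = t
                    break
            if chosen is not None:
                pointer = (chosen + 1) % num_threads
        else:
            # Blind policy: the slot belongs to the next thread in rotation,
            # and the cycle is wasted if that thread is still stalled.
            if ready_at[pointer] <= cycle:
                chosen = pointer
            pointer = (pointer + 1) % num_threads
        if chosen is not None:
            ready_at[chosen] = cycle + latency
            issued += 1
    return issued / cycles


if __name__ == "__main__":
    for threads in (3, 8):
        blind = simulate(threads, latency=8, cycles=720, skip_stalled=False)
        aware = simulate(threads, latency=8, cycles=720, skip_stalled=True)
        print(f"threads={threads} blind={blind:.3f} stall-avoiding={aware:.3f}")
```

In this simplified model, both policies reach full utilization once the number of threads is at least the latency, while with fewer threads the blind policy loses additional cycles to stalled issue slots. The model ignores thread synchronization and real pipeline dependencies, so it indicates only the direction of the effect described in the abstract.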
Publisher
World Scientific Pub Co Pte Ltd
Subject
Hardware and Architecture, Theoretical Computer Science, Software