Abstract
This paper describes the synchronization and communication primitives of the Cray T3E multiprocessor, a shared-memory system scalable to 2048 processors. We discuss what we have learned from the T3D project (the predecessor to the T3E) and the rationale behind the changes made for the T3E. We include performance measurements for various aspects of communication and synchronization.

The T3E augments the memory interface of the DEC 21164 microprocessor with a large set of explicitly managed external registers (E-registers). E-registers serve as the source or target for all remote communication. They provide a highly pipelined interface to global memory that allows dozens of requests per processor to be outstanding. Through E-registers, the T3E provides a rich set of atomic memory operations and a flexible, user-level messaging facility. The T3E also provides a set of virtual hardware barrier/eureka networks that can be arbitrarily embedded into the 3D torus interconnect.
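To illustrate why a pipelined register interface with many outstanding requests matters, the following is a minimal timing model of split-phase remote access. A load is issued into a register and collected later, so independent loads can overlap their latencies. It also includes a memory-side fetch-and-increment as one example of an atomic memory operation. This is a sketch under stated assumptions: the class and method names (`ERegisterFile`, `issue_get`, `issue_fetch_inc`, `read`) and the cycle costs are hypothetical. They are not the T3E programming interface or measured T3E figures, and the model is sequential, so atomicity holds trivially.

```python
# Hypothetical timing model of E-register-style split-phase remote access.
# Class/method names and cycle counts are illustrative assumptions,
# not the Cray T3E programming interface or measured T3E parameters.


class ERegisterFile:
    def __init__(self, memory, n_regs=64, latency=100, issue_cost=2):
        self.memory = memory          # "global memory": address -> value
        self.latency = latency        # assumed round-trip cycles per request
        self.issue_cost = issue_cost  # assumed cycles to issue one request
        self.clock = 0
        self.regs = [None] * n_regs
        self.ready_at = [0] * n_regs

    def issue_get(self, ereg, addr):
        """Start a remote load into register `ereg` without stalling."""
        self.clock += self.issue_cost
        self.regs[ereg] = self.memory[addr]      # value captured at issue time
        self.ready_at[ereg] = self.clock + self.latency

    def issue_fetch_inc(self, ereg, addr):
        """Fetch-and-increment applied at the memory side (sequential model)."""
        self.clock += self.issue_cost
        self.regs[ereg] = self.memory[addr]      # old value returned
        self.memory[addr] += 1                   # update done "at memory"
        self.ready_at[ereg] = self.clock + self.latency

    def read(self, ereg):
        """Stall until `ereg` has arrived, then return its contents."""
        self.clock = max(self.clock, self.ready_at[ereg])
        return self.regs[ereg]


def blocking_loads(memory, addrs):
    """One outstanding request at a time: each load pays the full latency."""
    ef = ERegisterFile(memory)
    values = []
    for a in addrs:
        ef.issue_get(0, a)
        values.append(ef.read(0))
    return values, ef.clock


def pipelined_loads(memory, addrs):
    """Issue every load first, then collect (assumes len(addrs) <= n_regs)."""
    ef = ERegisterFile(memory)
    addrs = list(addrs)
    for i, a in enumerate(addrs):
        ef.issue_get(i, a)
    values = [ef.read(i) for i in range(len(addrs))]
    return values, ef.clock


mem = {a: a * 10 for a in range(32)}
vals_b, t_b = blocking_loads(mem, range(32))
vals_p, t_p = pipelined_loads(mem, range(32))
print(t_b, t_p)  # 3264 164 under the assumed costs
```

With these assumed costs, 32 blocking loads take 32 × 102 = 3264 cycles. Issuing all 32 before reading any finishes in 164 cycles, because the latencies overlap. This is the effect that allowing dozens of outstanding requests per processor is meant to exploit.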
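The semantics of barrier and eureka synchronization can be illustrated in software. A barrier completes only when every participant has arrived (a global AND). A eureka fires as soon as any participant signals (a global OR), for example to stop a parallel search once one worker finds the answer. The sketch below is a software analogue only. It uses Python's `threading.Barrier` and `threading.Event` to stand in for the hardware networks, and `parallel_search` is a hypothetical helper, not part of any T3E library.

```python
import threading


def parallel_search(data, target, n_workers=4):
    """Hypothetical helper: each worker scans its own slice of `data`.

    threading.Event stands in for a eureka signal (any worker may raise it)
    and threading.Barrier for a barrier (all workers must arrive).
    """
    eureka = threading.Event()
    barrier = threading.Barrier(n_workers)
    result = {}
    lock = threading.Lock()
    chunk = (len(data) + n_workers - 1) // n_workers

    def worker(wid):
        start = wid * chunk
        for i in range(start, min(start + chunk, len(data))):
            if eureka.is_set():          # another worker already found it
                break
            if data[i] == target:
                with lock:
                    result.setdefault("index", i)
                eureka.set()             # announce the find to all workers
                break
        barrier.wait()                   # every worker arrives before exiting

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result.get("index")           # None if the target was not found


print(parallel_search(list(range(1000)), 737))
```

Polling the event inside the loop lets workers that have not found the target stop early, while the barrier ensures that no worker returns before all of them have finished.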
Publisher
Association for Computing Machinery (ACM)
Cited by 9 articles.