Architecture and design of AlphaServer GS320

Published: 2000-11
Issue: 11
Volume: 35
Page: 13-24
ISSN: 0362-1340
Container-title: ACM SIGPLAN Notices
Language: en
Short-container-title: SIGPLAN Not.

Author:

Gharachorloo Kourosh¹, Sharma Madhu², Steely Simon², Van Doren Stephen²

Affiliation:

1. Western Research Laboratory, Compaq Computer Corporation, Palo Alto, California

2. High Performance Servers Division, Compaq Computer Corporation, Marlborough, Massachusetts

Abstract

This paper describes the architecture and implementation of the AlphaServer GS320, a cache-coherent non-uniform memory access multiprocessor developed at Compaq. The AlphaServer GS320 architecture is specifically targeted at medium-scale multiprocessing with 32 to 64 processors. Each node in the design consists of four Alpha 21264 processors, up to 32GB of coherent memory, and an aggressive IO subsystem. The current implementation supports up to 8 such nodes for a total of 32 processors. While snoopy-based designs have been stretched to medium-scale multiprocessors by some vendors, providing sufficient snoop bandwidth remains a major challenge especially in systems with aggressive processors. At the same time, directory protocols targeted at larger scale designs lead to a number of inherent inefficiencies relative to snoopy designs. A key goal of the AlphaServer GS320 architecture has been to achieve the best-of-both-worlds, partly by exploiting the bounded scale of the target systems.

This paper focuses on the unique design features used in the AlphaServer GS320 to efficiently implement coherence and consistency. The guiding principle for our directory-based protocol is to address correctness issues related to rare protocol races without burdening the common transaction flows. Our protocol exhibits lower occupancy and lower message counts compared to previous designs, and provides more efficient handling of 3-hop transactions. Furthermore, our design naturally lends itself to elegant solutions for deadlock, livelock, starvation, and fairness. The AlphaServer GS320 architecture also incorporates a couple of innovative techniques that extend previous approaches for efficiently implementing memory consistency models. These techniques allow us to generate commit events (which are used for ordering purposes) well in advance of formulating the reply to a transaction. Furthermore, the separation of the commit event allows time-critical replies to bypass inbound requests without violating ordering properties. Even though our design specifically targets medium-scale servers, many of the same techniques can be applied to larger-scale directory-based and smaller-scale snoopy-based designs. Finally, we evaluate the performance impact of some of the above optimizations and present a few competitive benchmark results.

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design, Software

Link

https://dl.acm.org/doi/pdf/10.1145/356989.356991

Reference: 38 articles.

1. A lazy cache algorithm

2. Lazy caching

3. Piranha

Cited by 9 articles.

1. Photonic-based express coherence notifications for many-core CMPs;Journal of Parallel and Distributed Computing;2018-03

2. Exploring the relationship between architectures and management policies in the design of NUCA-based chip multicore systems;Future Generation Computer Systems;2018-01

3. PDR: A protocol for dynamic network reconfiguration based on deadlock recovery scheme;Simulation Modelling Practice and Theory;2012-05

4. Efficient Handling of Lock Hand-off in DSM Multiprocessors with Buffering Coherence Controllers;Journal of Computer Science and Technology;2012-01

5. Cache-Integrated Network Interfaces: Flexible On-Chip Communication and Synchronization for Large-Scale CMPs;International Journal of Parallel Programming;2011-06-01