SoftFLASH

Published:1996-12 Issue:5 Volume:30 Page:210-220
ISSN:0163-5980
Container-title:ACM SIGOPS Operating Systems Review
language:en
Short-container-title:SIGOPS Oper. Syst. Rev.

Author:

Erlichson Andrew¹, Nuckolls Neal², Chesson Greg², Hennessy John¹

Affiliation:

1. Computer Systems Lab, Stanford University, Stanford, CA

2. Silicon Graphics Inc., 2011 North Shoreline Blvd., Mountain View, CA

Abstract

One potentially attractive way to build large-scale shared-memory machines is to use small-scale to medium-scale shared-memory machines as clusters that are interconnected with an off-the-shelf network. To create a shared-memory programming environment across the clusters, it is possible to use a virtual shared-memory software layer. Because of the low latency and high bandwidth of the interconnect available within each cluster, there are clear advantages in making the clusters as large as possible. The critical question then becomes whether the latency and bandwidth of the top-level network and the software system are sufficient to support the communication demands generated by the clusters.

To explore these questions, we have built an aggressive kernel implementation of a virtual shared-memory system using SGI multiprocessors and 100 Mbyte/sec HIPPI interconnects. The system obtains speedups on 32 processors (four nodes, eight processors per node plus additional reserved protocol processors) that range from 6.9 on the communication-intensive FFT program to 21.6 on Ocean (both from the SPLASH-2 suite). In general, clustering is effective in reducing internode miss rates, but as the cluster size increases, increases in the remote latency, mostly due to increased TLB synchronization cost, offset the advantages. For communication-intensive applications, such as FFT, the overhead of sending out network requests, the limited network bandwidth, and the long network latency prevent the achievement of good performance. Overall, this approach still appears promising, but our results indicate that large low-latency networks may be needed to make cluster-based virtual shared-memory machines broadly useful as large-scale shared-memory multiprocessors.
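To put the reported speedups in perspective, the short sketch below converts them into parallel efficiency (speedup divided by processor count). It is an illustration only, not part of the paper: the function name and the choice to count just the 32 compute processors (ignoring the reserved protocol processors) are assumptions made here.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Return the fraction of ideal linear speedup that was achieved."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Speedups quoted in the abstract for the 32-processor configuration.
REPORTED_SPEEDUPS = {"FFT": 6.9, "Ocean": 21.6}

# Assumption: only the 32 compute processors are counted; the additional
# reserved protocol processors mentioned in the abstract are excluded.
COMPUTE_PROCESSORS = 32

EFFICIENCY = {
    name: parallel_efficiency(speedup, COMPUTE_PROCESSORS)
    for name, speedup in REPORTED_SPEEDUPS.items()
}

for name, eff in EFFICIENCY.items():
    print(f"{name}: {eff:.1%} of linear speedup")
```

Under these assumptions, FFT reaches roughly a fifth of linear speedup and Ocean roughly two thirds, which matches the abstract's contrast between communication-intensive and less communication-bound programs.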

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/248208.237187

References: 26 articles.

1. The MIT Alewife machine

2. Design of the Munin Distributed Shared Memory System

3. Performance evaluation of hybrid hardware and software distributed shared memory protocols

4. The Amber system: parallel programming on a network of multiprocessors

Cited by 1 article.

1. Scaling out NUMA-Aware Applications with RDMA-Based Distributed Shared Memory; Journal of Computer Science and Technology; 2019-01