The MIT Alewife machine

Published:1995-05 Issue:2 Volume:23 Page:2-13
ISSN:0163-5964
Container-title:ACM SIGARCH Computer Architecture News
language:en
Short-container-title:SIGARCH Comput. Archit. News

Author:

Agarwal Anant¹, Bianchini Ricardo², Chaiken David³, Johnson Kirk L.¹, Kranz David¹, Kubiatowicz John¹, Lim Beng-Hong⁴, Mackenzie Kenneth¹, Yeung Donald¹

Affiliation:

1. Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts

2. University of Rochester, Rochester, NY and Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts

3. Digital Equipment Corporation Systems Research Center, Palo Alto, CA and Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts

4. IBM T.J. Watson Research Center, Yorktown Heights, NY and Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts

Abstract

Alewife is a multiprocessor architecture that supports up to 512 processing nodes connected over a scalable and cost-effective mesh network at a constant cost per node. The MIT Alewife machine, a prototype implementation of the architecture, demonstrates that a parallel system can be both scalable and programmable. Four mechanisms combine to achieve these goals: software-extended coherent shared memory provides a global, linear address space; integrated message passing allows compiler and operating system designers to provide efficient communication and synchronization; support for fine-grain computation allows many processors to cooperate on small problem sizes; and latency tolerance mechanisms (including block multithreading and prefetching) mask unavoidable delays due to communication.

Microbenchmarks, together with over a dozen complete applications running on the 32-node prototype, help to analyze the behavior of the system. Analysis shows that integrating message passing with shared memory enables a cost-efficient solution to the cache coherence problem and provides a rich set of programming primitives. Block multithreading and prefetching improve performance by up to 25% individually, and 35% together. Finally, language constructs that allow programmers to express fine-grain synchronization can improve performance by over a factor of two.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/225830.223985

References: 29 articles.

1. Automatic Partitioning of Parallel Loops for Cache-Coherent Multiprocessors

2. Sparcle: an evolutionary processor design for large-scale multiprocessors

3. APRIL

4. Exploiting heterogeneous parallelism on a multithreaded multiprocessor

5. ANSI/IEEE Std 1596-1992 Scalable Coherent Interface, 1992.

Cited by 11 articles.

1. Logical Memory Pools;Proceedings of the 22nd ACM Workshop on Hot Topics in Networks;2023-11-28

2. DynAMO: Improving Parallelism Through Dynamic Placement of Atomic Memory Operations;Proceedings of the 50th Annual International Symposium on Computer Architecture;2023-06-17

3. A Survey on Trusted Distributed Artificial Intelligence;IEEE Access;2022

4. Fifer: Practical Acceleration of Irregular Applications on Reconfigurable Architectures;MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture;2021-10-17

5. Amdahl's law in the context of heterogeneous many‐core systems – a survey;IET Computers & Digital Techniques;2020-04-03