Maps

Published:1999-05 Issue:2 Volume:27 Page:4-15
ISSN:0163-5964
Container-title:ACM SIGARCH Computer Architecture News
language:en
Short-container-title:SIGARCH Comput. Archit. News

Author:

Barua Rajeev¹,Lee Walter¹,Amarasinghe Saman¹,Agarwal Anant¹

Affiliation:

1. M.I.T. Laboratory for Computer Science, Cambridge, MA

Abstract

This paper describes Maps, a compiler-managed memory system for Raw architectures. Traditional processors for sequential programs maintain the abstraction of a unified memory by using a single centralized memory system. This implementation leads to the infamous "Von Neumann bottleneck," with machine performance limited by the large memory latency and limited memory bandwidth. A Raw architecture addresses this problem by taking advantage of the rapidly increasing transistor budget to move much of its memory on chip. To remove the bottleneck and complexity associated with centralized memory, Raw distributes the memory with its processing elements. Unified memory semantics are implemented jointly by the hardware and the compiler. The hardware provides a clean compiler interface to its two inter-tile interconnects: a fast, statically schedulable network and a traditional dynamic network. Maps then uses these communication mechanisms to orchestrate the memory accesses for low latency and parallelism while enforcing proper dependence. It optimizes for speed in two ways: by finding accesses that can be scheduled on the static interconnect through static promotion, and by minimizing dependence sequentialization for the remaining accesses. Static promotion is performed using equivalence class unification and modulo unrolling; memory dependences are enforced through explicit synchronization and software serial ordering. We have implemented Maps based on the SUIF infrastructure. This paper demonstrates that the exclusive use of static promotion yields roughly 20-fold speedup on 32 tiles for our regular applications and about 5-fold speedup on 16 or more tiles for our irregular applications. The paper also shows that selective use of dynamic accesses can be a useful complement to the mostly static memory system.
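As a rough illustration of the idea behind the modulo unrolling named in the abstract, the sketch below assumes an array spread across memory banks by low-order interleaving (element `i` on bank `i mod N_BANKS`). The bank count, the layout, and the helper names are assumptions made for this example and are not taken from the Maps implementation. It shows that once a unit-stride loop is unrolled by the number of banks, each copy of the access in the loop body always touches the same bank, which is what would let a compiler bind that access to a fixed tile at compile time.

```python
# Illustrative sketch only: the layout and names here are assumptions,
# not the Maps compiler's actual data structures.
N_BANKS = 4  # hypothetical number of tiles / memory banks


def bank_of(index: int) -> int:
    """Low-order interleaving (assumed): element i lives on bank i mod N_BANKS."""
    return index % N_BANKS


def banks_per_slot(n_elems: int, unroll: int) -> list[set[int]]:
    """Model `for i in range(0, n_elems, unroll)` whose body holds `unroll`
    copies of the access A[i + k]; return the set of banks each copy k touches."""
    slots: list[set[int]] = [set() for _ in range(unroll)]
    for i in range(0, n_elems, unroll):
        for k in range(unroll):
            if i + k < n_elems:
                slots[k].add(bank_of(i + k))
    return slots


# Without unrolling, the single access A[i] visits every bank over time,
# so its target bank is not known statically.
rolled = banks_per_slot(16, 1)

# Unrolled by N_BANKS, copy k only ever touches bank k.
unrolled = banks_per_slot(16, N_BANKS)
```

In this toy setting `rolled` holds one slot that touches all four banks, while every slot in `unrolled` touches exactly one bank. Strides other than one or offsets in the index expression would need a different unroll factor, which this sketch does not attempt to compute.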

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/307338.300980

References: 16 articles.

1. Memory bandwidth limitations of future microprocessors

2. Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures

3. An integrated compile-time/run-time software distributed shared memory system

4. Very Long Instruction Word architectures and the ELI-512

Cited by 18 articles.

1. Runtime-Guided Management of Scratchpad Memories in Multicore Architectures;2015 International Conference on Parallel Architecture and Compilation (PACT);2015-10

2. Affine Loop Optimization Based on Modulo Unrolling in Chapel;Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models - PGAS '14;2014

3. Reconfigurable Systems;Adaptable Embedded Systems;2012-10-20

4. Deployment of Reconfigurable Systems;Dynamic Reconfigurable Architectures and Transparent Optimization Techniques;2010

5. Pipelining saturated accumulation;IEEE Transactions on Computers;2009-02