Affiliation
1. EcoCloud, EPFL
2. University of Edinburgh
Abstract
Datacenter operators rely on low-cost, high-density technologies to maximize throughput for data-intensive services with tight tail latencies. In-memory rack-scale computing is emerging as a promising paradigm in scale-out datacenters, capitalizing on commodity SoCs, low-latency and high-bandwidth communication fabrics, and a remote memory access model to enable aggregation of a rack's memory for critical data-intensive applications such as graph processing or key-value stores. Low latency and high bandwidth dictate not only eliminating communication bottlenecks in the software protocols and off-chip fabrics but also careful on-chip integration of network interfaces. The latter is a key challenge, especially in architectures with RDMA-inspired one-sided operations that aim to achieve low latency and high bandwidth through on-chip Network Interface (NI) support. This paper proposes and evaluates network interface architectures for tiled manycore SoCs for in-memory rack-scale computing. Our results indicate that a careful splitting of NI functionality per chip tile and at the chip's edge along a NOC dimension enables a rack-scale architecture to optimize for both latency and bandwidth. Our best manycore NI architecture achieves latencies within 3% of an idealized hardware NUMA and efficiently uses the full bisection bandwidth of the NOC, without changing the on-chip coherence protocol or the core's microarchitecture.
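To make the abstract's reference to RDMA-inspired one-sided operations concrete, the following minimal C sketch shows what such a model looks like from a core's point of view, assuming an soNUMA/RMC-style work-queue/completion-queue interface polled by the on-chip NI. All names here (wq_entry_t, cq_entry_t, post_remote_read, wait_for_completion) are hypothetical illustrations, not the paper's actual API.

/*
 * Sketch of an RDMA-inspired one-sided remote read: the application posts a
 * descriptor into a work queue that the on-chip NI polls, then spins on a
 * completion queue. The remote node's cores are never involved.
 * Assumed, hypothetical interface; not the paper's API.
 */
#include <stdint.h>
#include <stddef.h>

#define WQ_DEPTH 64

typedef struct {            /* work-queue entry: describes one one-sided op   */
    uint8_t  op;            /* 0 = remote read, 1 = remote write              */
    uint8_t  valid;         /* ownership flag handed off to the NI            */
    uint16_t node;          /* destination node in the rack                   */
    uint64_t remote_offset; /* offset into the remote node's exposed region   */
    uint64_t local_buf;     /* local buffer the NI deposits data into         */
    uint32_t length;        /* transfer size in bytes                         */
} wq_entry_t;

typedef struct {            /* completion-queue entry written back by the NI  */
    uint8_t  valid;
    uint32_t wq_index;      /* which work-queue slot completed                */
} cq_entry_t;

static wq_entry_t wq[WQ_DEPTH];  /* assumed memory-mapped, polled by the NI   */
static cq_entry_t cq[WQ_DEPTH];  /* assumed written by the NI on completion   */
static uint32_t   wq_head;

/* Post a one-sided remote read; returns the work-queue slot used. */
static uint32_t post_remote_read(uint16_t node, uint64_t remote_offset,
                                 void *local_buf, uint32_t length)
{
    uint32_t slot = wq_head;
    wq[slot].op            = 0;
    wq[slot].node          = node;
    wq[slot].remote_offset = remote_offset;
    wq[slot].local_buf     = (uint64_t)(uintptr_t)local_buf;
    wq[slot].length        = length;
    /* Release store: the NI must observe a fully written descriptor. */
    __atomic_store_n(&wq[slot].valid, 1, __ATOMIC_RELEASE);
    wq_head = (wq_head + 1) % WQ_DEPTH;
    return slot;
}

/* Spin until the NI marks the given slot complete; data is then in local_buf. */
static void wait_for_completion(uint32_t slot)
{
    while (!__atomic_load_n(&cq[slot].valid, __ATOMIC_ACQUIRE))
        ;                    /* busy-wait on the NI's completion write */
    cq[slot].valid = 0;      /* recycle the completion slot */
}

Because every such operation is handled entirely by the NI on both ends, where that NI logic sits on the chip (replicated per tile versus concentrated at the chip's edge) directly determines the latency and bandwidth the abstract's design space explores.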
Funder
Schweizerische Nationalfonds zur Förderung der Wissenschaftlichen Forschung
Microsoft Research
Nano-Tera
Publisher
Association for Computing Machinery (ACM)
Cited by
4 articles.
1. An ultra-low latency and compatible PCIe interconnect for rack-scale communication;Proceedings of the 18th International Conference on emerging Networking EXperiments and Technologies;2022-11-30
2. Cerebros: Evading the RPC Tax in Datacenters;MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture;2021-10-17
3. SmartIO;ACM Transactions on Computer Systems;2021-07
4. Flexible device compositions and dynamic resource sharing in PCIe interconnected clusters using Device Lending;Cluster Computing;2019-09-21