Manycore network interfaces for in-memory rack-scale computing

Author:

Alexandros Daglis1, Stanko Novaković1, Edouard Bugnion1, Babak Falsafi1, Boris Grot2

Affiliation:

1. EcoCloud, EPFL

2. University of Edinburgh

Abstract

Datacenter operators rely on low-cost, high-density technologies to maximize throughput for data-intensive services with tight tail latencies. In-memory rack-scale computing is emerging as a promising paradigm in scale-out datacenters: it capitalizes on commodity SoCs, low-latency and high-bandwidth communication fabrics, and a remote memory access model to aggregate a rack's memory for critical data-intensive applications such as graph processing and key-value stores. Achieving low latency and high bandwidth requires not only eliminating communication bottlenecks in the software protocols and off-chip fabrics, but also careful on-chip integration of network interfaces. The latter is a key challenge, especially in architectures with RDMA-inspired one-sided operations that aim to achieve low latency and high bandwidth through on-chip Network Interface (NI) support. This paper proposes and evaluates network interface architectures for tiled manycore SoCs for in-memory rack-scale computing. Our results indicate that carefully splitting NI functionality between each chip tile and the chip's edge, along one NoC dimension, enables a rack-scale architecture to optimize for both latency and bandwidth. Our best manycore NI architecture achieves latencies within 3% of an idealized hardware NUMA and efficiently uses the full bisection bandwidth of the NoC, without changing the on-chip coherence protocol or the core's microarchitecture.

Funder

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (Swiss National Science Foundation)

Microsoft Research

Nano-Tera

Publisher

Association for Computing Machinery (ACM)

References (46 articles)

1. Achieving predictable performance through better memory controller placement in many-core CMPs

2. The MIT Alewife machine

3. AnandTech, "Haswell: Up to 128MB On-Package Cache." [Online]. Available: http://www.anandtech.com/show/6277/haswell-up-to-128mb-onpackage-cache-ulv-gpu-performance-estimates

4. K. Asanović, "A Hardware Building Block for 2020 Warehouse-Scale Computers," USENIX FAST Keynote, 2014.

5. Workload analysis of a large-scale key-value store

Cited by 4 articles.

1. An ultra-low latency and compatible PCIe interconnect for rack-scale communication;Proceedings of the 18th International Conference on emerging Networking EXperiments and Technologies;2022-11-30

2. Cerebros: Evading the RPC Tax in Datacenters;MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture;2021-10-17

3. SmartIO;ACM Transactions on Computer Systems;2021-07

4. Flexible device compositions and dynamic resource sharing in PCIe interconnected clusters using Device Lending;Cluster Computing;2019-09-21
