Affiliation
1. EcoCloud, EPFL
2. University of Edinburgh
Abstract
Datacenter operators rely on low-cost, high-density technologies to maximize throughput for data-intensive services with tight tail latencies. In-memory rack-scale computing is emerging as a promising paradigm in scale-out datacenters, capitalizing on commodity SoCs, low-latency and high-bandwidth communication fabrics, and a remote memory access model to enable aggregation of a rack's memory for critical data-intensive applications such as graph processing or key-value stores. Low latency and high bandwidth dictate not only eliminating communication bottlenecks in the software protocols and off-chip fabrics but also careful on-chip integration of network interfaces. The latter is a key challenge, especially in architectures with RDMA-inspired one-sided operations that aim to achieve low latency and high bandwidth through on-chip Network Interface (NI) support. This paper proposes and evaluates network interface architectures for tiled manycore SoCs for in-memory rack-scale computing. Our results indicate that a careful splitting of NI functionality per chip tile and at the chip's edge along a NOC dimension enables a rack-scale architecture to optimize for both latency and bandwidth. Our best manycore NI architecture achieves latencies within 3% of an idealized hardware NUMA and efficiently uses the full bisection bandwidth of the NOC, without changing the on-chip coherence protocol or the core's microarchitecture.
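To make the abstract's reference to RDMA-inspired one-sided operations concrete, the following minimal C sketch shows what such a model looks like from a core's point of view, assuming an soNUMA/RMC-style work-queue/completion-queue interface polled by the on-chip NI. All names here (wq_entry_t, cq_entry_t, post_remote_read, wait_for_completion) are hypothetical illustrations, not the paper's actual API.

/*
 * Sketch of an RDMA-inspired one-sided remote read: the application posts a
 * descriptor into a work queue that the on-chip NI polls, then spins on a
 * completion queue. The remote node's cores are never involved.
 * Assumed, hypothetical interface; not the paper's API.
 */
#include <stdint.h>
#include <stddef.h>

#define WQ_DEPTH 64

typedef struct {            /* work-queue entry: describes one one-sided op   */
    uint8_t  op;            /* 0 = remote read, 1 = remote write              */
    uint8_t  valid;         /* ownership flag handed off to the NI            */
    uint16_t node;          /* destination node in the rack                   */
    uint64_t remote_offset; /* offset into the remote node's exposed region   */
    uint64_t local_buf;     /* local buffer the NI deposits data into         */
    uint32_t length;        /* transfer size in bytes                         */
} wq_entry_t;

typedef struct {            /* completion-queue entry written back by the NI  */
    uint8_t  valid;
    uint32_t wq_index;      /* which work-queue slot completed                */
} cq_entry_t;

static wq_entry_t wq[WQ_DEPTH];  /* assumed memory-mapped, polled by the NI   */
static cq_entry_t cq[WQ_DEPTH];  /* assumed written by the NI on completion   */
static uint32_t   wq_head;

/* Post a one-sided remote read; returns the work-queue slot used. */
static uint32_t post_remote_read(uint16_t node, uint64_t remote_offset,
                                 void *local_buf, uint32_t length)
{
    uint32_t slot = wq_head;
    wq[slot].op            = 0;
    wq[slot].node          = node;
    wq[slot].remote_offset = remote_offset;
    wq[slot].local_buf     = (uint64_t)(uintptr_t)local_buf;
    wq[slot].length        = length;
    /* Release store: the NI must observe a fully written descriptor. */
    __atomic_store_n(&wq[slot].valid, 1, __ATOMIC_RELEASE);
    wq_head = (wq_head + 1) % WQ_DEPTH;
    return slot;
}

/* Spin until the NI marks the given slot complete; data is then in local_buf. */
static void wait_for_completion(uint32_t slot)
{
    while (!__atomic_load_n(&cq[slot].valid, __ATOMIC_ACQUIRE))
        ;                    /* busy-wait on the NI's completion write */
    cq[slot].valid = 0;      /* recycle the completion slot */
}

Because every such operation is handled entirely by the NI on both ends, where that NI logic sits on the chip (replicated per tile versus concentrated at the chip's edge) directly determines the latency and bandwidth the abstract's design space explores.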
Funder
Schweizerische Nationalfonds zur Förderung der Wissenschaftlichen Forschung
Microsoft Research
Nano-Tera
Publisher
Association for Computing Machinery (ACM)
Cited by
4 articles.
1. An ultra-low latency and compatible PCIe interconnect for rack-scale communication;Proceedings of the 18th International Conference on emerging Networking EXperiments and Technologies;2022-11-30
2. Cerebros: Evading the RPC Tax in Datacenters;MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture;2021-10-17
3. SmartIO;ACM Transactions on Computer Systems;2021-07
4. Flexible device compositions and dynamic resource sharing in PCIe interconnected clusters using Device Lending;Cluster Computing;2019-09-21