Affiliation:
1. EPFL, Lausanne, Switzerland
2. University of Edinburgh, Edinburgh, United Kingdom
Abstract
Emerging datacenter applications operate on vast datasets that are kept in DRAM to minimize latency. The large number of servers needed to accommodate this massive memory footprint requires frequent server-to-server communication in applications such as key-value stores and graph-based applications that rely on large irregular data structures. The fine-grained nature of the accesses is a poor match to commodity networking technologies, including RDMA, which incur delays of 10-1000x over local DRAM operations. We introduce Scale-Out NUMA (soNUMA) -- an architecture, programming model, and communication protocol for low-latency, distributed in-memory processing. soNUMA layers an RDMA-inspired programming model directly on top of a NUMA memory fabric via a stateless messaging protocol. To facilitate interactions between the application, OS, and the fabric, soNUMA relies on the remote memory controller -- a new architecturally-exposed hardware block integrated into the node's local coherence hierarchy. Our results based on cycle-accurate full-system simulation show that soNUMA performs remote reads at latencies that are within 4x of local DRAM, can fully utilize the available memory bandwidth, and can issue up to 10M remote memory operations per second per core.
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
12 articles.
1. DPU-Direct: Unleashing Remote Accelerators via Enhanced RDMA for Disaggregated Datacenters;IEEE Transactions on Computers;2024-08
2. HoPP: Hardware-Software Co-Designed Page Prefetching for Disaggregated Memory;2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2023-02
3. Disaggregated Memory in the Datacenter: A Survey;IEEE Access;2023
4. Reconsidering OS memory optimizations in the presence of disaggregated memory;Proceedings of the 2022 ACM SIGPLAN International Symposium on Memory Management;2022-06-14
5. Clio: a hardware-software co-designed disaggregated memory system;Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems;2022-02-22