Affiliation:
1. Shanghai Jiao Tong University, China and Shanghai Artificial Intelligence Laboratory, Shanghai, China
2. Shanghai Jiao Tong University, China and Domain-Specific Operating Systems, Ministry of Education, Shanghai, China
Abstract
RDMA (Remote Direct Memory Access) has gained considerable interest in network-attached in-memory key-value stores. However, traversing the remote tree-based index of an ordered key-value store with RDMA becomes a critical obstacle: the multiple round trips it requires cause an order-of-magnitude slowdown and limit scalability. An index cache built with conventional wisdom, caching partial data and traversing it locally, usually has limited effect because of unavoidable capacity misses, massive random accesses, and costly cache invalidations.
We argue that a machine learning (ML) model is a perfect cache structure for the tree-based index; we term this a learned cache. Based on it, we design and implement XStore, an RDMA-based ordered key-value store with a new hybrid architecture: the server retains a tree-based index to handle dynamic workloads (e.g., inserts), while clients use a learned cache to handle static workloads (e.g., gets and scans). The key idea is to decouple ML-model retraining from index updates by maintaining a layer of indirection from the logical to the actual positions of key-value pairs, which allows a stale learned cache to keep predicting a correct position for a lookup key.
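To make the indirection concrete, below is a minimal C++ sketch of a client-side learned-cache lookup under stated assumptions: a single linear model with trained error bounds predicts a logical position range for a key, a client-side translation table maps logical positions to remote offsets, and one one-sided RDMA read fetches the candidate entries. All names (LinearModel, LearnedCache, rdma_read) and the memory layout are illustrative assumptions, not XStore's actual interfaces.

#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Illustrative sketch (not XStore's real code). A linear model predicts
// where a key lives in *logical* position space; the translation table
// (the layer of indirection) maps logical positions to actual remote
// offsets, so a stale model can still land on the key after it moves.
struct LinearModel {
    double slope = 0.0, intercept = 0.0;
    uint64_t min_err = 0, max_err = 0;  // error bounds fixed at training time
};

struct KV { uint64_t key; uint64_t value; };

// Hypothetical one-sided RDMA read: fetch n entries starting at a remote
// offset without involving the server CPU. Stubbed for brevity; a real
// client would post an ibverbs READ here.
static std::vector<KV> rdma_read(uint64_t remote_off, uint64_t n) {
    (void)remote_off;
    return std::vector<KV>(n);
}

struct LearnedCache {
    LinearModel model;
    std::vector<uint64_t> translation;  // logical position -> remote offset

    // Returns the value if the key is found within the predicted span.
    std::optional<uint64_t> get(uint64_t key) const {
        // 1. Predict a logical position range [lo, hi] for the key
        //    (assumes a non-empty translation table).
        const auto pred =
            static_cast<uint64_t>(model.slope * key + model.intercept);
        const uint64_t lo = pred > model.min_err ? pred - model.min_err : 0;
        const uint64_t hi = std::min<uint64_t>(pred + model.max_err,
                                               translation.size() - 1);

        // 2. Translate the logical range to a remote offset and fetch all
        //    candidates with a single RDMA read (assuming the span is
        //    contiguous in server memory).
        auto span = rdma_read(translation[lo], hi - lo + 1);

        // 3. Validate: scan the fetched span for the exact key.
        for (const auto& kv : span)
            if (kv.key == key) return kv.value;
        return std::nullopt;  // miss: take the fallback path (second sketch below)
    }
};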
XStore ensures correctness using a validation mechanism with a fallback path, and further uses speculative execution to minimize the cost of cache misses. Evaluations with YCSB benchmarks and production workloads show that a single XStore server can achieve over 80 million read-only requests per second, outperforming state-of-the-art RDMA-based ordered key-value stores (namely DrTM-Tree, Cell, and eRPC+Masstree) by 3.7× to 5.9×. For workloads with inserts, XStore still provides a 2.7× to 3.5× throughput speedup, reaching 53M requests/s. The learned cache also reduces client-side memory usage and offers an efficient memory-performance tradeoff, e.g., saving 99% of memory at the cost of 20% of peak throughput.
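Continuing the illustrative sketch above, the miss path might look as follows. Here speculative_read and rpc_tree_lookup are hypothetical stand-ins for the speculative execution and the server-side fallback traversal the abstract describes, stubbed so the sketch compiles.

// On a miss, a client could first speculatively re-read positions adjacent
// to the predicted span (the key may have shifted after inserts) and only
// then fall back to an RPC that traverses the server's tree index.
static std::optional<uint64_t> speculative_read(const LearnedCache& c,
                                                uint64_t key) {
    (void)c; (void)key;
    return std::nullopt;  // stub: a real client would widen the read span
}

static uint64_t rpc_tree_lookup(uint64_t key) {
    (void)key;
    return 0;  // stub: a real client would ask the server to walk its tree
}

uint64_t lookup(const LearnedCache& cache, uint64_t key) {
    if (auto v = cache.get(key)) return *v;                // fast path: one RDMA read
    if (auto v = speculative_read(cache, key)) return *v;  // cheap second chance
    return rpc_tree_lookup(key);                           // always-correct fallback
}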
Funder
National Key Research and Development Program of China
National Natural Science Foundation of China
Huawei Technologies
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture
Cited by
11 articles.