Affiliation:
1. Shanghai Jiao Tong University, China and Shanghai Artificial Intelligence Laboratory, Shanghai, China
2. Shanghai Jiao Tong University, China and Domain-Specific Operating Systems, Ministry of Education, Shanghai, China
Abstract
RDMA (Remote Direct Memory Access) has gained considerable interest in network-attached in-memory key-value stores. However, traversing the remote tree-based index of an ordered key-value store with RDMA becomes a critical obstacle: the multiple round trips it requires cause an order-of-magnitude slowdown and limit scalability. An index cache built with conventional wisdom, caching partial data and traversing it locally, usually has limited effect because of unavoidable capacity misses, massive random accesses, and costly cache invalidations.
We argue that a machine learning (ML) model is a perfect cache structure for the tree-based index; we term this a learned cache. Based on it, we design and implement XStore, an RDMA-based ordered key-value store with a new hybrid architecture: the server retains a tree-based index to handle dynamic workloads (e.g., inserts), while clients use a learned cache to handle static workloads (e.g., gets and scans). The key idea is to decouple ML-model retraining from index updates by maintaining a layer of indirection from the logical to the actual positions of key-value pairs, which allows a stale learned cache to keep predicting a correct position for a lookup key.
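To make the indirection concrete, below is a minimal C++ sketch of a client-side learned-cache lookup under stated assumptions: a single linear model with trained error bounds predicts a logical position range for a key, a client-side translation table maps logical positions to remote offsets, and one one-sided RDMA read fetches the candidate entries. All names (LinearModel, LearnedCache, rdma_read) and the memory layout are illustrative assumptions, not XStore's actual interfaces.

#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Illustrative sketch (not XStore's real code). A linear model predicts
// where a key lives in *logical* position space; the translation table
// (the layer of indirection) maps logical positions to actual remote
// offsets, so a stale model can still land on the key after it moves.
struct LinearModel {
    double slope = 0.0, intercept = 0.0;
    uint64_t min_err = 0, max_err = 0;  // error bounds fixed at training time
};

struct KV { uint64_t key; uint64_t value; };

// Hypothetical one-sided RDMA read: fetch n entries starting at a remote
// offset without involving the server CPU. Stubbed for brevity; a real
// client would post an ibverbs READ here.
static std::vector<KV> rdma_read(uint64_t remote_off, uint64_t n) {
    (void)remote_off;
    return std::vector<KV>(n);
}

struct LearnedCache {
    LinearModel model;
    std::vector<uint64_t> translation;  // logical position -> remote offset

    // Returns the value if the key is found within the predicted span.
    std::optional<uint64_t> get(uint64_t key) const {
        // 1. Predict a logical position range [lo, hi] for the key
        //    (assumes a non-empty translation table).
        const auto pred =
            static_cast<uint64_t>(model.slope * key + model.intercept);
        const uint64_t lo = pred > model.min_err ? pred - model.min_err : 0;
        const uint64_t hi = std::min<uint64_t>(pred + model.max_err,
                                               translation.size() - 1);

        // 2. Translate the logical range to a remote offset and fetch all
        //    candidates with a single RDMA read (assuming the span is
        //    contiguous in server memory).
        auto span = rdma_read(translation[lo], hi - lo + 1);

        // 3. Validate: scan the fetched span for the exact key.
        for (const auto& kv : span)
            if (kv.key == key) return kv.value;
        return std::nullopt;  // miss: take the fallback path (second sketch below)
    }
};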
XStore ensures correctness using a validation mechanism with a fallback path, and further uses speculative execution to minimize the cost of cache misses. Evaluations with YCSB benchmarks and production workloads show that a single XStore server can achieve over 80 million read-only requests per second, outperforming state-of-the-art RDMA-based ordered key-value stores (namely DrTM-Tree, Cell, and eRPC+Masstree) by 3.7× to 5.9×. For workloads with inserts, XStore still provides a 2.7× to 3.5× throughput speedup, reaching 53M requests/s. The learned cache also reduces client-side memory usage and offers an efficient memory-performance tradeoff, e.g., saving 99% of memory at the cost of 20% of peak throughput.
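Continuing the illustrative sketch above, the miss path might look as follows. Here speculative_read and rpc_tree_lookup are hypothetical stand-ins for the speculative execution and the server-side fallback traversal the abstract describes, stubbed so the sketch compiles.

// On a miss, a client could first speculatively re-read positions adjacent
// to the predicted span (the key may have shifted after inserts) and only
// then fall back to an RPC that traverses the server's tree index.
static std::optional<uint64_t> speculative_read(const LearnedCache& c,
                                                uint64_t key) {
    (void)c; (void)key;
    return std::nullopt;  // stub: a real client would widen the read span
}

static uint64_t rpc_tree_lookup(uint64_t key) {
    (void)key;
    return 0;  // stub: a real client would ask the server to walk its tree
}

uint64_t lookup(const LearnedCache& cache, uint64_t key) {
    if (auto v = cache.get(key)) return *v;                // fast path: one RDMA read
    if (auto v = speculative_read(cache, key)) return *v;  // cheap second chance
    return rpc_tree_lookup(key);                           // always-correct fallback
}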
Funder
National Key Research and Development Program of China
National Natural Science Foundation of China
Huawei Technologies
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture
Cited by
11 articles.