Toward a better understanding and evaluation of tree structures on flash SSDs

Published: 2020-11 | Volume: 14 | Issue: 3 | Pages: 364-377
ISSN: 2150-8097
Container title: Proceedings of the VLDB Endowment
Language: en
Short container title: Proc. VLDB Endow.

Authors:

Diego Didona¹, Nikolas Ioannou¹, Radu Stoica¹, Kornilios Kourtis²

Affiliations:

1. IBM Research Zurich, Rüschlikon, Switzerland

2. Cilium, Zurich, Switzerland

Abstract

Solid-state drives (SSDs) are extensively used to deploy persistent data stores, as they provide low-latency random access, high write throughput, high data density, and low cost. Tree-based data structures are widely used to build persistent data stores, and indeed they form the backbone of many of the data management systems used in production and research today. We show that benchmarking a persistent tree-based data structure on an SSD is a complex process that is prone to subtle pitfalls, which can lead to an inaccurate performance assessment. At a high level, these pitfalls stem from the interaction of complex software running on complex hardware. On the one hand, tree structures implement internal operations that have non-trivial effects on performance. On the other hand, SSDs employ firmware logic to deal with the idiosyncrasies of the underlying flash memory, and this logic is well known to also produce complex performance dynamics. We identify seven benchmarking pitfalls using RocksDB and WiredTiger, two widespread implementations of an LSM-Tree and a B+Tree, respectively. We show that such pitfalls can lead to incorrect measurements of key performance indicators, hinder the reproducibility and the representativeness of the results, and lead to suboptimal deployments in production environments. We also provide guidelines on how to avoid these pitfalls, so as to obtain more reliable performance measurements and to perform more thorough and fair comparisons among different design points.
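The abstract does not enumerate the seven pitfalls. As a generic illustration of one concern it raises, namely that time-varying performance dynamics can distort a measurement, the sketch below checks when a throughput time series has settled before averaging it. This is a minimal sketch under stated assumptions, not the authors' method: the helper names `is_steady` and `steady_state_start`, the window of five samples, the 5% relative tolerance, and the sample values are all hypothetical. Formal definitions, such as the one in the SNIA Solid State Storage Performance Test Specification, are more elaborate.

```python
from statistics import mean


def is_steady(samples, window=5, tolerance=0.05):
    """Hypothetical check: True if the last `window` samples all lie within
    `tolerance` (relative) of their own mean."""
    if len(samples) < window:
        return False
    tail = samples[-window:]
    avg = mean(tail)
    if avg == 0:
        return all(s == 0 for s in tail)
    return all(abs(s - avg) / avg <= tolerance for s in tail)


def steady_state_start(samples, window=5, tolerance=0.05):
    """Index of the first sample of the earliest steady window, or None
    if the series never settles (assumed criterion, see is_steady)."""
    for end in range(window, len(samples) + 1):
        if is_steady(samples[:end], window, tolerance):
            return end - window
    return None


if __name__ == "__main__":
    # Illustrative (made-up) throughput samples, e.g. ops/s per interval:
    # a high initial burst that decays toward a lower sustained rate.
    samples = [900, 850, 700, 520, 410, 400, 395, 402, 398, 401]
    start = steady_state_start(samples)
    print(start)                   # 4
    print(mean(samples[start:]))   # 401: average over the steady region
    print(mean(samples))           # 537.6: naive average, inflated by the burst
```

Averaging over the whole run in this made-up example overstates sustained throughput by roughly a third, which is the kind of distortion a short or unfiltered measurement can produce.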

Publisher

VLDB Endowment

Link

https://dl.acm.org/doi/pdf/10.14778/3430915.3430926

Cited by 9 articles.

1. DumpyOS: A data-adaptive multi-ary index for scalable data series similarity search. The VLDB Journal, 2024-08-21.

2. CAVE: Concurrency-Aware Graph Processing on SSDs. Proceedings of the ACM on Management of Data, 2024-05-29.

3. BFQ, Multiqueue-Deadline, or Kyber? Performance Characterization of Linux Storage Schedulers in the NVMe Era. Proceedings of the 15th ACM/SPEC International Conference on Performance Engineering, 2024-05-07.

4. A Systematic Configuration Space Exploration of the Linux Kyber I/O Scheduler. Companion of the 15th ACM/SPEC International Conference on Performance Engineering, 2024-05-07.

5. FlashAlloc: Dedicating Flash Blocks by Objects. Proceedings of the VLDB Endowment, 2023-07.