An Enterprise-Grade Open-Source Data Reduction Architecture for All-Flash Storage Systems

Published: 2022-05-26 Issue: 2 Volume: 6 Page: 1-27
ISSN: 2476-1249
Container-title: Proceedings of the ACM on Measurement and Analysis of Computing Systems
Language: en
Short-container-title: Proc. ACM Meas. Anal. Comput. Syst.

Author:

Ajdari Mohammadamin¹, Raaf Patrick², Kishani Mostafa¹, Salkhordeh Reza², Asadi Hossein¹, Brinkmann André²

Affiliation:

1. Sharif University of Technology, Tehran, Iran

2. Johannes Gutenberg University, Mainz, Germany

Abstract

All-flash storage (AFS) systems have become an essential infrastructure component to support enterprise applications, where sub-millisecond latency and very high throughput are required. Nevertheless, the price per capacity of solid-state drives (SSDs) is relatively high, which has encouraged system architects to adopt data reduction techniques, mainly deduplication and compression, in enterprise storage solutions. To provide higher reliability and performance, SSDs are typically grouped using redundant array of independent disks (RAID) configurations. Data reduction on top of RAID arrays, however, adds I/O overheads and also complicates the I/O patterns redirected to the underlying backend SSDs, which invalidates the best-practice configurations used in AFS. Unfortunately, existing works on the performance of data reduction do not consider its interaction and I/O overheads with other enterprise storage components including SSD arrays and RAID controllers. In this paper, using a real setup with enterprise-grade components and based on the open-source data reduction module RedHat VDO, we reveal novel observations on the performance gap between the state-of-the-art and the optimal all-flash storage stack with integrated data reduction. We therefore explore the I/O patterns at the storage entry point and compare them with those at the disk subsystem. Our analysis shows a significant amount of I/O overheads for guaranteeing consistency and avoiding data loss through data journaling, frequent small-sized metadata updates, and duplicate content verification. We accompany these observations with cross-layer optimizations to enhance the performance of AFS, which range from deriving new optimal hardware RAID configurations up to introducing changes to the enterprise storage stack.
By analyzing the characteristics of I/O types and their overheads, we propose three techniques: (a) application-aware lazy persistence, (b) a fast, read-only I/O cache for duplicate verification, and (c) disaggregation of block maps and data by offloading block maps to a very fast persistent memory device. By consolidating all proposed optimizations and implementing them in an enterprise AFS, we show 1.3× to 12.5× speedup over the baseline AFS with 90% data reduction, and from 7.8× up to 57× performance/cost improvement over an optimized AFS (with no data reduction) running applications ranging from 100% read-only to 100% write-only accesses.

Funder

The European High-Performance Computing Joint Undertaking (JU) and the German Ministry of Education and Research

HPDS Corp.

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications; Hardware and Architecture; Safety, Risk, Reliability and Quality; Computer Science (miscellaneous)

Link

https://dl.acm.org/doi/pdf/10.1145/3530896

References: 76 articles.

1. Mohamed S. Abdelfattah, Andrei Hagiescu, and Deshanand P. Singh. 2014. Gzip on a chip: high performance lossless data compression on FPGAs using OpenCL. In Proceedings of the International Workshop on OpenCL, IWOCL 2013 & 2014, May 13–14, 2013, Georgia Tech, Atlanta, GA, USA / Bristol, UK, May 12–13, 2014. ACM, 4:1–4:9. https://doi.org/10.1145/2664666.2664670

2. ECI-Cache

3. ETICA: Efficient Two-Level I/O Caching Architecture for Virtualized Platforms

4. FIDR

5. Mohammadamin Ajdari, Pyeongsu Park, Joonsung Kim, Dongup Kwon, and Jangwoo Kim. 2019b. CIDR: A Cost-Effective In-Line Data Reduction System for Terabit-Per-Second Scale SSD Arrays. In 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019, Washington, DC, USA, February 16–20, 2019. IEEE, 28–41. https://doi.org/10.1109/HPCA.2019.00025

Cited by 2 articles.

1. From SSDs Back to HDDs: Optimizing VDO to Support Inline Deduplication and Compression for HDDs as Primary Storage Media;ACM Transactions on Storage;2024-08-06

2. IO-SEA: Storage I/O and Data Management for Exascale Architectures;Proceedings of the 21st ACM International Conference on Computing Frontiers: Workshops and Special Sessions;2024-05-07