An Enterprise-Grade Open-Source Data Reduction Architecture for All-Flash Storage Systems

Author:

Ajdari Mohammadamin (1), Raaf Patrick (2), Kishani Mostafa (1), Salkhordeh Reza (2), Asadi Hossein (1), Brinkmann André (2)

Affiliation:

1. Sharif University of Technology, Tehran, Iran

2. Johannes Gutenberg University, Mainz, Germany

Abstract

All-flash storage (AFS) systems have become an essential infrastructure component to support enterprise applications, where sub-millisecond latency and very high throughput are required. Nevertheless, the price per capacity of solid-state drives (SSDs) is relatively high, which has encouraged system architects to adopt data reduction techniques, mainly deduplication and compression, in enterprise storage solutions. To provide higher reliability and performance, SSDs are typically grouped using redundant array of independent disks (RAID) configurations. Data reduction on top of RAID arrays, however, adds I/O overhead and also complicates the I/O patterns redirected to the underlying backend SSDs, which invalidates the best-practice configurations used in AFS. Unfortunately, existing works on the performance of data reduction do not consider its interaction with, and the I/O overhead it imposes on, other enterprise storage components, including SSD arrays and RAID controllers. In this paper, using a real setup with enterprise-grade components and based on the open-source data reduction module RedHat VDO, we reveal novel observations on the performance gap between the state-of-the-art and the optimal all-flash storage stack with integrated data reduction. We therefore explore the I/O patterns at the storage entry point and compare them with those at the disk subsystem. Our analysis shows significant I/O overhead for guaranteeing consistency and avoiding data loss through data journaling, frequent small-sized metadata updates, and duplicate content verification. We accompany these observations with cross-layer optimizations to enhance the performance of AFS, which range from deriving new optimal hardware RAID configurations up to introducing changes to the enterprise storage stack.
By analyzing the characteristics of I/O types and their overheads, we propose three techniques: (a) application-aware lazy persistence, (b) a fast, read-only I/O cache for duplicate verification, and (c) disaggregation of block maps and data by offloading block maps to a very fast persistent memory device. By consolidating all proposed optimizations and implementing them in an enterprise AFS, we show 1.3× to 12.5× speedup over the baseline AFS with 90% data reduction, and from 7.8× up to 57× performance/cost improvement over an optimized AFS (with no data reduction) running applications ranging from 100% read-only to 100% write-only accesses.
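As a rough illustration of technique (b), the sketch below shows how a small read-only cache can absorb the backend reads that duplicate verification would otherwise issue. It is a toy model under stated assumptions, not the paper's implementation and not VDO code: the class `DedupStore`, its fields, the SHA-256 fingerprint, and the LRU policy are all hypothetical choices made for this example.

```python
import hashlib
from collections import OrderedDict


class DedupStore:
    """Toy block store: inline deduplication with byte-for-byte verification.

    Hypothetical model for illustration only; names and policies are assumptions.
    """

    def __init__(self, cache_blocks=2):
        self.backend = []            # stands in for the SSD array; index = physical block number
        self.index = {}              # fingerprint -> physical block number
        self.cache = OrderedDict()   # read-only LRU cache: physical block number -> bytes
        self.cache_blocks = cache_blocks
        self.backend_reads = 0       # verification reads that reached the backend

    def _read_for_verify(self, pbn):
        """Return the stored block, serving it from the cache when possible."""
        if pbn in self.cache:
            self.cache.move_to_end(pbn)      # refresh LRU position
            return self.cache[pbn]
        self.backend_reads += 1              # cache miss: read from the backend
        data = self.backend[pbn]
        self.cache[pbn] = data
        if len(self.cache) > self.cache_blocks:
            self.cache.popitem(last=False)   # evict the least recently used entry
        return data

    def write(self, data):
        """Store a block and return its physical block number."""
        fp = hashlib.sha256(data).digest()
        pbn = self.index.get(fp)
        # A fingerprint match is only a candidate; compare the actual bytes.
        if pbn is not None and self._read_for_verify(pbn) == data:
            return pbn                       # verified duplicate: nothing new is written
        self.backend.append(data)            # unique block: write it out
        pbn = len(self.backend) - 1
        self.index.setdefault(fp, pbn)
        return pbn


store = DedupStore()
block = b"A" * 4096
first = store.write(block)    # unique write
second = store.write(block)   # duplicate: verification reads the backend once
third = store.write(block)    # duplicate: verification is served from the cache
print(first, second, third, store.backend_reads, len(store.backend))
```

Because stored blocks are never modified in place in this model, the cache holds no dirty data and needs no write-back path, which is what keeps it read-only.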

Funder

The European High-Performance Computing Joint Undertaking (JU) and the German Ministry of Education and Research

HPDS Corp.

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications; Hardware and Architecture; Safety, Risk, Reliability and Quality; Computer Science (miscellaneous)

Cited by 2 articles.

1. From SSDs Back to HDDs: Optimizing VDO to Support Inline Deduplication and Compression for HDDs as Primary Storage Media. ACM Transactions on Storage, 2024-08-06.

2. IO-SEA: Storage I/O and Data Management for Exascale Architectures. Proceedings of the 21st ACM International Conference on Computing Frontiers: Workshops and Special Sessions, 2024-05-07.
