Affiliations
1. School of Informatics, Xiamen University, Xiamen, China
2. National University of Defense Technology, Changsha, China
3. The School of Computer Science, Peking University, Beijing, China
4. OceanBase, Ant Group, Hangzhou, China
5. Mohamed bin Zayed University of Artificial Intelligence, Masdar City, United Arab Emirates
Abstract
As data volumes continue to grow exponentially, demand for large-capacity storage systems is increasing. Data compression effectively reduces the volume of written data and thereby improves space efficiency, so many modern SSDs have already incorporated compression capabilities. However, compression adds processing overhead to the critical I/O path, which can degrade system performance. Currently, most compression solutions in flash-based storage systems apply a fixed compression algorithm to all incoming data without exploiting differences among data access patterns, which leads to sub-optimal compression efficiency.
This article proposes a data-type-aware Flash Translation Layer (DAFTL) scheme that maximizes space efficiency without compromising system performance. First, we propose an I/O behavior prediction method to forecast future accesses to specific data. Then, DAFTL matches data types with distinct I/O behaviors to compression algorithms of varying intensities, achieving an optimal balance between performance and space efficiency. Specifically, it applies higher-intensity compression algorithms to less frequently accessed data to maximize space efficiency, and lower-intensity but faster algorithms to frequently accessed data to maintain system performance. Finally, we propose an improved compact compression method that eliminates page fragmentation and further enhances space efficiency. Extensive evaluations using a variety of real-world workloads, as well as workloads with real data collected on our platforms, demonstrate that DAFTL achieves greater data reduction than other approaches. Compared with state-of-the-art compression schemes, DAFTL reduces the total number of pages written to the SSD by an average of 8%, 21.3%, and 25.6% for data with high, medium, and low compressibility, respectively. For workloads with real data, DAFTL reduces the total number of pages written to the SSD by 10.4% on average. Furthermore, DAFTL delivers comparable or even better read and write performance than other solutions.
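The abstract does not specify DAFTL's predictor, codec set, or packing policy, so the following is only a rough Python sketch of the general idea under stated assumptions. It uses a hypothetical exponentially decayed access counter as the I/O behavior predictor, zlib levels 1, 6, and 9 as stand-ins for low-, medium-, and high-intensity algorithms, and first-fit placement of compressed chunks into hypothetical 4 KiB flash pages to show how compact packing reduces page fragmentation. All names (`HotnessPredictor`, `compress_for_class`, `first_fit_pack`, `PAGE_SIZE`, `LEVEL_FOR_CLASS`) and all thresholds are illustrative assumptions, not the authors' design.

```python
import zlib
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # hypothetical flash page size in bytes

# Hypothetical mapping from predicted hotness to zlib level (1 = fastest, 9 = strongest).
# zlib stands in for whatever fast/strong codecs a real SSD controller would use.
LEVEL_FOR_CLASS = {"hot": 1, "warm": 6, "cold": 9}


@dataclass
class HotnessPredictor:
    """Toy I/O behavior predictor: an exponentially decayed access count per logical page."""
    decay: float = 0.9
    hot_threshold: float = 3.0
    warm_threshold: float = 1.0
    scores: dict = field(default_factory=dict)

    def record_access(self, lpn: int) -> None:
        # Age every existing score, then credit the page that was just accessed.
        for key in self.scores:
            self.scores[key] *= self.decay
        self.scores[lpn] = self.scores.get(lpn, 0.0) + 1.0

    def classify(self, lpn: int) -> str:
        score = self.scores.get(lpn, 0.0)
        if score >= self.hot_threshold:
            return "hot"
        if score >= self.warm_threshold:
            return "warm"
        return "cold"


def compress_for_class(data: bytes, hotness: str) -> bytes:
    """Compress at the level chosen for the hotness class; keep raw data if it does not shrink."""
    packed = zlib.compress(data, LEVEL_FOR_CLASS[hotness])
    return packed if len(packed) < len(data) else data


def first_fit_pack(sizes: list[int], page_size: int = PAGE_SIZE) -> list[list[int]]:
    """Place each compressed chunk (by index) into the first page with enough free space."""
    pages: list[list[int]] = []
    free: list[int] = []
    for idx, size in enumerate(sizes):
        if size > page_size:
            raise ValueError("chunk larger than a flash page")
        for p, room in enumerate(free):
            if size <= room:
                pages[p].append(idx)
                free[p] -= size
                break
        else:
            # No existing page has room: open a new one.
            pages.append([idx])
            free.append(page_size - size)
    return pages
```

For example, with compressed chunk sizes of 3000, 1000, 2500, and 1500 bytes, writing one chunk per page would consume four pages, whereas `first_fit_pack` places them in two (`[[0, 1], [2, 3]]`). A real FTL would also need per-chunk offsets in its mapping table and would have to weigh tighter packing against read amplification and garbage-collection cost.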
Funders
National Key R&D Program of China
Natural Science Foundation of Xiamen
National Natural Science Foundation of China
China Fundamental Research Funds for the Central Universities
Ant Group through the CCF-Ant Research Fund
Publisher
Association for Computing Machinery (ACM)