Abstract
String dictionaries constitute a large portion of the memory footprint of database applications. While strong string dictionary compression algorithms exist, they come with impractical access and compression times. Therefore, lightweight algorithms such as front coding (PFC) are favored in practice. This paper endeavors to make strong string dictionary compression practical. We focus on Re-Pair Front Coding (RPFC), a grammar-based compression algorithm, since it consistently offers better compression ratios than other algorithms in the literature. To accelerate compression, we propose block-based RPFC (BRPFC), which consists of independently compressing small blocks of the dictionary. To speed up compression further, especially on large string dictionaries, we also propose an alternative version of BRPFC that uses sampling. Moreover, to accelerate access times, we devise a vectorized access method using Intel® Advanced Vector Extensions 512 (Intel® AVX-512). Our experimental evaluation shows that sampled BRPFC offers compression times up to 190× faster than RPFC, and random string lookups 2.3× faster than RPFC on average. These results move our modified RPFC into a practical range for use in database systems, because the access-time overhead of Re-Pair-based compression can be reduced by 2×.
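To make the combination of front coding, Re-Pair and independent blocks more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes, as in common descriptions of RPFC, that Re-Pair is applied to the suffixes left over after front coding, and it builds a separate grammar per block to mirror the block-independent idea of BRPFC. The function names (`front_code`, `re_pair`, `expand`, `compress_blocks`, `lookup`) and the block size of 4 are hypothetical; the Re-Pair loop is the naive quadratic variant, and neither sampling nor AVX-512 vectorization is modeled.

```python
from collections import Counter

# Hypothetical sketch of block-based Re-Pair front coding; not the paper's code.


def front_code(block):
    """Front-code a sorted list of byte strings into (lcp, suffix) pairs,
    where lcp is the prefix length shared with the previous string."""
    encoded, prev = [], b""
    for s in block:
        lcp = 0
        while lcp < min(len(prev), len(s)) and prev[lcp] == s[lcp]:
            lcp += 1
        encoded.append((lcp, s[lcp:]))
        prev = s
    return encoded


def re_pair(suffixes):
    """Naive Re-Pair: repeatedly replace the most frequent adjacent symbol
    pair (seen at least twice) by a new nonterminal. Terminals are bytes 0-255."""
    seqs = [list(s) for s in suffixes]
    rules, next_sym = {}, 256
    while True:
        counts = Counter()
        for seq in seqs:
            counts.update(zip(seq, seq[1:]))
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break
        rules[next_sym] = pair
        for i, seq in enumerate(seqs):
            out, j = [], 0
            while j < len(seq):
                if j + 1 < len(seq) and (seq[j], seq[j + 1]) == pair:
                    out.append(next_sym)  # replace the pair by the nonterminal
                    j += 2
                else:
                    out.append(seq[j])
                    j += 1
            seqs[i] = out
        next_sym += 1
    return seqs, rules


def expand(sym, rules):
    """Recursively expand a grammar symbol back into bytes."""
    if sym < 256:
        return bytes([sym])
    left, right = rules[sym]
    return expand(left, rules) + expand(right, rules)


def compress_blocks(strings, block_size=4):
    """Sort the dictionary, split it into blocks and compress each block
    independently: front coding first, then Re-Pair over the suffixes."""
    data = sorted(s.encode() for s in strings)
    blocks = []
    for start in range(0, len(data), block_size):
        coded = front_code(data[start:start + block_size])
        seqs, rules = re_pair([suffix for _, suffix in coded])
        blocks.append(([lcp for lcp, _ in coded], seqs, rules))
    return blocks


def lookup(blocks, i, block_size=4):
    """Return the i-th string (in sorted order) by decoding only its block."""
    lcps, seqs, rules = blocks[i // block_size]
    s = b""
    for k in range(i % block_size + 1):
        suffix = b"".join(expand(sym, rules) for sym in seqs[k])
        s = s[:lcps[k]] + suffix  # reuse the shared prefix of the predecessor
    return s.decode()


if __name__ == "__main__":
    words = ["string", "compression", "compress", "compressor",
             "compute", "computer", "dictionary", "dictionaries"]
    blocks = compress_blocks(words)
    print(lookup(blocks, 2))  # prints "compressor"
```

Because each block carries its own grammar, a lookup only touches one block, and blocks could be compressed in parallel; the trade-off, in this sketch at least, is that pairs repeated across blocks are not shared, so the per-block grammars are less compact than one global Re-Pair grammar.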
Funder: Technische Universität Ilmenau
Publisher: Springer Science and Business Media LLC
Subject: Hardware and Architecture, Information Systems