Can Learned Models Replace Hash Functions?

Published: 2022-11 Volume: 16 Issue: 3 Pages: 532-545
ISSN: 2150-8097
Container-title: Proceedings of the VLDB Endowment
Language: en
Short-container-title: Proc. VLDB Endow.

Author:

Ibrahim Sabek¹, Kapil Vaidya¹, Dominik Horn², Andreas Kipf¹, Michael Mitzenmacher³, Tim Kraska¹

Affiliation:

1. MIT CSAIL

2. TUM

3. Harvard University

Abstract

Hashing is a fundamental operation in database management, playing a key role in the implementation of numerous core database data structures and algorithms. Traditional hash functions aim to mimic a function that maps a key to a random value, which can result in collisions, where multiple keys are mapped to the same value. Many well-known schemes, such as chaining, probing, and cuckoo hashing, exist to handle collisions. In this work, we aim to study whether using learned models instead of traditional hash functions can reduce collisions, and whether such a reduction translates into improved performance, particularly for indexing and joins. We show that learned models reduce collisions in some cases, depending on how the data is distributed. To evaluate the effectiveness of learned models as hash functions, we test them with bucket chaining, linear probing, and cuckoo hash tables. We find that learned models can (1) yield 1.4x lower probe latency, and (2) reduce the non-partitioned hash join runtime by 28% over the next best baseline for certain datasets. On the other hand, if the data distribution is not suitable, we either see no gains or see worse performance. In summary, we find that learned models can indeed outperform hash functions, but only for certain data distributions.
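The central idea of the abstract, mapping a key to a slot through a model of the key distribution's CDF instead of through a randomizing function, can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the MurmurHash3 64-bit finalizer (cited in the references) stands in for a traditional hash, `PiecewiseLinearCDF` is a hypothetical stand-in for a learned model (fitted here on all keys rather than on a sample), and `count_collisions` and `compare` are illustrative helpers that count keys landing in an already-occupied slot.

```python
import bisect
import random

MASK64 = (1 << 64) - 1


def murmur3_fmix64(key: int) -> int:
    """MurmurHash3 64-bit finalizer (fmix64): a traditional, randomizing hash."""
    key &= MASK64
    key ^= key >> 33
    key = (key * 0xFF51AFD7ED558CCD) & MASK64
    key ^= key >> 33
    key = (key * 0xC4CEB9FE1A85EC53) & MASK64
    key ^= key >> 33
    return key


class PiecewiseLinearCDF:
    """Hypothetical learned hash: a piecewise-linear fit of the keys' CDF.

    Knots sit at evenly spaced ranks of the sorted keys; between knots the
    CDF is linearly interpolated. Requires at least two keys.
    """

    def __init__(self, keys, num_segments=16):
        s = sorted(keys)
        n = len(s)
        ranks = [round(i * (n - 1) / num_segments) for i in range(num_segments + 1)]
        self.xs = [s[r] for r in ranks]          # knot keys
        self.ys = [r / (n - 1) for r in ranks]   # empirical CDF at each knot

    def cdf(self, key):
        xs, ys = self.xs, self.ys
        if key <= xs[0]:
            return 0.0
        if key >= xs[-1]:
            return 1.0
        j = bisect.bisect_right(xs, key)         # xs[j-1] <= key < xs[j]
        frac = (key - xs[j - 1]) / (xs[j] - xs[j - 1])
        return ys[j - 1] + frac * (ys[j] - ys[j - 1])

    def slot(self, key, num_slots):
        # Scale the predicted CDF to a slot index; a CDF of 1.0 goes to the last slot.
        return min(int(self.cdf(key) * num_slots), num_slots - 1)


def count_collisions(keys, slot_fn):
    """Number of keys that land in an already-occupied slot."""
    return len(keys) - len({slot_fn(k) for k in keys})


def compare(keys, num_slots):
    """Return (fmix64 collisions, learned-model collisions) for one key set."""
    model = PiecewiseLinearCDF(keys)
    hashed = count_collisions(keys, lambda k: murmur3_fmix64(k) % num_slots)
    learned = count_collisions(keys, lambda k: model.slot(k, num_slots))
    return hashed, learned


if __name__ == "__main__":
    n = 10_000
    sequential = [1_000 + 7 * i for i in range(n)]         # evenly spaced keys
    uniform = random.Random(42).sample(range(1 << 40), n)  # uniformly random keys
    for name, keys in [("sequential", sequential), ("uniform", uniform)]:
        hashed, learned = compare(keys, num_slots=n)
        print(f"{name:>10}: fmix64 collisions={hashed}, learned collisions={learned}")
```

With as many slots as keys, the learned mapping should place every evenly spaced key in its own slot, while on uniformly random keys it collides at roughly the same rate as the randomizing hash (about n/e keys). This mirrors, in a toy setting, the abstract's point that the benefit depends on the data distribution.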

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science

Link

https://dl.acm.org/doi/pdf/10.14778/3570690.3570702

References (81 articles)

1. Mohammad Alahmad and Imad Fakhri Taha Alshaikhli. Broad View of Cryptographic Hash Functions. International Journal of Computer Science Issues, 2013.

2. A comparison of adaptive radix trees and hash tables

3. Austin Appleby. MurmurHash3 64-bit finalizer. https://code.google.com/p/smhasher/wiki/MurmurHash3.

4. Austin Appleby. MurmurHash. https://sites.google.com/site/murmurhash/, 2011.

5. Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. Workload Analysis of a Large-Scale Key-Value Store. In Proceedings of the ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, 2012.

Cited by 4 articles

1. FairHash: A Fair and Memory/Time-efficient Hashmap;Proceedings of the ACM on Management of Data;2024-05-29

2. GLO: Towards Generalized Learned Query Optimization;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13

3. DeepMapping: Learned Data Mapping for Lossless Compression and Efficient Lookup;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13

4. Data Structures for Data-Intensive Applications: Tradeoffs and Design Guidelines;Foundations and Trends® in Databases;2023