Cardinality estimation of approximate substring queries using deep learning

Published: 2022-07 Issue: 11 Volume: 15 Page: 3145-3157
ISSN: 2150-8097
Container-title: Proceedings of the VLDB Endowment
language: en
Short-container-title: Proc. VLDB Endow.

Author:

Kwon Suyong¹, Jung Woohwan², Shim Kyuseok¹

Affiliation:

1. Seoul National University

2. Hanyang University

Abstract

Cardinality estimation of an approximate substring query is an important problem in database systems. Traditional approaches build a summary from the text data and estimate the cardinality using the summary with some statistical assumptions. Since deep learning models can learn underlying complex data patterns effectively, they have been successfully applied and shown to outperform traditional methods for cardinality estimations of queries in database systems. However, since they are not yet applied to approximate substring queries, we investigate a deep learning approach for cardinality estimation of such queries. Although the accuracy of deep learning models tends to improve as the train data size increases, producing a large train data is computationally expensive for cardinality estimation of approximate substring queries. Thus, we develop efficient train data generation algorithms by avoiding unnecessary computations and sharing common computations. We also propose a deep learning model as well as a novel learning method to quickly obtain an accurate deep learning-based estimator. Extensive experiments confirm the superiority of our data generation algorithms and deep learning model with the novel learning method.
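To make the quantity being estimated concrete, the following is a minimal brute-force sketch in Python, not the paper's algorithm or model. It assumes that a record matches a query when the record contains some substring whose edit distance to the query is at most a threshold; that matching rule, along with the function names `min_substring_edit_distance` and `exact_cardinality` and the sample records, is an illustrative assumption and does not come from the abstract.

```python
from typing import Iterable


def min_substring_edit_distance(query: str, text: str) -> int:
    """Smallest edit distance between `query` and any substring of `text`.

    Standard dynamic program for approximate substring matching: the row for
    the empty query prefix is all zeros, so a match may start anywhere in
    `text`, and taking the minimum of the final row lets it end anywhere.
    """
    prev = [0] * (len(text) + 1)
    for i, qc in enumerate(query, start=1):
        # Column 0: matching i query characters against an empty substring
        # costs i deletions.
        curr = [i] + [0] * len(text)
        for j, tc in enumerate(text, start=1):
            cost = 0 if qc == tc else 1
            curr[j] = min(
                prev[j - 1] + cost,  # match or substitute
                prev[j] + 1,         # drop a query character
                curr[j - 1] + 1,     # absorb an extra text character
            )
        prev = curr
    return min(prev)


def exact_cardinality(query: str, records: Iterable[str], threshold: int) -> int:
    """Count records containing a substring within `threshold` edits of `query`.

    Assumed matching rule for illustration only; this scans every record.
    """
    return sum(
        1 for r in records if min_substring_edit_distance(query, r) <= threshold
    )


# Hypothetical sample data.
records = ["database", "databse systems", "dataset", "query"]
print(exact_cardinality("database", records, 1))
```

Computing such exact counts for many queries requires a scan of this kind per query, which gives a sense of why the abstract describes producing a large training set as computationally expensive.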

Publisher

Association for Computing Machinery (ACM)

Subject

General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/3551793.3551859

Reference: 36 articles.

1. Edit distance. https://en.wikipedia.org/wiki/Edit_distance (Accessed June 11 2021).

2. Estimating the selectivity of LIKE queries using pattern-based histograms

3. Scaling to very very large corpora for natural language disambiguation

4. Selectivity estimation for string predicates: overcoming the underestimation problem

5. Selectivity estimation for Boolean queries

Cited by 6 articles.

1. Learned Query Optimizer: What is New and What is Next;Companion of the 2024 International Conference on Management of Data;2024-06-09

2. LPLM: A Neural Language Model for Cardinality Estimation of LIKE-Queries;Proceedings of the ACM on Management of Data;2024-03-12

3. CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models;Proceedings of the ACM on Management of Data;2024-03-12

4. LAF: A Local Depth Autoregressive Framework for Cardinality Estimation of Multi-attribute Queries;Lecture Notes in Computer Science;2024

5. Experimental Analysis of Large-Scale Learnable Vector Storage Compression;Proceedings of the VLDB Endowment;2023-12