Affiliation:
1. Tsinghua University, Beijing, China
2. Beijing Institute of Technology, Beijing, China
Abstract
Cardinality estimation (CE) plays a crucial role in database optimizers. Numerous learned CE models have emerged recently that can outperform traditional methods such as histograms and sampling. However, learned models also bring security risks. For example, a query-driven learned CE model learns a query-to-cardinality mapping from the historical workload. Such a model can be attacked by poisoning queries, which are crafted by malicious attackers and woven into the historical workload, degrading CE performance.
In this paper, we explore the potential security risks in learned CE and study a new problem of poisoning attacks on learned CE in a black-box setting. There are three challenges. First, the interior details of the CE model are hidden in the black-box setting, making the model difficult to attack. Second, the attacked CE model's parameters are updated with the poisoning queries, i.e., the model parameters themselves vary with the optimization variable, so the problem cannot be modeled as a univariate optimization problem and is hard to solve efficiently. Third, to make the attack imperceptible, the poisoning queries must follow a distribution similar to the historical workload. We propose a poisoning attack system, PACE, to address these challenges. To tackle the first challenge, we propose a method of speculating and training a surrogate model, which transforms the black-box attack into a near-white-box attack. To address the second challenge, we model the poisoning problem as a bivariate optimization problem and design an effective and efficient algorithm to solve it. To overcome the third challenge, we propose an adversarial approach that trains a poisoning query generator alongside an anomaly detector, ensuring that the poisoning queries follow a distribution similar to the historical workload. Experiments show that PACE reduces the accuracy of the learned CE models by 178×, leading to a 10× decrease in the end-to-end performance of the target database.
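The bivariate structure described above (the model's parameters depend on the poisoning queries, while the poison is optimized against the retrained model) can be illustrated with a toy sketch. This is not PACE's actual algorithm; it uses a hypothetical linear query-to-cardinality model and numerical gradients purely to show the alternating inner/outer optimization idea.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit(X, y):
    """Least-squares fit: a stand-in for training the learned CE model."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Historical workload: query features -> (log-)cardinalities.
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=200)

# Poisoning queries start from plausible cardinalities.
Xp = rng.normal(size=(20, 5))
yp = Xp @ w_true

step = 0.5
for _ in range(50):
    # Inner problem: the model retrains on workload + poison.
    w = fit(np.vstack([X, Xp]), np.concatenate([y, yp]))
    base_err = np.mean((X @ w - y) ** 2)
    # Outer problem: nudge poison labels to maximize the retrained
    # model's error on the clean workload (numerical gradient for brevity).
    grad = np.zeros_like(yp)
    eps = 1e-3
    for i in range(len(yp)):
        yp2 = yp.copy()
        yp2[i] += eps
        w2 = fit(np.vstack([X, Xp]), np.concatenate([y, yp2]))
        grad[i] = (np.mean((X @ w2 - y) ** 2) - base_err) / eps
    yp += step * grad  # ascend: make the retrained model worse

w_clean = fit(X, y)
w_pois = fit(np.vstack([X, Xp]), np.concatenate([y, yp]))
err_clean = np.mean((X @ w_clean - y) ** 2)
err_pois = np.mean((X @ w_pois - y) ** 2)
```

After the alternating updates, `err_pois` exceeds `err_clean`: the retrained model is measurably worse on the clean workload even though the poison is only a small fraction of the training data.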
Funder
NSF of China
CCF-Huawei Populus Grove Challenge Fund
Science and Technology Research and Development Plan of China Railway
National Key R&D Program of China
Publisher
Association for Computing Machinery (ACM)