Affiliation:
1. Tsinghua University, Beijing, China
2. Beijing Institute of Technology, Beijing, China
Abstract
Cardinality estimation (CE) plays a crucial role in database optimizers. Numerous learned CE models have emerged recently that can outperform traditional methods such as histograms and sampling. However, learned models also bring security risks. For example, a query-driven learned CE model learns a query-to-cardinality mapping from the historical workload. Such a model can be attacked by poisoning queries, which are crafted by malicious attackers and woven into the historical workload, degrading CE performance.
In this paper, we explore the potential security risks in learned CE and study a new problem of poisoning attacks on learned CE in a black-box setting. There are three challenges. First, the interior details of the CE model are hidden in the black-box setting, making the model difficult to attack. Second, the attacked CE model's parameters are updated with the poisoning queries, i.e., the model parameters themselves vary with the optimization variable, so the problem cannot be modeled as a univariate optimization problem and is hard to solve efficiently. Third, to make the attack imperceptible, the poisoning queries must follow a distribution similar to the historical workload. We propose a poisoning attack system, PACE, to address these challenges. To tackle the first challenge, we propose a method of speculating and training a surrogate model, which transforms the black-box attack into a near-white-box attack. To address the second challenge, we model the poisoning problem as a bivariate optimization problem and design an effective and efficient algorithm to solve it. To overcome the third challenge, we propose an adversarial approach that trains a poisoning query generator alongside an anomaly detector, ensuring that the poisoning queries follow a distribution similar to the historical workload. Experiments show that PACE reduces the accuracy of the learned CE models by 178×, leading to a 10× decrease in the end-to-end performance of the target database.
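The bivariate structure described above (the model's parameters depend on the poisoning queries, while the poison is optimized against the retrained model) can be illustrated with a toy sketch. This is not PACE's actual algorithm; it uses a hypothetical linear query-to-cardinality model and numerical gradients purely to show the alternating inner/outer optimization idea.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit(X, y):
    """Least-squares fit: a stand-in for training the learned CE model."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Historical workload: query features -> (log-)cardinalities.
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=200)

# Poisoning queries start from plausible cardinalities.
Xp = rng.normal(size=(20, 5))
yp = Xp @ w_true

step = 0.5
for _ in range(50):
    # Inner problem: the model retrains on workload + poison.
    w = fit(np.vstack([X, Xp]), np.concatenate([y, yp]))
    base_err = np.mean((X @ w - y) ** 2)
    # Outer problem: nudge poison labels to maximize the retrained
    # model's error on the clean workload (numerical gradient for brevity).
    grad = np.zeros_like(yp)
    eps = 1e-3
    for i in range(len(yp)):
        yp2 = yp.copy()
        yp2[i] += eps
        w2 = fit(np.vstack([X, Xp]), np.concatenate([y, yp2]))
        grad[i] = (np.mean((X @ w2 - y) ** 2) - base_err) / eps
    yp += step * grad  # ascend: make the retrained model worse

w_clean = fit(X, y)
w_pois = fit(np.vstack([X, Xp]), np.concatenate([y, yp]))
err_clean = np.mean((X @ w_clean - y) ** 2)
err_pois = np.mean((X @ w_pois - y) ** 2)
```

After the alternating updates, `err_pois` exceeds `err_clean`: the retrained model is measurably worse on the clean workload even though the poison is only a small fraction of the training data.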
Funder
NSF of China
CCF-Huawei Populus Grove Challenge Fund
Science and Technology Research and Development Plan of China Railway
National Key R&D Program of China
Publisher
Association for Computing Machinery (ACM)