Affiliation:
1. University of Edinburgh, Edinburgh, UK
Abstract
While counterfactuals have been extensively studied as an intuitive explanation of model predictions, their adoption in practice remains limited due to two obstacles: (a) they rely on excessive access to the model, which the model owner may not provide; and (b) they carry information that adversarial users can exploit to launch model extraction attacks. To address these challenges, we propose CPC, a data-driven approach to counterfactual explanation. CPC works on the client side and gives model users full control and the right to explain, even when model owners choose not to offer explanations. Moreover, CPC ensures that adversarial users cannot exploit counterfactuals to extract models. We formulate the properties and fundamental problems underlying CPC, study their complexity, and develop effective algorithms. Using real-world datasets and a user study, we verify that CPC prevents adversaries from exploiting counterfactuals for model extraction attacks and is orders of magnitude faster than existing explainers, while maintaining comparable and often higher quality.
Publisher
Association for Computing Machinery (ACM)