Abstract
Protecting ML classifiers from adversarial examples is crucial. We propose that the main threat is an attacker perturbing a confidently classified input to produce a confident misclassification. In this paper we consider the $L_0$ attack, in which a small number of inputs can be perturbed by the attacker at test time. To quantify the risk of this form of attack we have devised a formal guarantee in the form of an adversarial bound (AB) for a binary Gaussian process classifier using the EQ kernel. This bound holds for the entire input domain, bounding the potential of any future adversarial attack to cause a confident misclassification. We explore how to extend the bound to other kernels and investigate how to maximise it by altering the classifier (for example by using sparse approximations). We test the bound on a variety of datasets and show that it produces relevant and practical bounds for many of them.
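To make the threat model concrete, the following is a minimal sketch (not the paper's bound) of the setting the abstract describes: a binary Gaussian process classifier with an EQ (RBF) kernel, probed empirically by an $L_0$ perturbation that alters a small number of input features. The toy dataset, the perturbation magnitude, and the helper `l0_perturb` are illustrative assumptions, not the authors' algorithm.

```python
# Sketch of the threat model: a binary GP classifier (EQ/RBF kernel)
# and an L_0 perturbation touching exactly k input features.
# Assumptions: toy Gaussian-blob data; random feature choice and
# a fixed shift stand in for a real attack search.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Toy binary data: two Gaussian blobs in 10 dimensions.
d = 10
X = np.vstack([rng.normal(-1.0, 1.0, (50, d)),
               rng.normal(1.0, 1.0, (50, d))])
y = np.r_[np.zeros(50), np.ones(50)]

gpc = GaussianProcessClassifier(kernel=RBF(length_scale=1.0)).fit(X, y)

def l0_perturb(x, k, delta=3.0):
    """Return a copy of x with k randomly chosen features shifted by
    delta -- an L_0 perturbation altering exactly k inputs."""
    x_adv = x.copy()
    idx = rng.choice(len(x), size=k, replace=False)
    x_adv[idx] += delta
    return x_adv

x = X[0]
p_clean = gpc.predict_proba(x[None])[0, 1]
p_adv = gpc.predict_proba(l0_perturb(x, k=2)[None])[0, 1]
print(f"clean P(class 1) = {p_clean:.3f}, after L_0 (k=2) = {p_adv:.3f}")
```

An empirical probe like this can only exhibit attacks it happens to find; the adversarial bound described in the abstract instead certifies, over the whole input domain, how much confident misclassification any $L_0$ attack of a given budget could ever cause.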
Funder
Engineering and Physical Sciences Research Council
Bundesministerium für Bildung und Forschung
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence, Software