Abstract
Detecting malicious attacks presents a major challenge in the field of reinforcement learning (RL), as such attacks can force the victim to perform abnormal actions, with potentially severe consequences. To mitigate these risks, current research focuses on enhancing RL algorithms with efficient detection mechanisms, especially for real-world applications. Adversarial attacks can alter the environmental dynamics of a Markov Decision Process (MDP) as perceived by an RL agent. Leveraging these changes in dynamics, we propose a novel approach to detecting attacks. Our contribution can be summarized in two main aspects. First, we propose a novel formalization of the attack detection problem that entails analyzing the modifications attacks make to the transition and reward dynamics of the environment. This problem can be framed as a context change detection problem, where the goal is to identify the transition from a "free-of-attack" situation to an "under-attack" scenario. To solve this problem, we propose a novel "model-free", clustering-based countermeasure. This approach consists of two essential steps: first, partitioning the transition space into clusters, and then using this partitioning to identify changes in environmental dynamics caused by adversarial attacks. To assess the effectiveness of our detection method, we performed experiments on four established RL domains (grid-world, mountain car, cartpole, and acrobot) subjected to four advanced attack types: Uniform, Strategically-timed, Q-value, and Multi-objective. Our study shows that our technique has a high potential for perturbation detection, even in scenarios where attackers employ more sophisticated strategies.
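To make the two-step idea in the abstract concrete, the following is a minimal sketch of a clustering-based context change detector: transitions (s, a, r, s') observed in a free-of-attack phase are partitioned with KMeans, and the cluster-occupancy histogram of a recent monitoring window is compared against the baseline. The feature construction, the number of clusters, the total-variation test, and the threshold are illustrative assumptions, not the paper's exact implementation.

import numpy as np
from sklearn.cluster import KMeans

def fit_transition_clusters(transitions, n_clusters=16, seed=0):
    # transitions: array of shape (N, d) with rows built from (s, a, r, s')
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    km.fit(transitions)
    # Baseline occupancy distribution over clusters in the free-of-attack phase
    baseline = np.bincount(km.labels_, minlength=n_clusters) / len(transitions)
    return km, baseline

def attack_suspected(km, baseline, window, threshold=0.25):
    # Flag a context change when the occupancy histogram of a recent window
    # of transitions drifts away from the free-of-attack baseline.
    labels = km.predict(window)
    hist = np.bincount(labels, minlength=len(baseline)) / len(window)
    tv_distance = 0.5 * np.abs(hist - baseline).sum()  # total variation distance
    return tv_distance > threshold

# Synthetic example: clean transitions vs. transitions with shifted dynamics.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(2000, 6))      # stand-in for (s, a, r, s') features
perturbed = rng.normal(0.8, 1.0, size=(200, 6))   # shifted dynamics under attack
km, baseline = fit_transition_clusters(clean)
print(attack_suspected(km, baseline, clean[:200]))   # expected: False
print(attack_suspected(km, baseline, perturbed))     # expected: True

In practice, the window length and threshold would be tuned on attack-free data so that the false-alarm rate stays acceptable for the target domain.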
Funder
Repsol
Ministerio de Economía y Competitividad
JPMorgan Chase and Company
Publisher
Springer Science and Business Media LLC