A Pilot Study of Observation Poisoning on Selective Reincarnation in Multi-Agent Reinforcement Learning

Published: 2024-05-02 Issue: 3 Volume: 56
ISSN: 1573-773X
Container-title: Neural Processing Letters
Language: en
Short-container-title: Neural Process Lett

Author:

Putla Harsha, Patibandla Chanakya, Singh Krishna Pratap, Nagabhushan P

Abstract

This research explores the vulnerability of selective reincarnation, a concept in Multi-Agent Reinforcement Learning (MARL), to observation poisoning attacks. Observation poisoning is an adversarial strategy that subtly manipulates an agent’s observation space, potentially misdirecting its learning process. The primary aim of this paper is to systematically evaluate the robustness of selective reincarnation in MARL systems against the subtle yet potentially debilitating effects of such attacks. By assessing how manipulated observation data influences MARL agents, we seek to highlight potential vulnerabilities and inform the development of more resilient MARL systems. Our experimental testbed was the widely used HalfCheetah environment, with the Independent Deep Deterministic Policy Gradient algorithm in a cooperative MARL setting. We introduced a series of triggers, namely Gaussian noise addition, observation reversal, random shuffling, and scaling, into the teacher dataset provided to the reincarnating agents of HalfCheetah. Here, the “teacher dataset” refers to the stored experiences from previous training sessions used to accelerate the learning of reincarnating agents in MARL. This approach let us observe the significant impact of these triggers on reincarnation decisions. Specifically, the reversal technique showed the most pronounced negative effect for maximum returns, with an average decrease of 38.08% in Kendall’s tau values across all agent combinations. With random shuffling, Kendall’s tau values decreased by 17.66%. Noise addition and scaling, on the other hand, aligned with the original ranking by only 21.42% and 32.66%, respectively. The results, quantified by Kendall’s tau, indicate the fragility of the selective reincarnation process under adversarial observation poisoning. Our findings also reveal that vulnerability to observation poisoning varies significantly among agent combinations, with some markedly more susceptible than others. This investigation advances our understanding of selective reincarnation’s robustness against observation poisoning attacks, which is crucial for developing more secure MARL systems and for making informed decisions about agent reincarnation.
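
The four triggers and the ranking comparison named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact trigger definitions, their parameters, or how the rankings were built, so everything below is an assumption made for illustration: the function names, the noise scale `sigma`, the scaling `factor`, and the per-combination returns in the usage example are hypothetical and are not taken from the paper. Kendall's tau is computed here as the simple tau-a variant, without tie correction.

```python
import random
from itertools import combinations


def add_gaussian_noise(obs, sigma=0.1, rng=None):
    """Assumed trigger: add zero-mean Gaussian noise to each observation entry."""
    rng = rng or random.Random(0)
    return [x + rng.gauss(0.0, sigma) for x in obs]


def reverse_observation(obs):
    """Assumed trigger: reverse the order of the observation entries."""
    return list(reversed(obs))


def shuffle_observation(obs, rng=None):
    """Assumed trigger: randomly permute the observation entries."""
    rng = rng or random.Random(0)
    shuffled = list(obs)
    rng.shuffle(shuffled)
    return shuffled


def scale_observation(obs, factor=2.0):
    """Assumed trigger: multiply every observation entry by a constant."""
    return [factor * x for x in obs]


def kendall_tau(a, b):
    """Kendall's tau-a between two equal-length score lists (no tie correction)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


if __name__ == "__main__":
    # Hypothetical returns for five agent combinations (illustrative numbers only).
    clean_returns = [5.0, 4.0, 3.0, 2.0, 1.0]
    poisoned_returns = [5.0, 3.0, 4.0, 1.0, 2.0]
    print(kendall_tau(clean_returns, poisoned_returns))
```

A tau of 1 means the poisoned run ranks the agent combinations exactly as the clean run does, and lower values mean the poisoning changed which combinations look best to reincarnate. This is the sense in which the abstract reports decreases in Kendall's tau.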

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s11063-024-11625-w.pdf
