Author:
Wang Yu, Fu Ke, Chen Hao, Liu Quan, Huang Jian, Zhang Zhongjie
Abstract
In multi-agent domains, dealing with non-stationary opponents that keep changing their behaviors (policies) over time remains a challenging problem: the agent must detect the opponent’s current policy accurately and adopt the optimal response policy accordingly. Previous works commonly assume that the opponent’s observations and actions are known during online interactions, which significantly limits their applicability, especially in partially observable environments. This paper focuses on efficient policy detection and reuse against non-stationary opponents without access to their local information. We propose an algorithm called Bayesian policy reuse with LocAl oBservations (Bayes-Lab), which incorporates variational autoencoders (VAEs) into the Bayesian policy reuse (BPR) framework. Following the centralized training with decentralized execution (CTDE) paradigm, we train a VAE as an opponent model during the offline phase to extract the latent relationship between the agent’s local observations and the opponent’s local observations. During online execution, the trained opponent models are used to reconstruct the opponent’s local observations, which are combined with episodic rewards to update the belief about the opponent’s policy. Finally, the agent reuses the best response policy under the updated belief to improve online performance. We demonstrate that Bayes-Lab outperforms existing state-of-the-art methods in terms of detection accuracy, accumulative rewards, and episodic rewards in a predator–prey scenario. In this experimental environment, Bayes-Lab achieves about 80% detection accuracy and the highest accumulative rewards, and its performance is less affected by the opponent’s policy switching interval. When the switching interval is less than 10, its detection accuracy is at least 10% higher than that of the other algorithms.
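The belief update and policy reuse steps mentioned in the abstract can be illustrated with a minimal sketch of a generic BPR-style loop. This is not the authors' implementation: it uses only the episodic reward as the detection signal and omits the VAE-based reconstruction of the opponent's observations. The Gaussian reward model, the `expected_reward` table, `reward_std`, and all function names are assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def update_belief(belief, likelihoods):
    """Bayes' rule: posterior[i] is proportional to likelihoods[i] * belief[i]."""
    unnormalized = [l * b for l, b in zip(likelihoods, belief)]
    total = sum(unnormalized)
    if total == 0.0:
        # Signal is impossible under every hypothesis: keep the prior.
        return list(belief)
    return [u / total for u in unnormalized]


def best_response(belief, expected_reward):
    """Index of the response policy with the highest belief-weighted reward.

    expected_reward[j][i] is the mean episodic reward of response policy j
    against opponent policy i (a hypothetical offline performance model).
    """
    scores = [sum(b * r for b, r in zip(belief, row)) for row in expected_reward]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical performance model: two opponent policies, two response policies.
expected_reward = [[10.0, 2.0],
                   [1.0, 9.0]]
reward_std = 1.5           # assumed spread of episodic rewards
belief = [0.5, 0.5]        # uniform prior over opponent policies
current_response = 0       # response policy used in the last episode
observed_reward = 2.5      # episodic reward received in that episode

# Likelihood of the observed reward under each opponent-policy hypothesis.
likelihoods = [
    gaussian_pdf(observed_reward, expected_reward[current_response][i], reward_std)
    for i in range(len(belief))
]
belief = update_belief(belief, likelihoods)
next_response = best_response(belief, expected_reward)
```

In this toy run the low reward obtained with response policy 0 shifts almost all belief to opponent policy 1, so the agent switches to response policy 1 for the next episode. A method along the lines of Bayes-Lab would multiply in an additional likelihood term derived from the reconstructed opponent observations before normalizing.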
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science