Robust Multi-Agent Bandits Over Undirected Graphs

Published: 2022-12; Volume 6, Issue 3, pp. 1-57
ISSN: 2476-1249
Journal: Proceedings of the ACM on Measurement and Analysis of Computing Systems
Language: en
Abbreviated journal title: Proc. ACM Meas. Anal. Comput. Syst.

Authors:

Daniel Vial¹, Sanjay Shakkottai¹, R. Srikant²

Affiliations:

1. University of Texas at Austin, Austin, TX, USA

2. University of Illinois Urbana-Champaign, Urbana-Champaign, IL, USA

Abstract

We consider a multi-agent multi-armed bandit setting in which $n$ honest agents collaborate over a network to minimize regret but $m$ malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur $O((m + K/n)\log(T)/\Delta)$ regret in this setting, where $K$ is the number of arms and $\Delta$ is the arm gap. For $m \ll K$, this improves over the single-agent baseline regret of $O(K\log(T)/\Delta)$. In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we prove that if the state-of-the-art algorithm is used on the undirected line graph, honest agents can suffer (nearly) linear regret until time is doubly exponential in $K$ and $n$. In light of this negative result, we propose a new algorithm for which the $i$-th agent has regret $O((d_{\text{mal}}(i) + K/n)\log(T)/\Delta)$ on any connected and undirected graph, where $d_{\text{mal}}(i)$ is the number of $i$'s neighbors who are malicious. Thus, we generalize existing regret bounds beyond the complete graph (where $d_{\text{mal}}(i) = m$), and show the effect of malicious agents is entirely local (in the sense that only the $d_{\text{mal}}(i)$ malicious agents directly connected to $i$ affect its long-term regret).
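The bound expressions in the abstract can be made concrete with a small sketch. It does not implement the paper's algorithm. It only evaluates the stated orders $(d_{\text{mal}}(i) + K/n)\log(T)/\Delta$ and $K\log(T)/\Delta$ with all hidden constants dropped, so the printed numbers are orders of magnitude, not actual regret. The helper names (`line_graph`, `complete_graph`, `d_mal`, `local_bound`, `single_agent_bound`), the placement of the malicious agents, and the values of $K$, $T$, and $\Delta$ are hypothetical choices made for illustration.

```python
import math


def line_graph(num_nodes):
    """Adjacency sets for the undirected path 0 - 1 - ... - (num_nodes - 1)."""
    adj = {v: set() for v in range(num_nodes)}
    for v in range(num_nodes - 1):
        adj[v].add(v + 1)
        adj[v + 1].add(v)
    return adj


def complete_graph(num_nodes):
    """Adjacency sets for the complete graph on num_nodes vertices."""
    return {v: set(range(num_nodes)) - {v} for v in range(num_nodes)}


def d_mal(adj, malicious, i):
    """Number of agent i's neighbors that are malicious (d_mal(i) in the abstract)."""
    return len(adj[i] & malicious)


def local_bound(d, K, n, T, gap):
    """(d + K/n) * log(T) / gap: order of the per-agent bound, constants dropped."""
    return (d + K / n) * math.log(T) / gap


def single_agent_bound(K, T, gap):
    """K * log(T) / gap: order of the single-agent baseline, constants dropped."""
    return K * math.log(T) / gap


if __name__ == "__main__":
    num_nodes = 8
    malicious = {3, 7}  # hypothetical placement of m = 2 malicious agents
    honest = [v for v in range(num_nodes) if v not in malicious]
    n, K, T, gap = len(honest), 100, 10**6, 0.1  # illustrative values only

    print(f"single-agent baseline: {single_agent_bound(K, T, gap):.0f}")
    for name, adj in [("line", line_graph(num_nodes)),
                      ("complete", complete_graph(num_nodes))]:
        for i in honest:
            d = d_mal(adj, malicious, i)
            print(f"{name:8s} agent {i}: d_mal = {d}, "
                  f"bound ~ {local_bound(d, K, n, T, gap):.0f}")
```

On the line graph, only agents 2, 4, and 6 (the neighbors of agents 3 and 7) get a nonzero $d_{\text{mal}}(i)$. On the complete graph, every honest agent has $d_{\text{mal}}(i) = m = 2$. This matches the abstract's remark that the complete-graph bound is the special case $d_{\text{mal}}(i) = m$.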

Funder

ONR

NSF

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications; Hardware and Architecture; Safety, Risk, Reliability and Quality; Computer Science (miscellaneous)

Link

https://dl.acm.org/doi/pdf/10.1145/3570614

Cited by 1 article.

1. Distributed Robust Bandits With Efficient Communication. IEEE Transactions on Network Science and Engineering, 2023-05-01.