Federated Bandit

Published:2021-02-18 Issue:1 Volume:5 Page:1-29
ISSN:2476-1249
Container-title:Proceedings of the ACM on Measurement and Analysis of Computing Systems
language:en
Short-container-title:Proc. ACM Meas. Anal. Comput. Syst.

Author:

Zhu Zhaowei¹,Zhu Jingxuan²,Liu Ji²,Liu Yang¹

Affiliation:

1. University of California, Santa Cruz, Santa Cruz, CA, USA

2. Stony Brook University, Stony Brook, NY, USA

Abstract

In this paper, we study Federated Bandit, a decentralized Multi-Armed Bandit problem with a set of $N$ agents, who can only communicate their local data with neighbors described by a connected graph $G$. Each agent makes a sequence of decisions on selecting an arm from $M$ candidates, yet they only have access to local and potentially biased feedback/evaluation of the true reward for each action taken. Learning only locally will lead agents to sub-optimal actions, while converging to a no-regret strategy requires a collection of distributed data. Motivated by the proposal of federated learning, we aim for a solution with which agents will never share their local observations with a central entity, and will be allowed to share only a private copy of their own information with their neighbors. We first propose a decentralized bandit algorithm \texttt{Gossip\_UCB}, which is a coupling of variants of both the classical gossiping algorithm and the celebrated Upper Confidence Bound (UCB) bandit algorithm. We show that \texttt{Gossip\_UCB} successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves guaranteed regret at the order of $O(\max\{\texttt{poly}(N,M) \log T, \texttt{poly}(N,M) \log_{\lambda_2^{-1}} N\})$ for all $N$ agents, where $\lambda_2 \in (0,1)$ is the second largest eigenvalue of the expected gossip matrix, which is a function of $G$. We then propose \texttt{Fed\_UCB}, a differentially private version of \texttt{Gossip\_UCB}, in which the agents preserve $\epsilon$-differential privacy of their local data while achieving $O(\max\{\frac{\texttt{poly}(N,M)}{\epsilon} \log^{2.5} T, \texttt{poly}(N,M)(\log_{\lambda_2^{-1}} N + \log T)\})$ regret.
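
The coupling of gossiping and UCB described in the abstract can be illustrated with a toy sketch. This is not the paper's \texttt{Gossip\_UCB}; it is a minimal illustration under several assumptions that do not come from the abstract: rewards are Gaussian around each agent's local mean, one random edge gossips per round by pairwise averaging, each agent adds the change in its own sample mean to its gossip estimate so that the network-wide sum is preserved, and the confidence bonus is the textbook UCB1 term. The function name `gossip_ucb_sketch` and all of its parameters are hypothetical.

```python
import math
import random


def gossip_ucb_sketch(local_means, edges, horizon, seed=0):
    """Toy gossip + UCB loop (illustrative only, not the paper's algorithm).

    local_means[i][k]: assumed mean reward agent i observes for arm k.
    edges: list of (i, j) pairs describing a connected graph.
    Returns (counts, x, z): pull counts, local sample means, gossip estimates.
    """
    rng = random.Random(seed)
    n, m = len(local_means), len(local_means[0])
    counts = [[0] * m for _ in range(n)]   # pulls of each arm by each agent
    x = [[0.0] * m for _ in range(n)]      # local sample means
    z = [[0.0] * m for _ in range(n)]      # gossip estimates of the network average

    for t in range(1, horizon + 1):
        for i in range(n):
            untried = [k for k in range(m) if counts[i][k] == 0]
            if untried:
                k = untried[0]             # pull every arm once first
            else:
                # UCB1-style index built on the gossip estimate (assumed bonus term)
                k = max(
                    range(m),
                    key=lambda a: z[i][a] + math.sqrt(2 * math.log(t) / counts[i][a]),
                )
            reward = local_means[i][k] + rng.gauss(0.0, 0.1)
            counts[i][k] += 1
            old = x[i][k]
            x[i][k] = old + (reward - old) / counts[i][k]
            # Inject the local change so that sum_i z[i][k] == sum_i x[i][k]
            z[i][k] += x[i][k] - old

        # Randomized gossip: one edge wakes up and its endpoints average
        i, j = rng.choice(edges)
        for k in range(m):
            avg = (z[i][k] + z[j][k]) / 2
            z[i][k] = z[j][k] = avg

    return counts, x, z
```

Only neighboring agents exchange estimates here, and no central entity ever sees the raw rewards. The sketch makes no attempt to reproduce the regret or privacy guarantees stated in the abstract.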

Funder

National Science Foundation

Office of Naval Research

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications; Hardware and Architecture; Safety, Risk, Reliability and Quality; Computer Science (miscellaneous)

Link

https://dl.acm.org/doi/pdf/10.1145/3447380

References: 61 articles.

1. Randomized gossip algorithms

2. Learning to Control Renewal Processes with Bandit Feedback

Cited by 9 articles.

1. Caching User-Generated Content in Distributed Autonomous Networks via Contextual Bandit;IEEE Transactions on Mobile Computing;2024-08

2. On Federated Multi-Armed Bandits for Mobile Social Networks;2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS);2024-07-23

3. Vertical Federated Learning: Concepts, Advances, and Challenges;IEEE Transactions on Knowledge and Data Engineering;2024-07

4. Distributed Linear Bandits With Differential Privacy;IEEE Transactions on Network Science and Engineering;2024-05

5. Distributed Multiarmed Bandits;IEEE Transactions on Automatic Control;2023-05