Skip Navigation

Federated Bandit

Paper title: Federated Bandit: A Gossiping Approach


  • Zhaowei Zhu, University of California, Santa Cruz
  • Jingxuan Zhu, Stony Brook AMS PhD student
  • Ji Liu, Stony Brook ECE Faculty
  • Yang Liu, University of California, Santa Cruz


Proceedings of the ACM on Measurement and Analysis of Computing Systems (SIGMETRICS), 2021
Acceptance Rate: 12%

Novel Technical Contribution

The multi-armed bandit problem is a sequential decision-making problem which has various applications in engineered and natural systems such as healthcare and online recommendation systems. The paper introduces a novel extension of the classical multi-armed bandit problem to a fully decentralized federated learning setting with gossiping, where each individual decision-maker only has access to biased rewards and limited communication capacity, and proposes the first fully decentralized bandit learning framework that handles heterogeneous data sources with a privacy guarantee.

Societal Contribution

Federated Bandit offers a novel, efficient and private solution to the problem of decentralized users sharing their local sequential learning outcomes with only neighbors (but not a central agent) in a private manner. It is of interest to practitioners and policy makers across different sectors such as finance (for financial institutes to share results trained on their own customers’ data), healthcare (for sharing treatment results across different hospitals) and governmental works (for sharing policies/decision made based on citizen information). The paper’s synthetic experiments on a hospital dataset have demonstrated this potential.


In this paper, we study Federated Bandit, a decentralized multi-armed bandit problem with a set of 𝑁 agents, who can only communicate their local data with neighbors described by a connected graph 𝐺. Each agent makes a sequence of decisions on selecting an arm from 𝑀 candidates, yet they only have access to local and potentially biased feedback/evaluation of the true reward for each action taken. Learning only locally will lead agents to sub-optimal actions while converging to a no-regret strategy requires a collection of distributed data. Motivated by the proposal of federated learning, we aim for a solution with which agents will never share their local observations with a central entity, and will be allowed to only share a private copy of his/her
own information with their neighbors. We first propose a decentralized bandit algorithm Gossip_UCB, which is a coupling of variants of both the classical gossiping algorithm and the celebrated Upper Confidence Bound (UCB) bandit algorithm. We show that Gossip_UCB successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves guaranteed regret for all 𝑁 agents, which is a function of 𝐺. We then propose Fed_UCB, a differentially private version of Gossip_UCB, in which the agents preserve differential privacy of their local data while achieving guaranteed regret.