JOURNAL ARTICLE

Heterogeneous Multi-agent Multi-armed Bandit on Stochastic Block Models

Mengfan Xu, Liren Shan, Fatemeh Ghaffari, Xuchuang Wang, Xutong Liu, Mohammad Hajiesmaili

Year: 2025   Journal: Proceedings of the ACM on Measurement and Analysis of Computing Systems   Vol: 9 (3)   Pages: 1-59   Publisher: Association for Computing Machinery

Abstract

We study a novel heterogeneous multi-agent multi-armed bandit problem with a cluster structure induced by stochastic block models (SBMs), which influences not only the graph topology but also reward heterogeneity. Specifically, agents are distributed on random graphs based on SBMs, a generalized Erdős-Rényi model with heterogeneous edge probabilities: agents are grouped into clusters (known or unknown), and edge probabilities for agents within the same cluster differ from those across clusters. The same cluster structure in SBMs also determines our heterogeneous rewards. Reward distributions of the same arm vary across agents in different clusters but remain consistent within a cluster, which unifies homogeneous and heterogeneous settings and accommodates varying degrees of heterogeneity. Rewards are independent samples from these distributions. The objective is to minimize system-wide regret across all agents. To address this, we propose a novel algorithm applicable to both known and unknown cluster settings. The algorithm combines an averaging-based consensus approach with a newly introduced information aggregation and weighting technique, resulting in a UCB-type strategy. It accounts for graph randomness, leverages both intra-cluster (homogeneous) and inter-cluster (heterogeneous) information from rewards and graphs, and incorporates cluster detection for unknown cluster settings. We derive optimal instance-dependent regret upper bounds of order log T under sub-Gaussian rewards. Importantly, our regret bounds capture the degree of heterogeneity in the system (an additional layer of complexity), exhibit smaller constants, scale better for large systems, and impose significantly relaxed assumptions on edge probabilities.
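To make the problem setting concrete, the following is a minimal sketch of the environment the abstract describes, not the authors' algorithm. It samples an SBM graph with two edge probabilities and draws rewards whose means depend only on an agent's cluster. All names (`sample_sbm`, `pull`, `p_in`, `p_out`, `means`) and all numeric values are hypothetical, and Gaussian rewards are assumed as one example of a sub-Gaussian distribution.

```python
import random


def sample_sbm(cluster_of, p_in, p_out, rng):
    """Sample an undirected SBM graph; returns a set of edges (i, j) with i < j.

    cluster_of[i] is the cluster label of agent i. An edge appears with
    probability p_in inside a cluster and p_out across clusters
    (assumed two-parameter SBM, for illustration only).
    """
    n = len(cluster_of)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if cluster_of[i] == cluster_of[j] else p_out
            if rng.random() < p:
                edges.add((i, j))
    return edges


def pull(agent, arm, cluster_of, means, sigma, rng):
    """Draw one reward for (agent, arm).

    The mean depends only on the agent's cluster, so agents in the same
    cluster share a reward distribution for every arm.
    """
    return rng.gauss(means[cluster_of[agent]][arm], sigma)


rng = random.Random(0)
cluster_of = [0, 0, 0, 1, 1, 1]    # hypothetical: 6 agents in 2 clusters
means = [[0.2, 0.8], [0.7, 0.3]]   # hypothetical: means[cluster][arm]

# One random communication graph and one reward sample.
edges = sample_sbm(cluster_of, p_in=0.9, p_out=0.1, rng=rng)
reward = pull(agent=0, arm=1, cluster_of=cluster_of, means=means,
              sigma=0.1, rng=rng)
```

In this toy instance the best arm differs between the two clusters, which is the kind of heterogeneity the abstract refers to. Setting all rows of `means` equal would recover a homogeneous instance.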

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 28
