Mengfan Xu, Liren Shan, Fatemeh Ghaffari, Xuchuang Wang, Xutong Liu, Mohammad Hajiesmaili
We study a novel heterogeneous multi-agent multi-armed bandit problem with a cluster structure induced by stochastic block models (SBMs), which influences not only the graph topology but also the reward heterogeneity. Specifically, agents are distributed on random graphs drawn from SBMs, a generalization of the Erdős–Rényi model with heterogeneous edge probabilities: agents are grouped into clusters (known or unknown), and the edge probability between agents in the same cluster differs from that between agents in different clusters. The same cluster structure also determines our heterogeneous rewards: the reward distribution of a given arm varies across agents in different clusters but remains consistent within a cluster. This unifies the homogeneous and heterogeneous settings and accommodates varying degrees of heterogeneity. Rewards are independent samples from these distributions. The objective is to minimize system-wide regret across all agents. To this end, we propose a novel algorithm applicable to both the known and unknown cluster settings. The algorithm combines an averaging-based consensus approach with a newly introduced information aggregation and weighting technique, resulting in a UCB-type strategy. It accounts for graph randomness, leverages both intra-cluster (homogeneous) and inter-cluster (heterogeneous) information from rewards and graphs, and incorporates cluster detection for the unknown cluster setting. We derive optimal instance-dependent regret upper bounds of order log T under sub-Gaussian rewards. Importantly, our regret bounds capture the degree of heterogeneity in the system (an additional layer of complexity), exhibit smaller constants, scale better for large systems, and impose significantly relaxed assumptions on the edge probabilities.
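To make the SBM-induced setting concrete, here is a minimal sketch of how such an instance could be generated: agents in the same cluster connect with one probability, agents in different clusters with another, and per-arm reward means are shared within a cluster but differ across clusters. All names and numbers here (`sample_sbm`, `p_in`, `p_out`, the cluster sizes, and the reward means) are illustrative assumptions, not notation from the paper.

```python
import random

def sample_sbm(cluster_sizes, p_in, p_out, rng):
    """Sample an undirected SBM graph.

    Agents within the same cluster are connected with probability p_in;
    agents in different clusters with probability p_out. Returns the
    cluster label of each agent and the set of sampled edges.
    (Illustrative sketch; parameter names are not from the paper.)
    """
    # Assign each agent its cluster label.
    labels = []
    for c, size in enumerate(cluster_sizes):
        labels.extend([c] * size)
    n = len(labels)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.add((i, j))
    return labels, edges

rng = random.Random(0)
# Degenerate probabilities chosen so the outcome is deterministic:
# each cluster becomes a clique and clusters stay disconnected.
labels, edges = sample_sbm([3, 3], p_in=1.0, p_out=0.0, rng=rng)

# Cluster-dependent reward means: each arm's mean is shared within a
# cluster but differs across clusters (hypothetical numbers).
arm_means = {0: [0.2, 0.8], 1: [0.9, 0.1]}  # cluster -> per-arm means
```

In the paper's setting, `p_in` and `p_out` would be arbitrary (subject to the relaxed assumptions mentioned in the abstract), and an agent's observed reward for an arm would be an independent sample from a sub-Gaussian distribution with the mean attached to its cluster.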
Sattar VakiliQing ZhaoYuan Zhou
Antoine Lesage-Landry, Joshua A. Taylor