Mengfan Xu, Liren Shan, Fatemeh Ghaffari, Xuchuang Wang, Xutong Liu, Mohammad Hajiesmaili
We study a novel heterogeneous multi-agent multi-armed bandit problem with a cluster structure induced by stochastic block models (SBMs), which influences not only the graph topology but also the reward heterogeneity. Specifically, agents are distributed on random graphs drawn from SBMs, a generalization of the Erdős–Rényi model with heterogeneous edge probabilities: agents are grouped into clusters (known or unknown), and the edge probability between agents in the same cluster differs from that between agents in different clusters. The same cluster structure also determines our heterogeneous rewards: the reward distribution of a given arm varies across agents in different clusters but remains consistent within a cluster. This unifies the homogeneous and heterogeneous settings and accommodates varying degrees of heterogeneity. Rewards are independent samples from these distributions. The objective is to minimize system-wide regret across all agents. To this end, we propose a novel algorithm applicable to both the known and unknown cluster settings. The algorithm combines an averaging-based consensus approach with a newly introduced information aggregation and weighting technique, resulting in a UCB-type strategy. It accounts for graph randomness, leverages both intra-cluster (homogeneous) and inter-cluster (heterogeneous) information from rewards and graphs, and incorporates cluster detection for the unknown cluster setting. We derive optimal instance-dependent regret upper bounds of order log T under sub-Gaussian rewards. Importantly, our regret bounds capture the degree of heterogeneity in the system (an additional layer of complexity), exhibit smaller constants, scale better for large systems, and impose significantly relaxed assumptions on the edge probabilities.
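To make the SBM-induced setting concrete, here is a minimal sketch of how such an instance could be generated: agents in the same cluster connect with one probability, agents in different clusters with another, and per-arm reward means are shared within a cluster but differ across clusters. All names and numbers here (`sample_sbm`, `p_in`, `p_out`, the cluster sizes, and the reward means) are illustrative assumptions, not notation from the paper.

```python
import random

def sample_sbm(cluster_sizes, p_in, p_out, rng):
    """Sample an undirected SBM graph.

    Agents within the same cluster are connected with probability p_in;
    agents in different clusters with probability p_out. Returns the
    cluster label of each agent and the set of sampled edges.
    (Illustrative sketch; parameter names are not from the paper.)
    """
    # Assign each agent its cluster label.
    labels = []
    for c, size in enumerate(cluster_sizes):
        labels.extend([c] * size)
    n = len(labels)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.add((i, j))
    return labels, edges

rng = random.Random(0)
# Degenerate probabilities chosen so the outcome is deterministic:
# each cluster becomes a clique and clusters stay disconnected.
labels, edges = sample_sbm([3, 3], p_in=1.0, p_out=0.0, rng=rng)

# Cluster-dependent reward means: each arm's mean is shared within a
# cluster but differs across clusters (hypothetical numbers).
arm_means = {0: [0.2, 0.8], 1: [0.9, 0.1]}  # cluster -> per-arm means
```

In the paper's setting, `p_in` and `p_out` would be arbitrary (subject to the relaxed assumptions mentioned in the abstract), and an agent's observed reward for an arm would be an independent sample from a sub-Gaussian distribution with the mean attached to its cluster.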
Sattar VakiliQing ZhaoYuan Zhou
Antoine Lesage-Landry, Joshua A. Taylor