JOURNAL ARTICLE

Mean Field Equilibrium in Multi-Armed Bandit Game with Continuous Reward

Abstract

Mean field games facilitate the analysis of multi-armed bandits (MAB) with a large number of agents by approximating the agents' interactions with an average effect. Existing mean field models for multi-agent MAB mostly assume a binary reward function, which makes the analysis tractable but rarely holds in practical scenarios. In this paper, we study the mean field bandit game with a continuous reward function. Specifically, we establish the existence and uniqueness of the mean field equilibrium (MFE), thereby guaranteeing the asymptotic stability of the multi-agent system. To accommodate the continuous reward function, we encode the learned reward into an agent state, which is in turn mapped to the agent's stochastic arm-playing policy and updated using realized observations. We show that the state evolution is upper semi-continuous, from which the existence of an MFE follows. Because Markov analysis mainly applies to discrete states, we transform the stochastic continuous state evolution into a deterministic ordinary differential equation (ODE). On this basis, we characterize a contraction mapping for the ODE that ensures a unique MFE for the bandit game. Extensive evaluations validate our MFE characterization and exhibit tight empirical regret for the MAB problem.
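The pipeline described in the abstract (a reward estimate stored as agent state, a stochastic policy derived from that state, an update from realized rewards, and a deterministic ODE counterpart) can be illustrated with a minimal sketch. The abstract does not specify the reward function, the state-to-policy map, or the update rule, so everything concrete here is an assumption made for illustration: the softmax policy, the congestion-style reward `base / (1 + occupancy)`, the step size, the Gaussian noise, and all function names are hypothetical and are not the paper's model.

```python
import numpy as np


def softmax(state, temperature=1.0):
    """Map a state (per-arm reward estimates) to a stochastic policy. Assumed form."""
    z = (state - state.max(axis=-1, keepdims=True)) / temperature
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mean_reward(base, occupancy):
    """Hypothetical continuous reward: an arm pays less as more agents crowd onto it."""
    return np.asarray(base, dtype=float) / (1.0 + occupancy)


def simulate(n_agents=2000, n_rounds=400, base=(1.0, 0.8, 0.6),
             step=0.1, noise=0.05, seed=0):
    """Finite-population simulation of the stochastic state evolution."""
    rng = np.random.default_rng(seed)
    k = len(base)
    states = np.zeros((n_agents, k))
    idx = np.arange(n_agents)
    for _ in range(n_rounds):
        # Each agent samples one arm from its own policy (inverse-CDF sampling).
        cum = softmax(states).cumsum(axis=1)
        arms = (rng.random((n_agents, 1)) > cum).sum(axis=1)
        arms = np.minimum(arms, k - 1)  # guard against floating-point rounding
        # Agents interact only through the empirical share of agents on each arm.
        occupancy = np.bincount(arms, minlength=k) / n_agents
        rewards = mean_reward(base, occupancy)[arms]
        rewards = rewards + noise * rng.standard_normal(n_agents)
        # Move the estimate of the played arm toward the realized reward.
        states[idx, arms] += step * (rewards - states[idx, arms])
    return occupancy, states.mean(axis=0)


def mean_field_ode(base=(1.0, 0.8, 0.6), step=0.1, n_steps=400):
    """Euler integration of the assumed drift ds_k/dt = p_k(s) * (r_k(p(s)) - s_k)."""
    s = np.zeros(len(base))
    for _ in range(n_steps):
        p = softmax(s)
        s = s + step * p * (mean_reward(base, p) - s)
    return softmax(s), s


if __name__ == "__main__":
    policy, state = mean_field_ode()
    occupancy, avg_state = simulate()
    print("ODE fixed-point policy:", np.round(policy, 3))
    print("simulated occupancy:   ", np.round(occupancy, 3))
```

Under these assumptions, a rest point of the ODE satisfies `state[k] == mean_reward(base, softmax(state))[k]` for every arm, which plays the role of the equilibrium condition in this toy setting. Comparing the simulated arm occupancy with the ODE policy gives a quick, informal check of how well the deterministic approximation tracks the finite population.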

Keywords:
ODE, Regret, Uniqueness, Mathematical optimization, Computer science, Ordinary differential equation, Markov process, Multi-armed bandit, Function (biology), Applied mathematics, Mathematics, Differential equation, Statistics, Machine learning

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 0.35
Refs: 20
Citation Normalized Percentile: 0.63

Topics

Advanced Bandit Algorithms Research
Social Sciences →  Decision Sciences →  Management Science and Operations Research
Reinforcement Learning in Robotics
Physical Sciences →  Computer Science →  Artificial Intelligence
Game Theory and Applications
Social Sciences →  Decision Sciences →  Management Science and Operations Research

Related Documents

JOURNAL ARTICLE

Existence and uniqueness of mean field equilibrium in continuous bandit game

Xiong Wang, Yuqing Li, Riheng Jia

Journal:   Science China Information Sciences Year: 2025 Vol: 68 (3)
JOURNAL ARTICLE

Multi-armed bandit approach for mean field game-based resource allocation in NOMA networks

Amani Benamor, Oussama Habachi, Inès Kammoun, Jean‐Pierre Cances

Journal:   EURASIP Journal on Wireless Communications and Networking Year: 2024 Vol: 2024 (1)
BOOK-CHAPTER

Noise Free Multi-armed Bandit Game

Atsuyoshi Nakamura, David P. Helmbold, Manfred K. Warmuth

Lecture notes in computer science Year: 2016 Pages: 412-423