JOURNAL ARTICLE

Achieving complete learning in Multi-Armed Bandit problems

Abstract

In the classic Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward distributions. At each time step, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. It is known that the minimum growth rate of regret (defined as the total expected loss with respect to the ideal scenario in which the reward models of all arms are known) is logarithmic in T. In other words, mistakes in selecting suboptimal arms occur infinitely often, and the player never converges to the arm with the largest reward mean. In this paper, we ask whether side information on the reward model can lead to bounded regret, and thus complete learning, and what minimum side information suffices to achieve it. We show that knowledge of a value η between the largest and the second-largest reward mean (among all arms) leads to complete learning, by constructing an online learning policy with bounded regret. This result applies to both light-tailed and heavy-tailed reward distributions.
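The idea in the abstract can be illustrated with a toy sketch: given a threshold η known to lie strictly between the best and second-best reward means, a policy can exploit any arm whose empirical mean clears η and otherwise keep exploring. The following is a minimal illustration of that idea only, not the paper's actual policy or its regret analysis; the function name, the round-robin exploration rule, and the Bernoulli simulation setup are all hypothetical choices made for this sketch.

```python
import random

def eta_threshold_policy(arms, eta, horizon):
    """Toy eta-threshold policy (illustrative sketch, not the paper's policy).

    arms: list of callables, each returning one random reward on call.
    eta: a known value strictly between the largest and second-largest mean.
    Plays each arm once, then exploits any arm whose empirical mean exceeds
    eta; if no arm clears the threshold, explores round-robin.
    """
    n = len(arms)
    counts = [0] * n
    sums = [0.0] * n
    history = []
    for t in range(horizon):
        if t < n:
            i = t  # play each arm once to initialize the estimates
        else:
            above = [j for j in range(n) if sums[j] / counts[j] > eta]
            if above:
                # exploit: play the arm whose empirical mean clears eta
                i = max(above, key=lambda j: sums[j] / counts[j])
            else:
                # explore round-robin until some arm clears the threshold
                i = t % n
        r = arms[i]()
        counts[i] += 1
        sums[i] += r
        history.append(i)
    return history

rng = random.Random(0)
# two Bernoulli arms with means 0.9 and 0.5; eta = 0.7 separates them
arms = [lambda: float(rng.random() < 0.9), lambda: float(rng.random() < 0.5)]
hist = eta_threshold_policy(arms, eta=0.7, horizon=1000)
```

In this simulation the best arm's empirical mean concentrates well above η, so after the initial pulls the policy plays it almost exclusively, which is the intuition behind why such side information can cap the number of suboptimal plays.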

Keywords:
Regret, Bounded function, Multi-armed bandit, Logarithm, Set (abstract data type), Computer science, Time horizon, Online learning, Value (mathematics), Artificial intelligence, Mathematical optimization, Mathematics, Machine learning

Metrics

Cited by: 8
FWCI (Field-Weighted Citation Impact): 0.00
References: 10
Citation Normalized Percentile: 0.14


Topics

Advanced Bandit Algorithms Research (Social Sciences → Decision Sciences → Management Science and Operations Research)
Reinforcement Learning in Robotics (Physical Sciences → Computer Science → Artificial Intelligence)
Machine Learning and Algorithms (Physical Sciences → Computer Science → Artificial Intelligence)

Related Documents


Mechanisms with learning for stochastic multi-armed bandit problems

Shweta Jain, Satyanath Bhat, Ganesh Ghalme, Divya Padmanabhan, Y. Narahari

Journal: Indian Journal of Pure and Applied Mathematics, Year: 2016, Vol: 47 (2), Pages: 229-272

Achieving Privacy in the Adversarial Multi-Armed Bandit

Aristide C. Y. Tossou, Christos Dimitrakakis

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, Year: 2017, Vol: 31 (1), Pages: 2653-2659

Ambiguity aversion in multi-armed bandit problems

Christopher M. Anderson

Journal: Theory and Decision, Year: 2011, Vol: 72 (1), Pages: 15-33