JOURNAL ARTICLE

Achieving complete learning in Multi-Armed Bandit problems

Abstract

In the classic Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward distributions. At each time step, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. It is known that the minimum growth rate of regret (defined as the total expected loss with respect to the ideal scenario in which the reward models of all arms are known) is logarithmic in T. In other words, mistakes in selecting suboptimal arms occur infinitely often, and the player never converges to the arm with the largest reward mean. In this paper, we ask whether side information on the reward model can lead to bounded regret, and thus complete learning, and what minimum side information suffices to achieve it. We show that knowledge of a value η between the largest and the second-largest reward mean (among all arms) leads to complete learning, by constructing an online learning policy with bounded regret. This result applies to both light-tailed and heavy-tailed reward distributions.
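The idea in the abstract can be illustrated with a toy sketch: given a threshold η known to lie strictly between the best and second-best reward means, a policy can exploit any arm whose empirical mean clears η and otherwise keep exploring. The following is a minimal illustration of that idea only, not the paper's actual policy or its regret analysis; the function name, the round-robin exploration rule, and the Bernoulli simulation setup are all hypothetical choices made for this sketch.

```python
import random

def eta_threshold_policy(arms, eta, horizon):
    """Toy eta-threshold policy (illustrative sketch, not the paper's policy).

    arms: list of callables, each returning one random reward on call.
    eta: a known value strictly between the largest and second-largest mean.
    Plays each arm once, then exploits any arm whose empirical mean exceeds
    eta; if no arm clears the threshold, explores round-robin.
    """
    n = len(arms)
    counts = [0] * n
    sums = [0.0] * n
    history = []
    for t in range(horizon):
        if t < n:
            i = t  # play each arm once to initialize the estimates
        else:
            above = [j for j in range(n) if sums[j] / counts[j] > eta]
            if above:
                # exploit: play the arm whose empirical mean clears eta
                i = max(above, key=lambda j: sums[j] / counts[j])
            else:
                # explore round-robin until some arm clears the threshold
                i = t % n
        r = arms[i]()
        counts[i] += 1
        sums[i] += r
        history.append(i)
    return history

rng = random.Random(0)
# two Bernoulli arms with means 0.9 and 0.5; eta = 0.7 separates them
arms = [lambda: float(rng.random() < 0.9), lambda: float(rng.random() < 0.5)]
hist = eta_threshold_policy(arms, eta=0.7, horizon=1000)
```

In this simulation the best arm's empirical mean concentrates well above η, so after the initial pulls the policy plays it almost exclusively, which is the intuition behind why such side information can cap the number of suboptimal plays.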

Keywords:
Regret, Bounded function, Multi-armed bandit, Logarithm, Set (abstract data type), Computer science, Time horizon, Online learning, Value (mathematics), Artificial intelligence, Mathematical optimization, Mathematics, Machine learning

Metrics

Cited by: 8
FWCI (Field-Weighted Citation Impact): 0.00
References: 10
Citation Normalized Percentile: 0.14


Topics

Advanced Bandit Algorithms Research (Social Sciences → Decision Sciences → Management Science and Operations Research)
Reinforcement Learning in Robotics (Physical Sciences → Computer Science → Artificial Intelligence)
Machine Learning and Algorithms (Physical Sciences → Computer Science → Artificial Intelligence)

Related Documents


Mechanisms with learning for stochastic multi-armed bandit problems

Shweta Jain, Satyanath Bhat, Ganesh Ghalme, Divya Padmanabhan, Y. Narahari

Journal: Indian Journal of Pure and Applied Mathematics, Year: 2016, Vol: 47 (2), Pages: 229-272

Achieving Privacy in the Adversarial Multi-Armed Bandit

Aristide C. Y. Tossou, Christos Dimitrakakis

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, Year: 2017, Vol: 31 (1), Pages: 2653-2659

Ambiguity aversion in multi-armed bandit problems

Christopher M. Anderson

Journal: Theory and Decision, Year: 2011, Vol: 72 (1), Pages: 15-33