JOURNAL ARTICLE

Multi-Armed Bandit Problems under Delayed Feedback

Abstract

In this thesis, the multi-armed bandit (MAB) problem in online learning is studied, when the feedback information is not observed immediately but rather after arbitrary, unknown, random delays. In the "stochastic" setting, when the rewards come from a fixed distribution, an algorithm is given that uses a non-delayed MAB algorithm as a black box. We also give a method to generalize the theoretical guarantees of non-delayed UCB-type algorithms to the delayed stochastic setting. Assuming the delays are independent of the rewards, we upper bound the penalty in the performance of these algorithms (measured by "regret") by an additive term depending on the delays. When the rewards are chosen in an adversarial manner, we give a black-box style algorithm using multiple instances of a non-delayed adversarial MAB algorithm. Assuming the delays depend only on time, we upper bound the performance penalty of the algorithm by a multiplicative factor depending on the delays.
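The black-box idea from the abstract can be illustrated with a minimal sketch (this is an illustrative simulation, not the thesis's exact algorithm): a standard non-delayed UCB1 learner is wrapped so that each reward reaches it only after a random delay, assuming Bernoulli rewards and delays independent of the rewards. The names `UCB1` and `run_delayed` are hypothetical.

```python
import heapq
import math
import random

class UCB1:
    """Standard non-delayed UCB1, used here as a black box."""
    def __init__(self, n_arms):
        self.counts = [0] * n_arms     # observed pulls per arm
        self.sums = [0.0] * n_arms     # observed reward sums per arm
        self.t = 0                     # number of rewards seen so far
    def select(self):
        # Play any arm with no observed feedback yet.
        for a, c in enumerate(self.counts):
            if c == 0:
                return a
        # Otherwise maximize the UCB index: mean + confidence radius.
        return max(range(len(self.counts)),
                   key=lambda a: self.sums[a] / self.counts[a]
                   + math.sqrt(2 * math.log(self.t) / self.counts[a]))
    def update(self, arm, reward):
        self.t += 1
        self.counts[arm] += 1
        self.sums[arm] += reward

def run_delayed(means, horizon, max_delay, seed=0):
    """Run UCB1 when each reward arrives only after a random delay."""
    rng = random.Random(seed)
    base = UCB1(len(means))
    pending = []  # min-heap of (arrival_time, arm, reward)
    for t in range(horizon):
        # Deliver all feedback that has arrived by round t.
        while pending and pending[0][0] <= t:
            _, arm, reward = heapq.heappop(pending)
            base.update(arm, reward)
        # The black box decides using only delivered feedback.
        arm = base.select()
        reward = 1.0 if rng.random() < means[arm] else 0.0
        heapq.heappush(pending, (t + rng.randint(0, max_delay), arm, reward))
    return base
```

In this simulation the base algorithm's state lags behind the environment by at most `max_delay` rounds' worth of feedback, which is the mechanism behind the additive regret penalty the abstract describes for the stochastic setting.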

Keywords:
Computer science

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.00
Refs: 25

Topics

Advanced Bandit Algorithms Research
Social Sciences → Decision Sciences → Management Science and Operations Research
Optimization and Search Problems
Physical Sciences → Computer Science → Computer Networks and Communications
Reinforcement Learning in Robotics
Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Bernoulli multi-armed bandit problem under delayed feedback

Andrii Dzhoha

Journal: Bulletin of Taras Shevchenko National University of Kyiv, Series Physics and Mathematics. Year: 2021. Pages: 20-26
JOURNAL ARTICLE

Risk-Averse Multi-Armed Bandit Problems Under Mean-Variance Measure

Sattar Vakili, Qing Zhao

Journal: IEEE Journal of Selected Topics in Signal Processing. Year: 2016. Vol: 10 (6). Pages: 1093-1111
JOURNAL ARTICLE

Ambiguity aversion in multi-armed bandit problems

Christopher M. Anderson

Journal: Theory and Decision. Year: 2011. Vol: 72 (1). Pages: 15-33