Multi-Armed Bandit Problems under Delayed Feedback

Pooria Joulani

doi:10.7939/r3bh85


Year: 2012 Journal: University of Alberta Library


Abstract

This thesis studies the multi-armed bandit (MAB) problem in online learning when the feedback is observed not immediately but after arbitrary, unknown, random delays. In the "stochastic" setting, where the rewards come from a fixed distribution, we give an algorithm that uses a non-delayed MAB algorithm as a black box. We also give a method to generalize the theoretical guarantees of non-delayed UCB-type algorithms to the delayed stochastic setting. Assuming the delays are independent of the rewards, we upper bound the performance penalty of these algorithms (measured by "regret") by an additive term depending on the delays. When the rewards are chosen adversarially, we give a black-box style algorithm that uses multiple instances of a non-delayed adversarial MAB algorithm. Assuming the delays depend only on time, we upper bound the performance penalty of this algorithm by a multiplicative factor depending on the delays.
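The black-box idea for the stochastic setting can be illustrated with a small sketch. The abstract does not spell out the construction, so everything below is an assumption made for illustration: the wrapper keeps one queue of arrived rewards per arm, the base algorithm is a plain UCB1, rewards are Bernoulli, and delays are uniform on `0..max_delay` and drawn independently of the rewards. The names `UCB1`, `run_delayed`, and `max_delay` are hypothetical and do not come from the thesis.

```python
import math
import random
from collections import deque


class UCB1:
    """Plain non-delayed UCB1, used here as the black-box base algorithm."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms

    def select(self):
        # Try every arm once before using the confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        return max(
            range(len(self.counts)),
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(total) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def run_delayed(base, means, horizon, max_delay, rng):
    """Run `base` in a delayed environment (illustrative assumptions only)."""
    n_arms = len(means)
    queues = [deque() for _ in range(n_arms)]  # arrived, not yet fed to base
    pending = []  # (arrival_round, arm, reward) still in flight
    pulls = [0] * n_arms
    for t in range(horizon):
        # Move feedback whose delay has elapsed into the per-arm queues.
        still_pending = []
        for arrival, arm, reward in pending:
            if arrival <= t:
                queues[arm].append(reward)
            else:
                still_pending.append((arrival, arm, reward))
        pending = still_pending
        # Feed stored rewards to the base algorithm until it asks for an
        # arm whose queue is empty; that arm is pulled for real this round.
        arm = base.select()
        while queues[arm]:
            base.update(arm, queues[arm].popleft())
            arm = base.select()
        reward = 1.0 if rng.random() < means[arm] else 0.0
        delay = rng.randint(0, max_delay)  # drawn independently of the reward
        pending.append((t + 1 + delay, arm, reward))
        pulls[arm] += 1
    return pulls
```

From the base algorithm's point of view, every reward it receives matches the arm it just selected, so it behaves as if there were no delay; the cost of the delay shows up only in the extra real pulls made while feedback is still in flight. For example, `run_delayed(UCB1(2), [0.2, 0.8], 2000, 10, random.Random(0))` returns the number of real pulls per arm.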

Keywords:

Computer science


Topics

Advanced Bandit Algorithms Research

Social Sciences → Decision Sciences → Management Science and Operations Research

Optimization and Search Problems

Physical Sciences → Computer Science → Computer Networks and Communications

Reinforcement Learning in Robotics

Physical Sciences → Computer Science → Artificial Intelligence

