Pairwise Regression with Upper Confidence Bound for Contextual Bandit with Multiple Actions

Ya‐Hsuan Chang; Hsuan-Tien Lin

doi:10.1109/taai.2013.18

JOURNAL ARTICLE

Year: 2013 Pages: 19-24

Abstract

The contextual bandit problem is typically used to model online applications such as article recommendation. However, the problem cannot fully meet certain needs of these applications, such as performing multiple actions at the same time. We define a new Contextual Bandit Problem with Multiple Actions (CBMA), which extends the traditional contextual bandit problem and fits these online applications better. We adapt some existing contextual bandit algorithms to the CBMA problem and develop a new algorithm, Pairwise Regression with Upper Confidence Bound (PairUCB), which addresses the new properties of the CBMA problem. Experimental results demonstrate that PairUCB significantly outperforms the other approaches.
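
To make the CBMA setting concrete, the sketch below shows one plausible way an existing contextual bandit algorithm could be adapted to take several actions per round: a LinUCB-style ridge regression that scores every candidate action with an upper confidence bound and plays the top m. This is not PairUCB, whose update rule the abstract does not describe. The class name `TopMLinUCB`, the shared linear reward model, the parameters `alpha` and `ridge`, and the synthetic environment in `simulate` are all assumptions made for illustration.

```python
import math
import random


class TopMLinUCB:
    """Hypothetical baseline: shared linear model + UCB bonus, top-m selection."""

    def __init__(self, dim, alpha=1.0, ridge=1.0):
        self.dim = dim
        self.alpha = alpha  # exploration weight (assumed hyperparameter)
        # Inverse of A = ridge * I + sum of x x^T over all observed features.
        self.a_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * x

    def _a_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(self.dim)) for row in self.a_inv]

    def theta(self):
        """Ridge-regression estimate A^{-1} b."""
        return self._a_inv_times(self.b)

    def score(self, x):
        """Estimated reward plus confidence width alpha * sqrt(x^T A^{-1} x)."""
        mean = sum(t * xi for t, xi in zip(self.theta(), x))
        quad = sum(xi * v for xi, v in zip(x, self._a_inv_times(x)))
        return mean + self.alpha * math.sqrt(max(quad, 0.0))

    def select(self, features, m):
        """Return the indices of the m highest-scoring candidate actions."""
        ranked = sorted(range(len(features)),
                        key=lambda k: self.score(features[k]), reverse=True)
        return ranked[:m]

    def update(self, x, reward):
        """Rank-one Sherman-Morrison update of A^{-1}, then update b."""
        u = self._a_inv_times(x)
        denom = 1.0 + sum(xi * ui for xi, ui in zip(x, u))
        for i in range(self.dim):
            for j in range(self.dim):
                self.a_inv[i][j] -= u[i] * u[j] / denom
        for i in range(self.dim):
            self.b[i] += reward * x[i]


def simulate(rounds=300, n_actions=6, m=2, seed=0):
    """Synthetic environment (assumed): linear rewards with Gaussian noise."""
    rng = random.Random(seed)
    true_theta = [0.8, -0.4, 0.3]
    agent = TopMLinUCB(len(true_theta), alpha=0.5)
    total_reward = 0.0
    for _ in range(rounds):
        features = [[rng.uniform(-1.0, 1.0) for _ in true_theta]
                    for _ in range(n_actions)]
        # CBMA: several actions are taken in the same round, and the
        # reward of each chosen action is observed afterwards.
        for k in agent.select(features, m):
            reward = sum(t * xi for t, xi in zip(true_theta, features[k]))
            reward += rng.gauss(0.0, 0.1)
            agent.update(features[k], reward)
            total_reward += reward
    return agent, total_reward
```

Calling `simulate()` returns the trained agent and the cumulative reward; comparing `agent.theta()` against the hidden `true_theta` gives a quick check that the regression is learning. Setting `m=1` reduces the loop to the traditional single-action contextual bandit.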

Keywords:

Pairwise comparison; Computer science; Upper and lower bounds; Artificial intelligence; Regression; Machine learning; Extension (predicate logic); Mathematical optimization; Mathematics; Statistics

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.14

Topics

Advanced Bandit Algorithms Research

Social Sciences → Decision Sciences → Management Science and Operations Research

Data Stream Mining Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Recommender Systems and Techniques

Physical Sciences → Computer Science → Information Systems

Related Documents

Linear Upper Confidence Bound Algorithm for Contextual Bandit Problem with Piled Rewards

On Upper-Confidence Bound Policies for Switching Bandit Problems

Cascaded Algorithm-Selection and Hyper-Parameter Optimization with Extreme-Region Upper Confidence Bound Bandit

Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem

Maximal expectation as upper confidence bound for multi-armed bandit problems