Knowledge Gradient for Multi-objective Multi-armed Bandit Algorithms

doi:10.5220/0004796600740083

JOURNAL ARTICLE

Year: 2014 Pages: 74-83

Abstract

We extend the knowledge gradient (KG) policy to multi-objective multi-armed bandit problems in order to explore the Pareto optimal arms efficiently. We consider two partial order relationships for ordering the mean vectors: Pareto dominance and scalarized functions. Pareto-KG finds the optimal arms using Pareto search, while scalarized-KG transforms each multi-objective arm into a single-objective arm to find the optimal arms. To measure the performance of the proposed algorithms, we propose three regret measures. We compare the performance of the knowledge gradient policy with that of UCB1 on a multi-objective multi-armed bandit problem, where KG outperforms UCB1.
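The two order relationships named in the abstract can be illustrated with a minimal sketch. This is not the authors' KG algorithm: it only shows Pareto dominance over mean vectors and a linear scalarization, assuming maximization, known mean vectors, and hypothetical names (`dominates`, `pareto_front`, `best_scalarized_arm`) and example data chosen for illustration.

```python
from typing import List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Return True if u Pareto-dominates v (maximization assumed)."""
    at_least_as_good = all(a >= b for a, b in zip(u, v))
    strictly_better = any(a > b for a, b in zip(u, v))
    return at_least_as_good and strictly_better


def pareto_front(means: Sequence[Sequence[float]]) -> List[int]:
    """Indices of arms whose mean vector is not dominated by any other arm."""
    return [
        i
        for i, u in enumerate(means)
        if not any(dominates(v, u) for j, v in enumerate(means) if j != i)
    ]


def linear_scalarize(mean: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a mean vector into one value using a weighted sum."""
    return sum(w * m for w, m in zip(weights, mean))


def best_scalarized_arm(
    means: Sequence[Sequence[float]], weights: Sequence[float]
) -> int:
    """Index of the arm with the largest linearly scalarized mean."""
    return max(range(len(means)), key=lambda i: linear_scalarize(means[i], weights))


# Hypothetical two-objective mean vectors for six arms (illustrative only).
means = [
    (0.55, 0.50),
    (0.53, 0.51),
    (0.52, 0.54),
    (0.50, 0.57),
    (0.51, 0.51),
    (0.50, 0.50),
]

front = pareto_front(means)                          # non-dominated arms
balanced = best_scalarized_arm(means, (0.5, 0.5))    # equal weights
first_only = best_scalarized_arm(means, (1.0, 0.0))  # first objective only
```

Under the Pareto order several arms can be optimal at once, whereas each weight vector in the scalarized view picks out a single arm, so different weights can select different members of the Pareto front.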

Keywords:

Regret; Pareto principle; Pareto optimal; Mathematical optimization; Measure (data warehouse); Computer science; Order (exchange); Multi-objective optimization; Multi-armed bandit; Algorithm; Mathematics; Artificial intelligence; Machine learning; Data mining

Metrics

FWCI (Field Weighted Citation Impact): 4.51

Citation Normalized Percentile: 0.95

Topics

Advanced Bandit Algorithms Research

Social Sciences → Decision Sciences → Management Science and Operations Research

Advanced Multi-Objective Optimization Algorithms

Physical Sciences → Computer Science → Computational Theory and Mathematics

Reinforcement Learning in Robotics

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Scaling Multi-Armed Bandit Algorithms

Annealing-Pareto Multi-Objective Multi-Armed Bandit Algorithm

Multi-objective Contextual Multi-armed Bandit With a Dominant Objective

Scalarized and Pareto Knowledge Gradient for Multi-objective Multi-armed Bandits