JOURNAL ARTICLE

Knowledge Gradient for Multi-objective Multi-armed Bandit Algorithms

Abstract

We extend the knowledge gradient (KG) policy to multi-objective multi-armed bandit problems in order to explore the Pareto optimal arms efficiently. We consider two partial order relations for ordering the mean vectors, namely the Pareto order and scalarization functions. Pareto KG finds the optimal arms using Pareto search, while scalarized KG transforms each multi-objective arm into a single-objective arm to find the optimal arms. To measure the performance of the proposed algorithms, we propose three regret measures. We compare the KG policy with UCB1 on a multi-objective multi-armed bandit problem and find that KG outperforms UCB1.
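
The two ways of ordering mean vectors mentioned in the abstract can be illustrated with a minimal sketch. It assumes maximization of every objective and uses a weighted sum as one possible scalarization function; the arm means, the weights, and the function names are hypothetical and are not taken from the article. The sketch covers only the ordering step, not the KG policy or the regret measures.

```python
import numpy as np

def dominates(u, v):
    """Return True if mean vector u Pareto-dominates v (maximization)."""
    u, v = np.asarray(u), np.asarray(v)
    # u must be at least as good in every objective and strictly better in one.
    return bool(np.all(u >= v) and np.any(u > v))

def pareto_front(means):
    """Indices of arms whose mean vectors are not dominated by any other arm."""
    return [
        i for i, u in enumerate(means)
        if not any(dominates(v, u) for j, v in enumerate(means) if j != i)
    ]

def linear_scalarize(means, weights):
    """Collapse each mean vector to one scalar via a weighted sum (assumed form)."""
    return np.asarray(means) @ np.asarray(weights)

# Hypothetical mean vectors: four arms, two objectives each.
means = [[0.8, 0.2], [0.5, 0.5], [0.2, 0.8], [0.4, 0.4]]

# Pareto order: arm 3 is dominated by arm 1, the others are incomparable.
front = pareto_front(means)

# Scalarized order: a single weight vector picks one arm from the front.
best = int(np.argmax(linear_scalarize(means, [0.7, 0.3])))
```

With these illustrative numbers the Pareto order keeps three arms as optimal, whereas a single weight vector selects only one of them, which is why a scalarized approach typically needs several weight vectors to cover the Pareto optimal set.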

Keywords:
Regret; Pareto principle; Pareto optimal; Mathematical optimization; Measure (data warehouse); Computer science; Order (exchange); Multi-objective optimization; Multi-armed bandit; Algorithm; Mathematics; Artificial intelligence; Machine learning; Data mining

Metrics

Cited By: 19
FWCI (Field Weighted Citation Impact): 4.51
Refs: 8
Citation Normalized Percentile: 0.95

Topics

Advanced Bandit Algorithms Research
Social Sciences → Decision Sciences → Management Science and Operations Research
Advanced Multi-Objective Optimization Algorithms
Physical Sciences → Computer Science → Computational Theory and Mathematics
Reinforcement Learning in Robotics
Physical Sciences → Computer Science → Artificial Intelligence