Abstract

The Multi-Armed Bandit (MAB) is a fundamental model capturing the dilemma between exploration and exploitation in sequential decision making. At every time step, the decision maker selects a set of arms and observes a reward from each of the chosen arms. In this paper, we present a variant of the problem, which we call the Scaling MAB (S-MAB): the goal of the decision maker is not only to maximize the cumulative rewards, i.e., to choose the arms with the highest expected rewards, but also to decide how many arms to select so that, in expectation, the cost of selecting arms does not exceed the rewards. This problem is relevant to many real-world applications, e.g., online advertising, financial investments or data stream monitoring. We propose an extension of Thompson Sampling, which has strong theoretical guarantees and is reported to perform well in practice. Our extension dynamically controls the number of arms to draw. Furthermore, we combine the proposed method with ADWIN, a state-of-the-art change detector, to deal with non-stationary environments. We illustrate the benefits of our contribution via a real-world use case on predictive maintenance.
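The core idea described in the abstract, pulling a variable number of arms so that expected rewards cover the selection cost, can be sketched with Beta-Bernoulli Thompson Sampling. This is an illustrative sketch only, not the paper's actual algorithm: it assumes Bernoulli rewards and a hypothetical selection rule that pulls every arm whose posterior sample exceeds a fixed per-pull cost.

```python
import random


class ScalingThompsonSampling:
    """Illustrative sketch of a 'scaling' bandit: Thompson Sampling with
    Beta-Bernoulli posteriors, where the number of arms pulled per round
    is chosen by a hypothetical rule (pull every arm whose sampled mean
    exceeds the per-pull cost). Not the paper's exact method."""

    def __init__(self, n_arms, cost_per_pull):
        self.n_arms = n_arms
        self.cost = cost_per_pull
        # Beta(1, 1) uniform prior for each arm.
        self.alpha = [1] * n_arms  # successes + 1
        self.beta = [1] * n_arms   # failures + 1

    def select_arms(self):
        # One posterior sample per arm; keep arms whose sampled
        # expected reward covers the cost of pulling them.
        samples = [random.betavariate(self.alpha[i], self.beta[i])
                   for i in range(self.n_arms)]
        return [i for i, s in enumerate(samples) if s > self.cost]

    def update(self, arm, reward):
        # Bernoulli reward in {0, 1} updates the Beta posterior.
        if reward:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1
```

A typical loop would call `select_arms()`, observe one reward per chosen arm, and call `update()` for each; as posteriors concentrate, unprofitable arms are sampled below the cost threshold and drop out, so the number of pulled arms adapts over time.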

Keywords:
Multi-armed bandit, Thompson sampling, Scaling, Sequential decision making, Artificial intelligence, Machine learning, Algorithms, Operations research, Bayesian probability, Computer science, Engineering, Mathematics

Metrics

- Cited by: 19
- FWCI (Field-Weighted Citation Impact): 2.72
- References: 37
- Citation Normalized Percentile: 0.90 (in top 10%)

Topics

- Advanced Bandit Algorithms Research (Social Sciences → Decision Sciences → Management Science and Operations Research)
- Data Stream Mining Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
- Smart Grid Energy Management (Physical Sciences → Engineering → Electrical and Electronic Engineering)

Related Documents

CONFERENCE PAPER

Anytime algorithms for multi-armed bandit problems

Robert Kleinberg

Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '06), 2006, pp. 928-936
BOOK-CHAPTER

Multi-armed Bandit Algorithms and Empirical Evaluation

Joannès Vermorel, Mehryar Mohri

Lecture Notes in Computer Science, 2005, pp. 437-448