DISSERTATION

Thompson Sampling for Bandit Convex Optimization

Abstract

Thompson sampling (TS) is a popular and empirically successful algorithm for online decision-making problems. This thesis advances our understanding of TS when applied to bandit convex optimization (BCO) problems, by providing new theoretical guarantees and characterizing its limitations. First, we analyze $1$-dimensional BCO and show that TS achieves a near-optimal Bayesian regret of at most $\tilde O(\sqrt{n})$, where $n$ is the time horizon. This result holds without strong assumptions on the loss functions, requiring only convexity, boundedness, and a mild Lipschitz condition. In sharp contrast, we demonstrate that for general high-dimensional problems, TS can fail catastrophically. More positively, we establish a Bayesian regret bound of $\tilde O(d^{2.5} \sqrt{n})$ for TS in generalized linear bandits, even when the convex monotone link function is unknown. Finally, we prove a fundamental limitation of current analysis techniques: we show that the standard information-theoretic machinery can never yield a regret bound better than the existing $\tilde O(d^{1.5} \sqrt{n})$ in the general case.
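For readers unfamiliar with the algorithm, the following is a minimal sketch of Thompson sampling on a toy one-dimensional convex bandit problem. It is an illustration only, not the construction analyzed in the thesis: it assumes a uniform prior over a small finite set of losses $f_\theta(x) = |x - \theta|$ and Gaussian observation noise. The function name `thompson_sampling_1d`, the candidate set, and the noise level are all hypothetical choices made for this example.

```python
import math
import random


def thompson_sampling_1d(thetas, true_theta, n_rounds, noise_sd=0.1, seed=0):
    """Toy Thompson sampling with a finite prior over convex losses.

    Assumption (illustrative, not from the thesis): each candidate loss is
    f_theta(x) = |x - theta| and observations carry Gaussian noise.
    Returns the cumulative regret and the unnormalised log-posterior.
    """
    rng = random.Random(seed)
    log_post = [0.0] * len(thetas)  # uniform prior, stored in log space
    regret = 0.0
    for _ in range(n_rounds):
        # 1. Sample a candidate loss function from the current posterior.
        top = max(log_post)
        weights = [math.exp(lp - top) for lp in log_post]
        sampled_theta = rng.choices(thetas, weights=weights, k=1)[0]

        # 2. Play the minimiser of the sampled loss (for |x - theta| it is theta).
        x = sampled_theta

        # 3. Observe a noisy evaluation of the true loss at x (bandit feedback).
        true_loss = abs(x - true_theta)
        y = true_loss + rng.gauss(0.0, noise_sd)
        regret += true_loss  # the best achievable loss is 0, at x = true_theta

        # 4. Bayes update: add the Gaussian log-likelihood of y for each candidate.
        for i, theta in enumerate(thetas):
            residual = y - abs(x - theta)
            log_post[i] -= residual ** 2 / (2.0 * noise_sd ** 2)
    return regret, log_post


if __name__ == "__main__":
    candidates = [0.0, 0.25, 0.5, 0.75, 1.0]
    total_regret, log_posterior = thompson_sampling_1d(candidates, 0.75, 200)
    best = max(range(len(candidates)), key=lambda i: log_posterior[i])
    print("cumulative regret:", total_regret)
    print("posterior mode:", candidates[best])
```

In this toy setting the posterior concentrates quickly because a single noisy evaluation already separates most candidates. The general problem treated in the thesis has no such finite parametrization.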

Keywords:
Regret; Lipschitz continuity; Bayesian probability; Convex optimization; Monotone polygon; Thompson sampling; Function (biology); Regular polygon

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0

Topics

Advanced Bandit Algorithms Research
Social Sciences →  Decision Sciences →  Management Science and Operations Research
Stochastic Gradient Optimization Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Sparse and Compressive Sensing Techniques
Physical Sciences →  Engineering →  Computational Mechanics

Related Documents

JOURNAL ARTICLE

Thompson Sampling for the Multinomial Logit Bandit

Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi

Journal:   Mathematics of Operations Research Year: 2025
JOURNAL ARTICLE

Thompson Sampling for Non-Stationary Bandit Problems

Qi Han, Fei Guo, Li Zhu

Journal:   Entropy Year: 2025 Vol: 27 (1) Pages: 51-51
JOURNAL ARTICLE

[Re] Bandit Theory and Thompson Sampling-guided Directed Evolution for Sequence Optimization

Žontar, Luka

Journal:   Zenodo (CERN European Organization for Nuclear Research) Year: 2023
JOURNAL ARTICLE

Thompson Sampling for Bandit Learning in Matching Markets

Fang Kong, Junming Yin, Shuai Li

Journal:   Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence Year: 2022 Pages: 3164-3170