Thompson sampling (TS) is a popular and empirically successful algorithm for online decision-making problems. This thesis advances our understanding of TS when applied to bandit convex optimization (BCO) problems by providing new theoretical guarantees and characterizing its limitations. First, we analyze $1$-dimensional BCO and show that TS achieves a near-optimal Bayesian regret of at most $\tilde O(\sqrt{n})$, where $n$ is the time horizon. This result holds without strong assumptions on the loss functions, requiring only convexity, boundedness, and a mild Lipschitz condition. In sharp contrast, we demonstrate that for general high-dimensional problems, TS can fail catastrophically. More positively, we establish a Bayesian regret bound of $\tilde O(d^{2.5} \sqrt{n})$ for TS in generalized linear bandits, where $d$ is the ambient dimension, even when the convex monotone link function is unknown. Finally, we prove a fundamental limitation of current analysis techniques: the standard information-theoretic machinery can never yield a regret bound better than the existing $\tilde O(d^{1.5} \sqrt{n})$ in the general case.
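For readers unfamiliar with the algorithm analyzed above, the following is a minimal illustrative sketch of Thompson sampling in a linear bandit with a Gaussian prior and Gaussian noise; the conjugate prior, noise level, finite action set, and all parameter values are assumptions made for exposition and are not taken from the thesis.

```python
import numpy as np

def thompson_sampling_linear(actions, true_theta, n_rounds,
                             noise_sd=0.1, prior_var=1.0, rng=None):
    """Sketch of Thompson sampling for a linear bandit with a Gaussian prior.

    actions: (K, d) array of candidate actions; true_theta: (d,) unknown parameter.
    Each round: sample theta_hat from the Gaussian posterior, play the action that
    is optimal for theta_hat, then update the posterior with the observed reward.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = actions.shape[1]
    # The Gaussian posterior is tracked via its precision matrix and the vector X^T y.
    precision = np.eye(d) / prior_var
    xty = np.zeros(d)
    best = (actions @ true_theta).max()
    regret = []
    for _ in range(n_rounds):
        cov = np.linalg.inv(precision)
        mean = cov @ xty / noise_sd**2
        theta_hat = rng.multivariate_normal(mean, cov)   # posterior sample
        a = actions[np.argmax(actions @ theta_hat)]      # act greedily w.r.t. the sample
        reward = a @ true_theta + noise_sd * rng.standard_normal()
        precision += np.outer(a, a) / noise_sd**2        # Bayesian linear-regression update
        xty += a * reward
        regret.append(best - a @ true_theta)
    return np.cumsum(regret)

# Example usage on a small random instance (illustrative parameters only).
rng = np.random.default_rng(0)
actions = rng.standard_normal((50, 5))
theta = rng.standard_normal(5)
print(thompson_sampling_linear(actions, theta, n_rounds=500, rng=rng)[-1])
```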