Thompson sampling (TS) is a popular and empirically successful algorithm for online decision-making problems. This thesis advances our understanding of TS when applied to bandit convex optimization (BCO) problems by providing new theoretical guarantees and characterizing its limitations. First, we analyze $1$-dimensional BCO and show that TS achieves a near-optimal Bayesian regret of at most $\tilde O(\sqrt{n})$, where $n$ is the time horizon. This result holds without strong assumptions on the loss functions, requiring only convexity, boundedness, and a mild Lipschitz condition. In sharp contrast, we demonstrate that for general high-dimensional problems, TS can fail catastrophically. More positively, we establish a Bayesian regret bound of $\tilde O(d^{2.5} \sqrt{n})$ for TS in generalized linear bandits, where $d$ is the ambient dimension, even when the convex monotone link function is unknown. Finally, we prove a fundamental limitation of current analysis techniques: the standard information-theoretic machinery can never yield a regret bound better than the existing $\tilde O(d^{1.5} \sqrt{n})$ in the general case.
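For readers unfamiliar with the algorithm analyzed above, the following is a minimal illustrative sketch of Thompson sampling in a linear bandit with a Gaussian prior and Gaussian noise; the conjugate prior, noise level, finite action set, and all parameter values are assumptions made for exposition and are not taken from the thesis.

```python
import numpy as np

def thompson_sampling_linear(actions, true_theta, n_rounds,
                             noise_sd=0.1, prior_var=1.0, rng=None):
    """Sketch of Thompson sampling for a linear bandit with a Gaussian prior.

    actions: (K, d) array of candidate actions; true_theta: (d,) unknown parameter.
    Each round: sample theta_hat from the Gaussian posterior, play the action that
    is optimal for theta_hat, then update the posterior with the observed reward.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = actions.shape[1]
    # The Gaussian posterior is tracked via its precision matrix and the vector X^T y.
    precision = np.eye(d) / prior_var
    xty = np.zeros(d)
    best = (actions @ true_theta).max()
    regret = []
    for _ in range(n_rounds):
        cov = np.linalg.inv(precision)
        mean = cov @ xty / noise_sd**2
        theta_hat = rng.multivariate_normal(mean, cov)   # posterior sample
        a = actions[np.argmax(actions @ theta_hat)]      # act greedily w.r.t. the sample
        reward = a @ true_theta + noise_sd * rng.standard_normal()
        precision += np.outer(a, a) / noise_sd**2        # Bayesian linear-regression update
        xty += a * reward
        regret.append(best - a @ true_theta)
    return np.cumsum(regret)

# Example usage on a small random instance (illustrative parameters only).
rng = np.random.default_rng(0)
actions = rng.standard_normal((50, 5))
theta = rng.standard_normal(5)
print(thompson_sampling_linear(actions, theta, n_rounds=500, rng=rng)[-1])
```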