JOURNAL ARTICLE

Near-optimal Reinforcement Learning in Factored MDPs

Ian Osband, Benjamin Van Roy

Year: 2014; Journal: arXiv (Cornell University); Vol: 27; Pages: 604-612; Publisher: Cornell University

Abstract

Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and $S$ and $A$ are the cardinalities of the state and action spaces. This implies $T = \Omega(SA)$ time to guarantee a near-optimal policy. In many settings of practical interest, due to the curse of dimensionality, $S$ and $A$ can be so enormous that this learning time is unacceptable. We establish that, if the system is known to be a factored MDP, it is possible to achieve regret that scales polynomially in the number of parameters encoding the factored MDP, which may be exponentially smaller than $S$ or $A$. We provide two algorithms that satisfy near-optimal regret bounds in this context: posterior sampling reinforcement learning (PSRL) and an upper confidence bound algorithm (UCRL-Factored).
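The step from the regret lower bound to the learning-time claim can be made explicit. In the sketch below, $c > 0$ stands for the constant hidden in $\Omega(\cdot)$ and $\varepsilon > 0$ for a target average per-step regret; both symbols are introduced here for illustration and are not notation taken from the paper. On the hard MDP where the lower bound holds:

$$
\mathrm{Regret}(T) \ge c\sqrt{SAT}
\quad\Longrightarrow\quad
\frac{\mathrm{Regret}(T)}{T} \ge c\sqrt{\frac{SA}{T}},
$$

$$
\frac{\mathrm{Regret}(T)}{T} \le \varepsilon
\quad\Longrightarrow\quad
c\sqrt{\frac{SA}{T}} \le \varepsilon
\quad\Longrightarrow\quad
T \ge \frac{c^{2}\,SA}{\varepsilon^{2}} = \Omega(SA) \text{ for fixed } \varepsilon.
$$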
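To illustrate why the number of parameters of a factored MDP can be exponentially smaller than $S$ or $A$, the sketch below counts transition-table entries for a flat MDP and for a factored one in which each next-state factor depends only on a small scope of variables. It is a minimal illustration under stated assumptions: the function names, the chain-shaped dependency structure and the choice of 10 binary state factors are hypothetical and not taken from the paper, rewards are ignored, and it counts table entries rather than free parameters (each distribution has one fewer free parameter than entries).

```python
from math import prod


def flat_transition_entries(state_card, action_card):
    """Table entries for P(s' | s, a) when joint states and actions are enumerated flat."""
    S = prod(state_card)   # |S| is the product of the state-factor cardinalities
    A = prod(action_card)  # |A| is the product of the action-factor cardinalities
    return S * A * S


def factored_transition_entries(state_card, action_card, scopes):
    """Table entries when next-state factor i depends only on the variables in scopes[i].

    Indices in scopes[i] refer to the concatenated list state_card + action_card.
    Factor i stores one distribution over its own values for every joint
    assignment of its scope (hypothetical encoding, for illustration only).
    """
    card = list(state_card) + list(action_card)
    return sum(prod(card[j] for j in scope) * state_card[i]
               for i, scope in enumerate(scopes))


# Hypothetical example: 10 binary state factors and one binary action factor.
# Factor i depends on itself, its left neighbour (if any) and the action.
m = 10
state_card = [2] * m
action_card = [2]
action_idx = m  # position of the action factor in the concatenated list
chain_scopes = [[0, action_idx]] + [[i - 1, i, action_idx] for i in range(1, m)]

flat = flat_transition_entries(state_card, action_card)
factored = factored_transition_entries(state_card, action_card, chain_scopes)
print(flat, factored)  # 2097152 152
```

With this assumed structure, the flat count grows exponentially in the number of state factors $m$, while the factored count grows only linearly in $m$, since each factor's scope stays fixed in size.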

Keywords:
Regret, Reinforcement learning, Markov decision process, Curse of dimensionality, Computer science, Mathematical optimization, Thompson sampling, Q-learning, Artificial intelligence, Markov process, Mathematics, Machine learning, Statistics

Metrics

Cited by: 54
FWCI (Field Weighted Citation Impact): 0.00
References: 21

Topics

Advanced Bandit Algorithms Research
Social Sciences → Decision Sciences → Management Science and Operations Research
Reinforcement Learning in Robotics
Physical Sciences → Computer Science → Artificial Intelligence
Smart Grid Energy Management
Physical Sciences → Engineering → Electrical and Electronic Engineering
