JOURNAL ARTICLE

Model-Based Reinforcement Learning in Factored-State MDPs

Abstract

We consider the problem of learning in a factored-state Markov decision process that is structured to allow a compact representation. We show that a well-known algorithm, factored Rmax, performs near-optimally on all but a number of timesteps that is polynomial in the size of the compact representation, which is often exponentially smaller than the number of states. This is equivalent to the result obtained by Kearns and Koller for their DBN-E³ algorithm, except that we have conducted the analysis in a more general setting. We also extend the results to a new algorithm, factored IE, that uses the interval estimation approach to exploration and can be expected to outperform factored Rmax on most domains.
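The gap between the compact representation and the number of states can be made concrete with a small counting sketch. It is not taken from the article: it assumes a transition model in which each of `num_vars` state variables takes `num_values` values and depends on at most `max_parents` parent variables per action, and the function and parameter names are hypothetical.

```python
def flat_transition_entries(num_vars, num_values, num_actions):
    """Entries in a flat table P(s' | s, a) over all joint states."""
    num_states = num_values ** num_vars
    return num_states * num_actions * num_states


def factored_transition_entries(num_vars, num_values, num_actions, max_parents):
    """Entries in per-variable tables P(x_i' | parents(x_i), a).

    Assumption: every variable has exactly max_parents parents,
    so the result is an upper bound for sparser structures.
    """
    parent_configs = num_values ** max_parents
    return num_vars * num_actions * parent_configs * num_values


if __name__ == "__main__":
    # Illustrative sizes only: 20 binary variables, 4 actions, 3 parents each.
    n, d, a, k = 20, 2, 4, 3
    print("flat entries:    ", flat_transition_entries(n, d, a))
    print("factored entries:", factored_transition_entries(n, d, a, k))
```

Under these assumptions the factored count grows linearly in the number of variables for a fixed parent bound, while the flat count grows exponentially, which is the sense in which a bound polynomial in the compact representation can be far smaller than one polynomial in the number of states.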

Keywords:
Reinforcement learning; Markov decision process; Representation (politics); State (computer science); Computer science; Markov process; Artificial intelligence; Polynomial; Interval (graph theory); Markov chain; Algorithm; Mathematics; Mathematical optimization; Combinatorics; Machine learning; Statistics

Metrics

Cited By: 28
FWCI (Field Weighted Citation Impact): 3.88
Refs: 33
Citation Normalized Percentile: 0.94

Topics

Reinforcement Learning in Robotics
Physical Sciences →  Computer Science →  Artificial Intelligence
Machine Learning and Algorithms
Physical Sciences →  Computer Science →  Artificial Intelligence
Formal Methods in Verification
Physical Sciences →  Computer Science →  Computational Theory and Mathematics

Related Documents

JOURNAL ARTICLE

Algorithm-Directed Exploration for Model-Based Reinforcement Learning in Factored MDPs

Carlos Guestrin, Relu Patrascu, Dale Schuurmans

Journal: International Conference on Machine Learning Year: 2002 Vol: 29 (3) Pages: 235-242

BOOK-CHAPTER

TeXDYNA: Hierarchical Reinforcement Learning in Factored MDPs

Olga Kozlova, Olivier Sigaud, Christophe Meyer

Lecture notes in computer science Year: 2010 Pages: 489-500

JOURNAL ARTICLE

Near-optimal Reinforcement Learning in Factored MDPs

Ian Osband, Benjamin Van Roy

Journal: arXiv (Cornell University) Year: 2014 Vol: 27 Pages: 604-612