Near-optimal Reinforcement Learning in Factored MDPs

Ian Osband; Benjamin Van Roy

doi:10.48550/arxiv.1403.3741

JOURNAL ARTICLE

Year: 2014; Journal: arXiv (Cornell University); Vol: 27; Pages: 604-612; Publisher: Cornell University

Abstract

Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and $S$ and $A$ are the cardinalities of the state and action spaces. This implies $T = \Omega(SA)$ time to guarantee a near-optimal policy. In many settings of practical interest, due to the curse of dimensionality, $S$ and $A$ can be so enormous that this learning time is unacceptable. We establish that, if the system is known to be a factored MDP, it is possible to achieve regret that scales polynomially in the number of parameters encoding the factored MDP, which may be exponentially smaller than $S$ or $A$. We provide two algorithms that satisfy near-optimal regret bounds in this context: posterior sampling reinforcement learning (PSRL) and an upper confidence bound algorithm (UCRL-Factored).
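The claim that the number of parameters can be exponentially smaller than $S$ can be illustrated with a small counting exercise. The following is a minimal sketch, not the paper's construction: it assumes a state made of several discrete factors, an unfactored action set, and a transition model in which each next-state factor depends on the action and on at most a fixed number of current state factors. The function names and the example sizes are hypothetical.

```python
# Illustrative sketch only. The structure assumed here (discrete state factors,
# an unfactored action set, a bounded number of parents per factor) and all
# sizes are assumptions for illustration, not values taken from the paper.


def flat_transition_entries(num_factors: int, values_per_factor: int,
                            num_actions: int) -> int:
    """Entries in a tabular model P(s' | s, a) over the joint state space."""
    num_states = values_per_factor ** num_factors  # S grows exponentially in the number of factors
    return num_states * num_actions * num_states


def factored_transition_entries(num_factors: int, values_per_factor: int,
                                num_actions: int, parents_per_factor: int) -> int:
    """Entries when each next-state factor has its own small conditional table.

    Each factor's table is indexed by the action and by the values of at most
    `parents_per_factor` current state factors.
    """
    parent_configs = values_per_factor ** parents_per_factor
    per_factor = parent_configs * num_actions * values_per_factor
    return num_factors * per_factor


if __name__ == "__main__":
    # Hypothetical example: 20 binary factors, 4 actions, 3 parents per factor.
    flat = flat_transition_entries(20, 2, 4)
    factored = factored_transition_entries(20, 2, 4, 3)
    print(f"flat table entries:     {flat}")
    print(f"factored table entries: {factored}")
```

Under these assumed sizes the joint state space has $2^{20}$ states, so the flat table has $2^{42}$ entries, while the factored representation needs 1280. Adding one more factor doubles $S$ but adds only a constant number of entries to the factored count.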

Keywords:

Regret; Reinforcement learning; Markov decision process; Curse of dimensionality; Computer science; Context (archaeology); Mathematical optimization; Thompson sampling; Q-learning; Artificial intelligence; Markov process; Mathematics; Machine learning; Statistics

Topics

Advanced Bandit Algorithms Research

Social Sciences → Decision Sciences → Management Science and Operations Research

Reinforcement Learning in Robotics

Physical Sciences → Computer Science → Artificial Intelligence

Smart Grid Energy Management

Physical Sciences → Engineering → Electrical and Electronic Engineering

Related Documents

TeXDYNA: Hierarchical Reinforcement Learning in Factored MDPs

Model-Based Reinforcement Learning in Factored-State MDPs

Exploiting Additive Structure in Factored MDPs for Reinforcement Learning

Structure Learning in Factored MDPs

Automatic Feature Selection for Model-Based Reinforcement Learning in Factored MDPs