Online Markov Decision Processes

Eyal Even-Dar; Sham M. Kakade; Yishay Mansour

doi:10.1287/moor.1090.0396

JOURNAL ARTICLE

Year: 2009; Journal: Mathematics of Operations Research; Vol: 34 (3); Pages: 726-736; Publisher: Institute for Operations Research and the Management Sciences

Abstract

We consider a Markov decision process (MDP) setting in which the reward function is allowed to change after each time step (possibly in an adversarial manner), yet the dynamics remain fixed. Similar to the experts setting, we address the question of how well an agent can do when compared to the reward achieved under the best stationary policy over time. We provide efficient algorithms, which have regret bounds with no dependence on the size of state space. Instead, these bounds depend only on a certain horizon time of the process and logarithmically on the number of actions.
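
The abstract compares its setting to the experts setting, so a small illustration of that baseline may help. The sketch below is not the paper's MDP algorithm; it is a plain exponential-weights (Hedge) learner over a fixed set of actions with no state, showing what regret against the best fixed choice means and why a logarithmic dependence on the number of actions is natural. The function names, the learning rate eta, and the random reward sequence are assumptions made for illustration only.

```python
import math
import random


def hedge_expected_reward(reward_rounds, eta):
    """Expected total reward of exponential weights on rewards in [0, 1].

    reward_rounds: list of per-round reward vectors, one entry per action.
    eta: learning rate (a hypothetical tuning parameter, not from the paper).
    """
    n_actions = len(reward_rounds[0])
    log_weights = [0.0] * n_actions  # log-space weights avoid overflow
    total = 0.0
    for rewards in reward_rounds:
        # Turn log-weights into a probability distribution over actions.
        top = max(log_weights)
        weights = [math.exp(lw - top) for lw in log_weights]
        norm = sum(weights)
        probs = [w / norm for w in weights]
        # The learner commits to probs before the round's rewards are revealed.
        total += sum(p * r for p, r in zip(probs, rewards))
        # Full-information update: every action's reward is observed.
        log_weights = [lw + eta * r for lw, r in zip(log_weights, rewards)]
    return total


def best_fixed_reward(reward_rounds):
    """Total reward of the single best action chosen in hindsight."""
    n_actions = len(reward_rounds[0])
    return max(sum(r[a] for r in reward_rounds) for a in range(n_actions))


if __name__ == "__main__":
    random.seed(0)
    T, n = 500, 4
    rounds = [[random.random() for _ in range(n)] for _ in range(T)]
    eta = math.sqrt(8 * math.log(n) / T)
    regret = best_fixed_reward(rounds) - hedge_expected_reward(rounds, eta)
    bound = math.sqrt(T * math.log(n) / 2)  # standard Hedge regret bound
    print(f"regret = {regret:.3f}, bound = {bound:.3f}")
```

With eta set to sqrt(8 ln(n) / T), the standard analysis of this learner bounds its regret by sqrt(T ln(n) / 2) for any reward sequence in [0, 1], including adversarial ones. The bound grows only with the square root of the logarithm of the number of actions, which is the flavor of dependence the abstract describes for the MDP setting.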

Keywords: Regret; Markov decision process; Mathematics; Mathematical optimization; Markov process; Markov chain; State space; Partially observable Markov decision process; Time horizon; Process (computing); Function (biology); Mathematical economics; Computer science; Statistics

Metrics

Cited By: 144

FWCI (Field Weighted Citation Impact): 4.26

Citation Normalized Percentile: 0.94

Topics

Advanced Bandit Algorithms Research

Social Sciences → Decision Sciences → Management Science and Operations Research

Reinforcement Learning in Robotics

Physical Sciences → Computer Science → Artificial Intelligence

Optimization and Search Problems

Physical Sciences → Computer Science → Computer Networks and Communications

Related Documents

Blackwell Online Learning for Markov Decision Processes

Online Markov Decision Processes Under Bandit Feedback

Online Markov Decision Processes Configuration with Continuous Decision Space

Online model learning in adversarial Markov decision processes

Online Learning in Weakly Coupled Markov Decision Processes