JOURNAL ARTICLE

Online Markov Decision Processes

Eyal Even-Dar, Sham M. Kakade, Yishay Mansour

Year: 2009; Journal: Mathematics of Operations Research; Vol: 34 (3); Pages: 726-736; Publisher: Institute for Operations Research and the Management Sciences

Abstract

We consider a Markov decision process (MDP) setting in which the reward function is allowed to change after each time step (possibly in an adversarial manner), yet the dynamics remain fixed. Similar to the experts setting, we address the question of how well an agent can do when compared to the reward achieved under the best stationary policy over time. We provide efficient algorithms whose regret bounds have no dependence on the size of the state space. Instead, these bounds depend only on a certain horizon time of the process and logarithmically on the number of actions.
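The abstract measures performance the same way as the experts setting: total reward compared with the best fixed choice in hindsight. As a point of reference, the sketch below runs the standard exponential-weights (Hedge) experts algorithm on an arbitrary reward sequence and reports its expected regret against the best fixed action. With step size η = √(8 ln A / T) and rewards in [0, 1], Hedge's regret is at most √(T ln A / 2), so it depends on the number of actions A only through ln A. This is the single-state experts baseline, not the paper's MDP algorithm; the function name `hedge_regret`, the step-size choice, and the random reward sequence in the demo are illustrative assumptions.

```python
import math
import random


def hedge_regret(rewards, eta):
    """Run Hedge (exponential weights) over len(rewards[0]) actions.

    rewards: sequence of per-step reward vectors with entries in [0, 1],
             fixed in advance by an arbitrary (possibly adversarial) source.
    eta:     learning rate (hypothetical tuning; see the lead-in).
    Returns the expected regret of Hedge versus the best fixed action.
    """
    n_actions = len(rewards[0])
    log_weights = [0.0] * n_actions  # log of unnormalized action weights
    expected_reward = 0.0

    for r in rewards:
        # Turn log-weights into a probability distribution (max-shift for stability).
        m = max(log_weights)
        w = [math.exp(lw - m) for lw in log_weights]
        total = sum(w)
        p = [wi / total for wi in w]

        # Expected reward of playing an action drawn from p this step.
        expected_reward += sum(pi * ri for pi, ri in zip(p, r))

        # Full-information update: every action's weight grows with its reward.
        log_weights = [lw + eta * ri for lw, ri in zip(log_weights, r)]

    # Benchmark: the single action with the highest cumulative reward in hindsight.
    best_fixed = max(sum(r[a] for r in rewards) for a in range(n_actions))
    return best_fixed - expected_reward


if __name__ == "__main__":
    T, A = 10_000, 8
    rng = random.Random(0)
    rewards = [[rng.random() for _ in range(A)] for _ in range(T)]
    eta = math.sqrt(8 * math.log(A) / T)
    bound = math.sqrt(T * math.log(A) / 2)
    print(f"regret={hedge_regret(rewards, eta):.1f}  bound={bound:.1f}")
```

In the MDP setting studied in the paper, the comparator becomes the best stationary policy rather than the best fixed action, and the fixed dynamics couple successive decisions; this sketch models neither.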

Keywords:
Regret, Markov decision process, Mathematics, Mathematical optimization, Markov process, Markov chain, State space, Partially observable Markov decision process, Time horizon, Process (computing), Function (biology), Mathematical economics, Computer science, Statistics

Metrics

Cited by: 144
FWCI (Field Weighted Citation Impact): 4.26
References: 24
Citation Normalized Percentile: 0.94

Topics

Advanced Bandit Algorithms Research (Social Sciences → Decision Sciences → Management Science and Operations Research)
Reinforcement Learning in Robotics (Physical Sciences → Computer Science → Artificial Intelligence)
Optimization and Search Problems (Physical Sciences → Computer Science → Computer Networks and Communications)

Related Documents

JOURNAL ARTICLE

Online Markov Decision Processes Under Bandit Feedback

Gergely Neu, András György, Csaba Szepesvári, András Antos

Journal: IEEE Transactions on Automatic Control; Year: 2014; Vol: 59 (3); Pages: 676-691

JOURNAL ARTICLE

Online Markov Decision Processes Configuration with Continuous Decision Space

Davide Maran, Pierriccardo Olivieri, Francesco Emanuele Stradi, Giuseppe Urso, Nicola Gatti, Marcello Restelli

Journal: Proceedings of the AAAI Conference on Artificial Intelligence; Year: 2024; Vol: 38 (13); Pages: 14315-14322

JOURNAL ARTICLE

Online model learning in adversarial Markov decision processes

Doran Chakraborty, Peter Stone

Journal: Adaptive Agents and Multi-Agents Systems; Year: 2010; Pages: 1583-1584

JOURNAL ARTICLE

Online Learning in Weakly Coupled Markov Decision Processes

Xiaohan Wei, Hao Yu, Michael J. Neely

Journal: ACM SIGMETRICS Performance Evaluation Review; Year: 2018; Vol: 46 (1); Pages: 56-58