JOURNAL ARTICLE

Online Markov Decision Processes Configuration with Continuous Decision Space

Davide Maran, Pierriccardo Olivieri, Francesco Emanuele Stradi, Giuseppe Urso, Nicola Gatti, Marcello Restelli

Year: 2024 Journal: Proceedings of the AAAI Conference on Artificial Intelligence Vol: 38 (13) Pages: 14315-14322 Publisher: Association for the Advancement of Artificial Intelligence

Abstract

In this paper, we investigate the optimal online configuration of episodic Markov decision processes when the space of possible configurations is continuous. Specifically, we study the interaction between a learner (referred to as the configurator) and an agent with a fixed, unknown policy, when the learner aims to minimize her losses by choosing transition functions in an online fashion. The losses may be unrelated to the agent's rewards. This problem applies to many real-world scenarios where the learner seeks to manipulate the Markov decision process to her advantage. We study both deterministic and stochastic settings, where the losses are either fixed or sampled from an unknown probability distribution. We design two algorithms whose distinguishing feature is that they rely on occupancy measures to explore the continuous space of transition functions optimistically, achieving constant regret in deterministic settings and sublinear regret in stochastic settings, respectively. Moreover, we prove that the regret bound is tight with respect to any constant factor in deterministic settings. Finally, we compare the empirical performance of our algorithms with a baseline in synthetic experiments.
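
As an illustration of the occupancy-measure idea mentioned in the abstract, the following is a minimal sketch, not the paper's algorithms. It assumes a small finite-horizon MDP with tabular states and actions, and a finite set of two candidate transition functions rather than a continuous space. The names `occupancy_measure`, `expected_loss`, `P_a`, and `P_b` are hypothetical. For a fixed agent policy, each transition function induces an occupancy measure, and the configurator's expected loss is linear in that measure.

```python
import numpy as np

def occupancy_measure(P, pi, mu):
    """Return q[h, s, a] = Pr(s_h = s, a_h = a) for a finite-horizon MDP.

    P  : (H, S, A, S) array, P[h, s, a, s'] transition probabilities
    pi : (H, S, A) array, the agent's fixed policy
    mu : (S,) array, initial state distribution
    """
    H, S, A, _ = P.shape
    q = np.zeros((H, S, A))
    d = mu.copy()  # state distribution at the current step
    for h in range(H):
        q[h] = d[:, None] * pi[h]               # joint over (state, action)
        d = np.einsum("sa,sat->t", q[h], P[h])  # push forward one step
    return q

def expected_loss(q, loss):
    """Expected episode loss: inner product of occupancy measure and losses."""
    return float(np.sum(q * loss))

# Toy instance (all values are illustrative assumptions).
H, S, A = 2, 2, 2
mu = np.array([1.0, 0.0])
pi = np.full((H, S, A), 0.5)   # agent acts uniformly at random
loss = np.zeros((H, S, A))
loss[:, 1, :] = 1.0            # the configurator pays 1 whenever state 1 is visited

P_a = np.zeros((H, S, A, S)); P_a[..., 0] = 1.0  # always move to state 0
P_b = np.zeros((H, S, A, S)); P_b[..., 1] = 1.0  # always move to state 1

q_a = occupancy_measure(P_a, pi, mu)
q_b = occupancy_measure(P_b, pi, mu)
loss_a = expected_loss(q_a, loss)
loss_b = expected_loss(q_b, loss)
```

In this toy instance the configurator would prefer `P_a`, since it keeps the agent away from the costly state. In the online setting the abstract describes, the agent's policy is unknown, so the occupancy measures have to be estimated from interaction.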

Keywords:
Markov decision process, Computer science, Space (punctuation), Markov chain, Markov process, Mathematics, Machine learning, Statistics

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.70
Refs: 38
Citation Normalized Percentile: 0.60

Topics

Advanced Data Processing Techniques
Physical Sciences →  Engineering →  Control and Systems Engineering
Advanced Research in Systems and Signal Processing
Physical Sciences →  Engineering →  Control and Systems Engineering
Simulation Techniques and Applications
Social Sciences →  Decision Sciences →  Management Science and Operations Research

Related Documents

BOOK-CHAPTER

Online Learning in Markov Decision Processes with Continuous Actions

Yi-Te Hong, Chi-Jen Lu

Lecture Notes in Computer Science Year: 2015 Pages: 302-316

JOURNAL ARTICLE

Online Markov Decision Processes

Eyal Even-Dar, Sham M. Kakade, Yishay Mansour

Journal: Mathematics of Operations Research Year: 2009 Vol: 34 (3) Pages: 726-736

JOURNAL ARTICLE

Semiparametric estimation of Markov decision processes with continuous state space

Sorawoot Srisuma, Oliver Linton

Journal: Journal of Econometrics Year: 2011 Vol: 166 (2) Pages: 320-341