Online Markov Decision Processes Configuration with Continuous Decision Space

Davide Maran; Pierriccardo Olivieri; Francesco Emanuele Stradi; Giuseppe Urso; Nicola Gatti; Marcello Restelli

doi:10.1609/aaai.v38i13.29344

JOURNAL ARTICLE

Year: 2024
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Vol: 38 (13)
Pages: 14315-14322
Publisher: Association for the Advancement of Artificial Intelligence

Abstract

In this paper, we investigate the optimal online configuration of episodic Markov decision processes when the space of possible configurations is continuous. Specifically, we study the interaction between a learner (referred to as the configurator) and an agent with a fixed, unknown policy, when the learner aims to minimize her losses by choosing transition functions in an online fashion. The losses may be unrelated to the agent's rewards. This problem applies to many real-world scenarios where the learner seeks to manipulate the Markov decision process to her advantage. We study both deterministic and stochastic settings, where the losses are either fixed or sampled from an unknown probability distribution. We design two algorithms that rely on occupancy measures to explore the continuous space of transition functions optimistically, achieving constant regret in deterministic settings and sublinear regret in stochastic settings, respectively. Moreover, we prove that the regret bound is tight with respect to any constant factor in deterministic settings. Finally, we compare the empirical performance of our algorithms with a baseline in synthetic experiments.
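
The abstract mentions occupancy measures without defining them. As a rough illustration only (not the algorithms from the paper), the sketch below computes the occupancy measure of a fixed policy under a chosen transition function in a small episodic MDP, and shows that the expected episode loss is linear in that measure. The function names, the array layout, and the one-parameter family `configured_transitions` are all assumptions made up for this example.

```python
import numpy as np

def occupancy_measure(P, pi, mu0):
    """Return q[h, s, a] = Pr(s_h = s, a_h = a) for a fixed policy under transitions P.

    P:   shape (H, S, A, S); P[h, s, a, t] is the probability of moving s -> t at step h
    pi:  shape (H, S, A); the agent's fixed policy
    mu0: shape (S,); initial state distribution
    """
    H, S, A, _ = P.shape
    q = np.zeros((H, S, A))
    state_dist = mu0
    for h in range(H):
        q[h] = state_dist[:, None] * pi[h]                # joint over (s, a) at step h
        state_dist = np.einsum("sa,sat->t", q[h], P[h])   # push the distribution one step forward
    return q

def expected_loss(q, loss):
    """Expected episode loss is the inner product of occupancy measure and loss."""
    return float(np.sum(q * loss))

def configured_transitions(theta):
    """Hypothetical one-parameter configuration: H=2 steps, S=2 states, A=1 action."""
    P = np.zeros((2, 2, 1, 2))
    P[0, 0, 0] = [1.0 - theta, theta]   # the configurator controls this row
    P[0, 1, 0] = [0.0, 1.0]
    P[1, :, 0, :] = np.eye(2)           # last step: stay in place
    return P

# Toy instance (assumed values): start in state 0, pay a loss of 1 for being in state 1 at step 1.
pi = np.ones((2, 2, 1))
mu0 = np.array([1.0, 0.0])
loss = np.zeros((2, 2, 1))
loss[1, 1, 0] = 1.0

q = occupancy_measure(configured_transitions(0.25), pi, mu0)
print(expected_loss(q, loss))
```

In this toy instance the expected loss equals `theta`, so the configurator would pick `theta = 0`. The point of the sketch is only that a continuous choice of transition function maps to an occupancy measure in which the loss is linear.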

Keywords:

Markov decision process; Computer science; Space (punctuation); Markov chain; Markov process; Mathematics; Machine learning; Statistics

Metrics

FWCI (Field Weighted Citation Impact): 0.70

Citation Normalized Percentile: 0.60

Topics

Advanced Data Processing Techniques

Physical Sciences → Engineering → Control and Systems Engineering

Advanced Research in Systems and Signal Processing

Physical Sciences → Engineering → Control and Systems Engineering

Simulation Techniques and Applications

Social Sciences → Decision Sciences → Management Science and Operations Research
