JOURNAL ARTICLE

Online Reinforcement Learning in Markov Decision Process Using Linear Programming

Abstract

We consider online reinforcement learning in an episodic Markov decision process (MDP) with an unknown transition function and stochastic rewards drawn from a fixed but unknown distribution. The learner aims to learn the optimal policy and minimize its regret over a finite time horizon by interacting with the environment. We devise a simple and efficient model-based algorithm that achieves $\tilde{O}(LX\sqrt{TA})$ regret with high probability, where $L$ is the episode length, $T$ is the number of episodes, and $X$ and $A$ are the cardinalities of the state space and the action space, respectively. The proposed algorithm, based on the principle of "optimism in the face of uncertainty", maintains confidence sets for the transition and reward functions and uses occupancy measures to connect the online MDP with linear programming. It achieves a tighter regret bound than existing works that use a similar confidence-set framework, and requires less computational effort than those that use a different framework and attain a slightly tighter regret bound.
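For readers unfamiliar with the occupancy-measure formulation mentioned above, the following sketch shows how planning in a known episodic MDP reduces to a linear program; the notation (layers $X_0, \dots, X_L$ of a loop-free MDP, transition function $P$, mean reward $r$) is assumed here for illustration, and the paper's algorithm additionally optimizes optimistically over confidence sets of $P$ and $r$, which is not shown:

$$
\begin{aligned}
\max_{q \ge 0}\;\; & \sum_{x,a} q(x,a)\,r(x,a) \\
\text{s.t.}\;\; & \sum_{x \in X_k}\sum_{a} q(x,a) = 1, && k = 0,\dots,L-1, \\
& \sum_{a} q(x',a) = \sum_{x \in X_{k-1}}\sum_{a} P(x' \mid x,a)\,q(x,a), && x' \in X_k,\; k = 1,\dots,L-1.
\end{aligned}
$$

Any feasible $q$ is the occupancy measure of some policy, and the corresponding policy can be recovered as $\pi(a \mid x) = q(x,a) / \sum_{a'} q(x,a')$ whenever the denominator is positive.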

Keywords:
Markov decision process, Reinforcement learning, Computer science, Markov process, Partially observable Markov decision process, Linear programming, Machine learning, Artificial intelligence, Process (computing), Markov chain, Mathematical optimization, Markov model, Programming language, Algorithm, Mathematics, Statistics

Metrics

Cited by: 2
FWCI (Field Weighted Citation Impact): 0.51
References: 28
Citation Normalized Percentile: 0.69


Topics

Reinforcement Learning in Robotics (Physical Sciences → Computer Science → Artificial Intelligence)
Adaptive Dynamic Programming Control (Physical Sciences → Computer Science → Computational Theory and Mathematics)
Supply Chain and Inventory Management (Social Sciences → Business, Management and Accounting → Management Information Systems)