Xuesong Wang, Diyuan Hou, Longyang Huang, Yuhu Cheng
Offline-online reinforcement learning combines the sample efficiency of offline learning with the exploration and trial-and-error capability of online learning. Meanwhile, partial observability of the environment remains a challenging issue in reinforcement learning. This paper proposes a Partially Observable Offline-Online Actor-Critic (PO3AC) algorithm. In PO3AC, differently weighted behavior cloning regularization terms are applied during the offline and online training phases to alleviate distribution shift. The actor and critic are trained separately, and a recurrent neural network is used to train a state prediction model that maps observations to belief states. Experimental results on the PyBullet physics engine demonstrate that the proposed algorithm outperforms existing algorithms.
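The two core ingredients named in the abstract, a recurrent belief-state encoder and an actor objective regularized by weighted behavior cloning, can be sketched as follows. The abstract does not specify PO3AC's exact losses or architecture, so this is a minimal illustrative sketch assuming a simple RNN cell for the encoder and a TD3+BC-style weighted BC term; all names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only.
obs_dim, belief_dim, act_dim = 4, 8, 2

# --- Recurrent belief-state encoder (simple tanh RNN cell; the paper's
# actual recurrent architecture is not given in the abstract) ---
W_in = rng.normal(scale=0.1, size=(belief_dim, obs_dim))
W_rec = rng.normal(scale=0.1, size=(belief_dim, belief_dim))

def encode(observations):
    """Map a sequence of observations to a belief state."""
    b = np.zeros(belief_dim)
    for o in observations:
        b = np.tanh(W_in @ o + W_rec @ b)
    return b

# --- Actor objective with a weighted behavior-cloning regularizer.
# The weighting scheme is assumed (TD3+BC-style); PO3AC's exact form
# and how the weight differs between offline and online phases are
# not specified in the abstract. ---
def actor_loss(q_value, pi_action, data_action, bc_weight):
    # Maximize Q (minimize -Q) plus a weighted BC term that keeps the
    # policy close to the dataset action, alleviating distribution shift.
    bc_term = np.sum((pi_action - data_action) ** 2)
    return -q_value + bc_weight * bc_term

obs_seq = rng.normal(size=(5, obs_dim))  # a short observation trajectory
belief = encode(obs_seq)
pi_action = np.tanh(belief[:act_dim])    # toy deterministic policy head
loss = actor_loss(q_value=1.3, pi_action=pi_action,
                  data_action=np.zeros(act_dim), bc_weight=2.5)
print(belief.shape, float(loss))
```

In an offline-online scheme of this kind, `bc_weight` would typically be kept large during offline training (staying close to the dataset) and reduced during online fine-tuning to permit exploration.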