JOURNAL ARTICLE

Offline-Online Actor-Critic for Partially Observable Markov Decision Process

Abstract

Offline-online reinforcement learning combines the sample efficiency of offline learning with the exploration and trial-and-error of online learning. At the same time, partial observability of the environment remains a challenging issue in reinforcement learning. This paper proposes a Partially Observable Offline-Online Actor-Critic (PO3AC) algorithm. In PO3AC, differently weighted behavior cloning regularization terms are applied during both the offline and online training phases to alleviate the distribution shift. During training, the actor and critic are trained separately, and a recurrent neural network is used to train a state prediction model that maps observations to belief states. Experimental results on the PyBullet physics engine demonstrate that the proposed algorithm outperforms existing algorithms.
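The abstract's weighted behavior cloning regularization can be illustrated with a minimal sketch. This is not the paper's exact formulation: the function `actor_loss`, the per-sample `bc_weight`, and the `lam` trade-off below are hypothetical stand-ins, written in the spirit of a TD3+BC-style regularized actor objective, where a weighted penalty pulls the policy toward actions in the offline dataset to limit distribution shift.

```python
import numpy as np

def actor_loss(q_value, policy_action, dataset_action, bc_weight, lam=2.5):
    """Illustrative actor objective: RL term plus weighted BC regularizer.

    q_value:        critic estimates Q(s, pi(s)) for the batch, shape [B]
    policy_action:  actions produced by the actor, shape [B, A]
    dataset_action: actions stored in the offline buffer, shape [B, A]
    bc_weight:      per-sample weight on the BC term, shape [B]
    lam:            trade-off between the RL objective and BC regularization
    """
    # Normalize the RL term by the mean |Q| so both terms stay comparable
    alpha = lam / (np.abs(q_value).mean() + 1e-8)
    rl_term = -alpha * q_value.mean()
    # Weighted squared deviation of the policy from the dataset actions
    bc_term = (bc_weight
               * np.sum((policy_action - dataset_action) ** 2, axis=-1)).mean()
    return rl_term + bc_term

# Toy batch: larger deviation from dataset actions raises the BC penalty.
rng = np.random.default_rng(0)
q = rng.normal(size=8)
data_a = rng.normal(size=(8, 2))
w = np.ones(8)
on_data = actor_loss(q, data_a, data_a, w)          # BC term is zero
off_data = actor_loss(q, data_a + 1.0, data_a, w)   # BC term is positive
print(on_data, off_data)
```

In this sketch the per-sample weights `bc_weight` would be set differently in the offline and online phases, mirroring the abstract's "different weighted behavior cloning regularization terms".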

Keywords:
Observability, Reinforcement learning, Partially observable Markov decision process, Computer science, Artificial intelligence, Machine learning, Regularization, Offline learning, Online learning, Markov decision process, Online and offline, Observable, Artificial neural network, Process (computing), Markov process, Markov chain, Markov model, Mathematics

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 20
Citation Normalized Percentile: 0.19

Topics

Reinforcement Learning in Robotics
Physical Sciences →  Computer Science →  Artificial Intelligence
Adversarial Robustness in Machine Learning
Data Stream Mining Techniques

Related Documents

JOURNAL ARTICLE

Offline–Online Actor–Critic

Xuesong Wang, Diyuan Hou, Longyang Huang, Yuhu Cheng

Journal: IEEE Transactions on Artificial Intelligence  Year: 2022  Vol: 5 (1)  Pages: 61-69
JOURNAL ARTICLE

A reactive power optimization partially observable Markov decision process with data uncertainty using multi-agent actor-attention-critic algorithm

Yaru Gu, Xueliang Huang

Journal: International Journal of Electrical Power & Energy Systems  Year: 2022  Vol: 147  Pages: 108848
BOOK-CHAPTER

Partially Observable Markov Decision Processes

Luis Enrique Sucar

Advances in computer vision and pattern recognition Year: 2020 Pages: 249-266