Offline–Online Actor–Critic

Xuesong Wang; Diyuan Hou; Longyang Huang; Yuhu Cheng

doi:10.1109/tai.2022.3225251

JOURNAL ARTICLE

Year: 2022; Journal: IEEE Transactions on Artificial Intelligence; Vol: 5 (1); Pages: 61-69; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Offline–online reinforcement learning (RL) can effectively address the problem of missing data (commonly known as transitions) in offline RL. However, because of distribution shift, policy performance may degrade when an agent moves from the offline to the online training phase. In this article, we first analyze the problems of distribution shift and policy performance degradation in offline–online RL. Then, to alleviate these problems, we propose a novel RL algorithm, offline–online actor–critic (O2AC). In O2AC, a behavior clone constraint term is introduced into the policy objective function to address distribution shift in the offline training phase. In the online training phase, the influence of the behavior clone constraint term is gradually reduced, which alleviates policy performance degradation. Experiments show that O2AC outperforms existing offline–online RL algorithms.
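The abstract does not give the exact form of the constraint term or of its reduction schedule, so the following is only a minimal sketch of the general idea, not the authors' O2AC implementation. It assumes a squared-error behavior clone penalty added to a negated Q-value objective, and an exponential decay of the penalty weight during online training. The names `bc_weight`, `policy_loss`, `initial_weight`, and `decay_rate` are hypothetical.

```python
def bc_weight(online_step, initial_weight=1.0, decay_rate=0.999):
    """Weight of the behavior clone term (hypothetical schedule).

    Offline training (online_step <= 0) keeps the full weight; each
    online step multiplies the weight by decay_rate, so the constraint
    fades gradually instead of being switched off at once.
    """
    if online_step <= 0:
        return initial_weight
    return initial_weight * decay_rate ** online_step


def policy_loss(q_values, policy_actions, dataset_actions, weight):
    """Batch mean of -Q(s, pi(s)) + weight * ||pi(s) - a||^2.

    q_values:        critic estimates Q(s, pi(s)), one float per sample
    policy_actions:  actions proposed by the policy, one list per sample
    dataset_actions: actions stored in the dataset, one list per sample
    The squared-error penalty is an assumption made for illustration.
    """
    total = 0.0
    for q, pi_a, a in zip(q_values, policy_actions, dataset_actions):
        bc_penalty = sum((p - d) ** 2 for p, d in zip(pi_a, a))
        total += -q + weight * bc_penalty
    return total / len(q_values)


# Toy batch of two samples with two-dimensional actions.
q = [1.0, 2.0]
pi_a = [[0.5, 0.0], [0.0, 1.0]]
data_a = [[0.0, 0.0], [0.0, 0.0]]

offline_loss = policy_loss(q, pi_a, data_a, bc_weight(0))     # full constraint
online_loss = policy_loss(q, pi_a, data_a, bc_weight(5000))   # constraint mostly decayed
```

In this sketch the offline loss penalizes the policy for leaving the dataset's actions, while late in online training the loss is dominated by the critic term, which mirrors the gradual reduction described in the abstract.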

Keywords: Computer science; Internet privacy

Metrics

FWCI (Field Weighted Citation Impact): 0.20

Citation Normalized Percentile: 0.55

Topics

Reinforcement Learning in Robotics

Physical Sciences → Computer Science → Artificial Intelligence

Digital Games and Media

Social Sciences → Social Sciences → Sociology and Political Science

Artificial Intelligence in Games

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Offline-Online Actor-Critic for Partially Observable Markov Decision Process

Mild Policy Evaluation for Offline Actor–Critic

Dual Behavior Regularized Offline Deterministic Actor–Critic

Offline Robustness of Distributional Actor-Critic Ensemble Reinforcement Learning

Robust Offline Actor-Critic with On-Policy Regularized Policy Evaluation