JOURNAL ARTICLE

Relative importance sampling for off-policy actor-critic in deep reinforcement learning

Abstract

Off-policy learning is less stable than on-policy learning in reinforcement learning (RL). A major cause of this instability is the mismatch between the probability distributions of the target policy (π) and the behavior policy (b); this distributional mismatch is also a source of high variance. Importance sampling (IS) can correct for the difference between the two distributions, but IS itself has high variance, which is exacerbated in sequential settings. We propose a smoothed form of importance sampling, relative importance sampling (RIS), which mitigates this variance and stabilizes learning; the variance is controlled by varying the smoothness parameter in RIS. Using this strategy, we develop the first model-free relative importance sampling off-policy actor-critic (RIS-off-PAC) algorithms in RL. Our method uses one network to generate the target policy (the actor) and a value function (the critic) to evaluate the current policy (π) from behavior-policy samples. Our algorithms are trained using behavior-policy action values in the reward function rather than target-policy ones. Both the actor and the critic are implemented as deep neural networks. Our methods perform better than or on par with several state-of-the-art RL benchmarks on OpenAI Gym tasks and synthetic datasets.
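The abstract does not give the exact RIS formula, so the sketch below assumes one common smoothed ratio borrowed from relative density-ratio estimation, π / (β·π + (1−β)·b), which is bounded above by 1/β for β > 0 and recovers the ordinary IS ratio at β = 0. It is meant only to illustrate how the smoothing parameter caps the weight when the behavior policy assigns low probability to an action:

```python
def is_weight(pi, b):
    """Ordinary importance sampling ratio pi(a|s) / b(a|s); unbounded as b -> 0."""
    return pi / b

def ris_weight(pi, b, beta):
    """Assumed relative importance sampling ratio: pi / (beta*pi + (1-beta)*b).
    Bounded above by 1/beta for beta > 0; equals the ordinary IS ratio at beta = 0."""
    return pi / (beta * pi + (1 - beta) * b)

# An action that is likely under the target policy but rare under the behavior policy:
pi_a, b_a = 0.9, 0.01
print(is_weight(pi_a, b_a))        # ~90: the ordinary IS weight explodes
print(ris_weight(pi_a, b_a, 0.2))  # ~4.79: the RIS weight stays below 1/beta = 5
```

Larger β trades more bias for lower variance; β = 0 gives the unbiased but high-variance IS correction.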

Keywords:
Reinforcement learning; Computer science; Sampling (signal processing); Artificial intelligence; Reinforcement; Policy learning; Data science; Machine learning; Psychology; Social psychology; Telecommunications

Metrics

Cited by: 1
FWCI (Field Weighted Citation Impact): 4.82
References: 22
Citation Normalized Percentile: 0.92 (in top 10%, not top 1%)

Topics

Reinforcement Learning in Robotics (Physical Sciences → Computer Science → Artificial Intelligence)
Advanced Bandit Algorithms Research (Social Sciences → Decision Sciences → Management Science and Operations Research)
Adaptive Dynamic Programming Control (Physical Sciences → Computer Science → Computational Theory and Mathematics)