Sample Efficient Reinforcement Learning with Double Importance Sampling Weight Clipping

Jiale Han; Mingxiao Feng; Wengang Zhou; Houqiang Li

doi:10.1109/cog57401.2023.10333147

JOURNAL ARTICLE

Year: 2023 Pages: 1-8

Abstract

Proximal Policy Optimization (PPO) is a stable on-policy policy gradient (PG) method, thanks to the clipped importance sampling (IS) weight in its policy improvement objective. However, on-policy PG methods usually suffer from poor sample efficiency. In contrast, off-policy methods have demonstrated better sample efficiency by making more effective use of all collected samples during training. In this work, we aim to develop methods that inherit both the stability of on-policy PG methods and the data efficiency of off-policy methods. To this end, we present GeDISC, an off-policy algorithm that improves sample efficiency by reusing off-policy samples drawn from prior policies. In addition, we propose double IS weight clipping to control the instability caused by off-policy data. The first clipping adopts the recently proposed generalized clipping mechanism for off-policy data to bound how far the policy update moves from the current policy. The second clipping extends the standard clipping mechanism in PPO to prevent the high variance and bias introduced by extremely old samples. Extensive experiments on continuous and discrete control tasks show that the proposed algorithm outperforms PPO and other state-of-the-art (SOTA) PPO-based off-policy algorithms.
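
The clipping idea in the abstract can be made concrete with a small NumPy sketch. The first function is the standard PPO clipped surrogate. The second is only an illustration of how two clips might be stacked: the first clip centers the allowed range on the ratio between the current policy and the behavior policy that drew the sample (one common reading of generalized clipping), and the second clip bounds the weight around 1 in the usual PPO style. The function names, the parameters eps and eps_old, and in particular the form of the second clip are assumptions made for illustration; they are not taken from the paper, whose exact objective may differ.

```python
import numpy as np


def ppo_clip_surrogate(ratio, adv, eps=0.2):
    """Standard PPO clipped surrogate, averaged over samples."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.mean(np.minimum(ratio * adv, clipped * adv))


def double_clip_surrogate(logp_new, logp_cur, logp_behav, adv,
                          eps=0.2, eps_old=0.5):
    """Illustrative double-clipped surrogate (assumed form, not the paper's).

    logp_new:   log-prob of each action under the policy being optimized
    logp_cur:   log-prob under the current policy at the start of the update
    logp_behav: log-prob under the (possibly old) policy that drew the sample
    """
    # IS weight of the optimized policy relative to the behavior policy.
    ratio = np.exp(logp_new - logp_behav)
    # IS weight of the current policy relative to the behavior policy.
    center = np.exp(logp_cur - logp_behav)
    # First clip: keep the weight within eps of the current policy's weight.
    first = np.clip(ratio, center - eps, center + eps)
    # Second clip (assumption): PPO-style bound around 1, which limits the
    # influence of samples from policies far from the current one.
    second = np.clip(first, 1.0 - eps_old, 1.0 + eps_old)
    return np.mean(np.minimum(ratio * adv, second * adv))
```

Under these assumptions, when the behavior policy equals the current policy and eps_old is at least eps, the second function reduces to the standard PPO surrogate, which matches the on-policy special case one would expect.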

Keywords: Clipping (morphology); Computer science; Reinforcement learning; Stability (learning theory); Sample (material); Variance (accounting); Sampling (signal processing); Control (management); Algorithm; Artificial intelligence; Machine learning; Telecommunications; Economics

Metrics

FWCI (Field Weighted Citation Impact): 0.77

Citation Normalized Percentile: 0.73

Topics

Reinforcement Learning in Robotics

Physical Sciences → Computer Science → Artificial Intelligence

Advanced Battery Technologies Research

Physical Sciences → Engineering → Automotive Engineering

Electric Vehicles and Infrastructure

Physical Sciences → Engineering → Electrical and Electronic Engineering

Related Documents

Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning

Sample Efficient Reinforcement Learning with REINFORCE

Competitive-Cooperative-Concurrent Reinforcement Learning with Importance Sampling

Sample-Efficient Reinforcement Learning From Human Feedback via Information-Directed Sampling

Building sample-efficient reinforcement learning