JOURNAL ARTICLE

Sample Efficient Reinforcement Learning with Double Importance Sampling Weight Clipping

Abstract

Proximal Policy Optimization (PPO) is a stable on-policy policy gradient (PG) method, owing to the clipped importance sampling (IS) weight in its policy improvement objective. However, on-policy PG methods usually suffer from poor sample efficiency. Off-policy methods, in contrast, have demonstrated better sample efficiency by making more effective use of all samples collected during training. In this work, we aim to develop a method that inherits both the stability of on-policy PG methods and the data efficiency of off-policy methods. To this end, we present GeDISC, an off-policy algorithm that improves sample efficiency by reusing samples drawn from prior policies. To control the instability that off-policy data introduces, we further propose double IS weight clipping. The first clipping adopts the recently proposed generalized clipping mechanism for off-policy data, which bounds the policy update relative to the current policy. The second clipping extends the standard PPO clipping mechanism to limit the high variance and bias introduced by extremely old samples. Extensive experiments on continuous and discrete control tasks show that the proposed algorithm outperforms PPO and other state-of-the-art (SOTA) PPO-based off-policy algorithms.
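
The abstract does not state the GeDISC objective, so the following is only a minimal sketch of how two successive IS weight clips could be composed. The first function is the standard PPO clipped surrogate. The second is an assumed illustration: the function name, the parameters `eps` and `eps_old`, and the order in which the two clips are applied are hypothetical and are not taken from the paper.

```python
import numpy as np


def ppo_clip_surrogate(ratio, adv, eps=0.2):
    """Standard PPO clipped surrogate, averaged over samples.

    ratio: IS weights pi_theta(a|s) / pi_old(a|s)
    adv:   advantage estimates for the same samples
    """
    ratio = np.asarray(ratio, dtype=float)
    adv = np.asarray(adv, dtype=float)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Pessimistic (element-wise minimum) of unclipped and clipped terms.
    return np.mean(np.minimum(ratio * adv, clipped * adv))


def double_clip_surrogate(logp_new, logp_cur, logp_beh, adv,
                          eps=0.2, eps_old=0.5):
    """Hypothetical double-clipped surrogate (assumed form, not the paper's).

    logp_new: log pi_theta(a|s), the policy being optimized
    logp_cur: log pi_k(a|s), the current policy before the update
    logp_beh: log mu(a|s), the prior policy that generated the sample
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_cur = np.asarray(logp_cur, dtype=float)
    logp_beh = np.asarray(logp_beh, dtype=float)
    adv = np.asarray(adv, dtype=float)

    ratio = np.exp(logp_new - logp_beh)   # pi_theta / mu
    center = np.exp(logp_cur - logp_beh)  # pi_k / mu

    # First clip (assumed generalized clipping): keep the IS weight within
    # eps of the current policy's weight, bounding the update from pi_k.
    first = np.clip(ratio, center - eps, center + eps)

    # Second clip (assumed PPO-style bound around 1): cap the weight of
    # samples from very old policies, whose ratios drift far from 1.
    second = np.clip(first, 1.0 - eps_old, 1.0 + eps_old)

    return np.mean(np.minimum(ratio * adv, second * adv))
```

Under these assumptions, on-policy data (where `logp_beh` equals `logp_cur`) with `eps <= eps_old` makes the second clip inactive, so the sketch reduces to the standard PPO surrogate. The second clip only takes effect for samples whose behavior policy has drifted far from the current one.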

Keywords:
Clipping (morphology); Computer science; Reinforcement learning; Stability (learning theory); Sample (material); Variance (accounting); Sampling (signal processing); Control (management); Algorithm; Artificial intelligence; Machine learning; Telecommunications; Economics

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.77
Refs: 26
Citation Normalized Percentile: 0.73
