JOURNAL ARTICLE

Model-Based Reinforcement Learning via Proximal Policy Optimization

Abstract

Proximal policy optimization (PPO) is a state-of-the-art model-free reinforcement learning algorithm. Its powerful policy search ability allows an agent to find the optimal policy by trial and error, but at the cost of heavy computation and low data efficiency. Model-based algorithms can make more efficient use of data by learning a forward model from observations, but they face the challenge of model error. In this paper, we combine the strengths of both approaches and introduce a data-efficient model-based algorithm called PIPPO (probabilistic inference via PPO). It performs online probabilistic inference of the dynamics model using Gaussian process regression and executes offline policy improvement using PPO on the inferred model. Empirical evaluation on the pendulum benchmark problem shows that the proposed PIPPO algorithm achieves performance comparable to traditional PPO while requiring less interaction with the environment.
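
The two ingredients named in the abstract, a Gaussian process model of the dynamics and the PPO clipped surrogate objective, can be illustrated with a minimal sketch. This is not the authors' implementation: the pendulum equations, the fixed kernel hyperparameters, the choice of state differences as regression targets, and every function name below are assumptions made for illustration only.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def pendulum_step(state, action, dt=0.05, g=9.81, length=1.0, mass=1.0):
    """Hypothetical stand-in for the real environment (simple Euler pendulum)."""
    theta, omega = state
    omega_next = omega + dt * (-(g / length) * np.sin(theta) + action / (mass * length**2))
    return np.array([theta + dt * omega_next, omega_next])

def fit_gp(X, Y, noise_var=1e-4):
    """GP regression: Cholesky factor of the noisy Gram matrix and weights."""
    K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
    chol = np.linalg.cholesky(K)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, Y))
    return chol, alpha

def predict_gp(X_train, chol, alpha, X_new):
    """Posterior mean (one column per output) and shared posterior variance."""
    K_s = rbf_kernel(X_train, X_new)
    mean = K_s.T @ alpha
    v = np.linalg.solve(chol, K_s)
    var = rbf_kernel(X_new, X_new).diagonal() - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)

def imagined_step(model, state, action, rng):
    """Sample a next state from the learned model instead of the environment."""
    X_train, chol, alpha = model
    x = np.concatenate([state, [action]])[None, :]
    mean, var = predict_gp(X_train, chol, alpha, x)
    return state + rng.normal(mean[0], np.sqrt(var[0]))

def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """PPO clipped surrogate: mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return np.mean(np.minimum(ratio * advantage, clipped * advantage))

# "Online" phase: collect a small batch of real transitions (assumed ranges).
rng = np.random.default_rng(0)
states = rng.uniform([-np.pi, -4.0], [np.pi, 4.0], size=(150, 2))
actions = rng.uniform(-2.0, 2.0, size=(150, 1))
next_states = np.array([pendulum_step(s, a[0]) for s, a in zip(states, actions)])

# Fit the GP on (state, action) -> state difference.
X = np.hstack([states, actions])
Y = next_states - states
chol, alpha = fit_gp(X, Y)
model = (X, chol, alpha)

# "Offline" phase: a policy update would draw rollouts from imagined_step
# and maximize ppo_clip_objective on them; only one imagined step is shown.
s_next = imagined_step(model, np.array([0.1, 0.0]), 0.5, rng)
```

In this sketch the predictive variance is what makes the model probabilistic: it is small near observed transitions and approaches the prior variance far from the data, so imagined rollouts become noisier where the model is less certain. A practical version would also fit the kernel hyperparameters to the data rather than fixing them.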

Keywords:
Computer science; Reinforcement learning; Benchmark (surveying); Probabilistic logic; Inference; Gaussian process; Machine learning; Artificial intelligence; Computation; Gaussian; Algorithm

Metrics

Cited By: 22
FWCI (Field Weighted Citation Impact): 1.23
Refs: 23
Citation Normalized Percentile: 0.84

Topics

Reinforcement Learning in Robotics
Physical Sciences →  Computer Science →  Artificial Intelligence
Gaussian Processes and Bayesian Inference
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Bandit Algorithms Research
Social Sciences →  Decision Sciences →  Management Science and Operations Research

Related Documents

JOURNAL ARTICLE

Model-based Ensemble Reinforcement Learning with Soft Proximal Policy Optimization

Dazi Li, Fuqiang Zhu

Journal: 2021 IEEE 10th Data Driven Control and Learning Systems Conference (DDCLS) Year: 2021 Pages: 1430-1435

JOURNAL ARTICLE

Policy Optimization in Reinforcement Learning: Proximal Policy Optimization

Saurugger, Bernd

Journal: Zenodo (CERN European Organization for Nuclear Research) Year: 2023

JOURNAL ARTICLE

Economic dispatch based on reinforcement learning using proximal policy optimization

Cong Zhang, Junjie Hou, Xiaoxi Lv, Pei Zhang

Journal: IET conference proceedings. Year: 2025 Vol: 2024 (33) Pages: 453-460