JOURNAL ARTICLE

Temporal-Logic-Based Reward Shaping for Continuing Reinforcement Learning Tasks

Yuqian Jiang, Suda Bharadwaj, Bo Wu, Rishi Shah, Ufuk Topcu, Peter Stone

Year: 2021 Journal: Texas Digital Library (University of Texas) Vol: 35 (9) Pages: 7995-8003 Publisher: The University of Texas at Austin

Abstract

In continuing tasks, average-reward reinforcement learning may be a more appropriate problem formulation than the more common discounted reward formulation. As usual, learning an optimal policy in this setting typically requires a large amount of training experiences. Reward shaping is a common approach for incorporating domain knowledge into reinforcement learning in order to speed up convergence to an optimal policy. However, to the best of our knowledge, the theoretical properties of reward shaping have thus far only been established in the discounted setting. This paper presents the first reward shaping framework for average-reward learning and proves that, under standard assumptions, the optimal policy under the original reward function can be recovered. In order to avoid the need for manual construction of the shaping function, we introduce a method for utilizing domain knowledge expressed as a temporal logic formula. The formula is automatically translated to a shaping function that provides additional reward throughout the learning process. We evaluate the proposed method on three continuing tasks. In all cases, shaping speeds up the average-reward learning rate without any reduction in the performance of the learned policy compared to relevant baselines.
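The claim that shaping need not change the optimal policy in the average-reward setting can be made concrete with a small illustration. The sketch below is not the paper's construction (there the shaping function is derived from a temporal logic formula); it assumes a hypothetical 4-state ring environment, made-up potentials `phi`, and an undiscounted potential difference as the shaping term. Because that term telescopes along any trajectory, the shaped and unshaped average rewards differ by at most (max phi - min phi) / T after T steps.

```python
import random


def shaped_reward(reward, phi, s, s_next):
    """Add an undiscounted potential difference phi[s_next] - phi[s] to the reward."""
    return reward + phi[s_next] - phi[s]


def average_rewards(num_steps, seed=0):
    """Random walk on a 4-state ring; returns (plain, shaped) average reward.

    The environment, the reward, and the potentials are all hypothetical
    and chosen only to illustrate the telescoping argument.
    """
    rng = random.Random(seed)
    num_states = 4
    phi = [0.0, 1.0, 2.0, 3.0]  # made-up potentials, one per state
    s = 0
    total_plain = 0.0
    total_shaped = 0.0
    for _ in range(num_steps):
        # Step left or right around the ring with equal probability.
        s_next = (s + rng.choice([-1, 1])) % num_states
        # Environment reward: 1 for arriving at state 0, otherwise 0.
        r = 1.0 if s_next == 0 else 0.0
        total_plain += r
        total_shaped += shaped_reward(r, phi, s, s_next)
        s = s_next
    return total_plain / num_steps, total_shaped / num_steps
```

Calling `average_rewards(10_000)` returns two averages whose gap is bounded by 3 / 10,000, since the shaping terms sum to `phi[s_T] - phi[s_0]`. The shaping changes the per-step learning signal while leaving the long-run average reward of any behavior essentially unchanged.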

Keywords:
Reinforcement learning; Computer science; Function (biology); Process (computing); Convergence (economics); Artificial intelligence; Temporal difference learning; Domain (mathematical analysis); Reward system; Order (exchange); Machine learning; Mathematics; Psychology

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.14
Refs: 20
Citation Normalized Percentile: 0.52

Topics

Reinforcement Learning in Robotics
Physical Sciences → Computer Science → Artificial Intelligence
Formal Methods in Verification
Physical Sciences → Computer Science → Computational Theory and Mathematics
Robot Manipulation and Learning
Physical Sciences → Engineering → Control and Systems Engineering

Related Documents

JOURNAL ARTICLE

Temporal-Logic-Based Reward Shaping for Continuing Reinforcement Learning Tasks

Yuqian Jiang, Suda Bharadwaj, Bo Wu, Rishi Shah, Ufuk Topcu, Peter Stone

Journal: Proceedings of the AAAI Conference on Artificial Intelligence Year: 2021 Vol: 35 (9) Pages: 7995-8003

JOURNAL ARTICLE

Funnel-Based Reward Shaping for Signal Temporal Logic Tasks in Reinforcement Learning

Naman Saxena, Sandeep Gorantla, Pushpak Jagtap

Journal: IEEE Robotics and Automation Letters Year: 2023 Vol: 9 (2) Pages: 1373-1379

JOURNAL ARTICLE

Reward Shaping Based Federated Reinforcement Learning

Yiqiu Hu, Yun Hua, Wenyan Liu, Jun Zhu

Journal: IEEE Access Year: 2021 Vol: 9 Pages: 67259-67267

JOURNAL ARTICLE

Reward Shaping for Model-Based Bayesian Reinforcement Learning

Hyeoneun Kim, Woosang Lim, Kanghoon Lee, Yung‐Kyun Noh, Kee-Eung Kim

Journal: Proceedings of the AAAI Conference on Artificial Intelligence Year: 2015 Vol: 29 (1)