JOURNAL ARTICLE

Structured Reward Shaping using Signal Temporal Logic specifications

Abstract

Deep reinforcement learning has become a popular technique to train autonomous agents to learn control policies that enable them to accomplish complex tasks in uncertain environments. A key component of an RL algorithm is the definition of a reward function that maps each state and an action that can be taken in that state to a real-valued reward. Typically, reward functions informally capture an implicit (albeit vague) specification of the desired behavior of the agent. In this paper, we propose the use of the logical formalism of Signal Temporal Logic (STL) as a formal specification for the desired behaviors of the agent. Furthermore, we propose algorithms to locally shape rewards in each state with the goal of satisfying the high-level STL specification. We demonstrate our technique on two case studies: a cart-pole balancing problem with a discrete action space, and controlling the actuation of a simulated quadrotor for point-to-point movement. The proposed framework is agnostic to any specific RL algorithm, as locally shaped rewards can easily be used in concert with any deep RL algorithm.
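The abstract's idea of shaping rewards from an STL specification can be sketched as follows. This is an illustrative example, not the authors' algorithm: it uses the specification G(|theta| < THETA_MAX) ("always keep the pole angle below a bound", a plausible spec for the cart-pole case study), whose quantitative robustness over a trajectory prefix is min over t of (THETA_MAX - |theta_t|). The names `THETA_MAX`, `stl_robustness`, and `shaped_reward` are all assumptions introduced here.

```python
# Minimal sketch of STL-robustness-based reward shaping (illustrative only).
# Specification: G(|theta| < THETA_MAX), i.e. the pole angle must always
# stay within THETA_MAX. Its robustness is positive iff the spec is
# satisfied on every sample, and its magnitude is the satisfaction margin.

THETA_MAX = 0.2  # radians; hypothetical angle bound for a cart-pole task


def stl_robustness(thetas):
    """Robustness of G(|theta| < THETA_MAX) over a sequence of angles."""
    return min(THETA_MAX - abs(th) for th in thetas)


def shaped_reward(prefix_thetas, base_reward=0.0):
    """Locally shaped reward: the environment's base reward plus the
    robustness of the trajectory prefix observed so far, so the agent is
    rewarded for keeping a margin to the specification boundary."""
    return base_reward + stl_robustness(prefix_thetas)


# A prefix that respects the bound earns a positive bonus; a violating
# prefix is penalized in proportion to the worst violation.
good_prefix = [0.01, 0.05, -0.03]   # robustness 0.15
bad_prefix = [0.01, 0.25, -0.03]    # robustness -0.05
```

Because the shaped term is just a scalar added to the per-step reward, it can be plugged into any deep RL algorithm's training loop, matching the framework-agnostic claim in the abstract.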

Keywords:
Computer science; Reinforcement learning; Temporal logic; Artificial intelligence; State space; Theoretical computer science; Algorithm; Mathematics

Metrics

Cited By: 50
FWCI (Field Weighted Citation Impact): 5.14
Refs: 43
Citation Normalized Percentile: 0.95 (in top 1%)

Topics

Formal Methods in Verification
Physical Sciences →  Computer Science →  Computational Theory and Mathematics
Logic, Reasoning, and Knowledge
Physical Sciences →  Computer Science →  Artificial Intelligence
Reinforcement Learning in Robotics
Physical Sciences →  Computer Science →  Artificial Intelligence