Jiandong Chen, Yuanyuan Zou, Shaoyuan Li
This paper studies reinforcement learning (RL) based control synthesis for continuous systems under temporal logic tasks. Because such systems are expected to satisfy diverse safety and liveness properties, traditional RL algorithms perform poorly: their rewards are designed heuristically and they provide no safety guarantees. A novel framework is proposed to synthesize a safe and optimal RL-based controller for temporal logic tasks. First, signal temporal logic (STL) is adopted to describe the tasks formally. Based on the robust semantics of STL formulas, a reference reward curve is designed that determines the reward from the current time instant and state. Then, a shield layer is designed using a robust control barrier function, which keeps the system within a safe set while the RL policy is trained. The proposed method yields a time-dependent policy for STL tasks in continuous state spaces. Several robot motion case studies demonstrate that the approach both ensures safety and guides exploration effectively during training.
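The two ingredients named in the abstract, STL robust semantics and a barrier-function shield, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the function names are hypothetical, the regions are discs, the dynamics are a single integrator (x_dot = u), and the shield uses a plain (non-robust) control barrier function with a closed-form correction for a single constraint. The reference reward curve is not sketched, since the abstract does not specify its form.

```python
import numpy as np

def rho_inside(x, center, radius):
    """Robustness of the predicate 'x lies inside the disc' (positive iff inside)."""
    return radius - np.linalg.norm(x - center)

def rho_eventually_inside(traj, times, a, b, center, radius):
    """Robustness of F_[a,b](inside disc): max of the predicate robustness over [a, b]."""
    mask = (times >= a) & (times <= b)
    return max(rho_inside(x, center, radius) for x in traj[mask])

def rho_always_outside(traj, times, a, b, center, radius):
    """Robustness of G_[a,b](outside disc): min of the negated predicate robustness over [a, b]."""
    mask = (times >= a) & (times <= b)
    return min(-rho_inside(x, center, radius) for x in traj[mask])

def cbf_shield(x, u_rl, center, radius, alpha=1.0):
    """Minimally modify u_rl so that grad_h(x) . u + alpha * h(x) >= 0.

    Assumes x_dot = u and h(x) = ||x - center||^2 - radius^2 (safe set: outside
    the disc). Assumes x != center, so that the gradient is nonzero.
    """
    diff = x - center
    h = diff @ diff - radius**2
    grad = 2.0 * diff
    slack = grad @ u_rl + alpha * h
    if slack >= 0.0:
        return u_rl  # the RL action already satisfies the barrier condition
    # Project onto the half-space {u : grad . u + alpha * h >= 0}.
    return u_rl - slack * grad / (grad @ grad)

# Straight-line trajectory from (0, 0) to (4, 0), sampled at t = 0, 1, ..., 4.
times = np.arange(5.0)
traj = np.stack([times, np.zeros(5)], axis=1)

goal_rho = rho_eventually_inside(traj, times, 0.0, 4.0, np.array([4.0, 0.0]), 0.5)
avoid_rho = rho_always_outside(traj, times, 0.0, 4.0, np.array([2.0, 2.0]), 1.0)

# Shield an action that drives straight at an obstacle centred at (2, 0).
u_safe = cbf_shield(np.array([0.0, 0.0]), np.array([1.0, 0.0]),
                    np.array([2.0, 0.0]), 1.0)
```

A positive robustness value means the sampled trajectory satisfies the corresponding formula with that margin. The shield leaves the RL action untouched whenever the barrier condition already holds and otherwise applies the smallest correction that restores it.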
Yousef Emam, Gennaro Notomista, Paul Glotfelter, Zsolt Kira, Magnus Egerstedt
Lars Lindemann, Dimos V. Dimarogonas
Maria Charitidou, Dimos V. Dimarogonas
Siqi Wang, Shaoyuan Li, Li Yin, Xiang Yin