The uncertainty and complexity of autonomous driving make Deep Reinforcement Learning (DRL) appealing: a DRL agent can optimize the expected reward by interacting with its environment. However, DRL provides no safety guarantee, which is essential for autonomous driving, and reward shaping for DRL in this domain is also challenging. In this work, we introduce a policy-guided trajectory planner and propose a hierarchical structure to address these problems. The high-level DRL agent outputs policies that serve as suggestions for the low-level policy-guided trajectory planner. To obtain a safety guarantee, we first translate traffic rules and policies into formal specifications. Note that, in many cases, a policy cannot be fully applied. Given these formal specifications, we construct a minimum-violation motion planning problem for the low-level policy-guided trajectory planner. Through this hierarchical structure, long-term uncertainty is handled by the high-level DRL agent, while safety is guaranteed by the low-level planner. Furthermore, the reward penalizes not only violations of traffic rules but also violations of policies. By feeding the policy violations returned by the low-level planner back into the reward, the agent explicitly learns whether a policy is supported by the environment and the vehicle dynamics. Numerical experiments demonstrate the effectiveness of our method.
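To make the feedback loop concrete, below is a minimal sketch of the hierarchy in a toy lane-selection setting. Everything here is an illustrative assumption rather than the paper's implementation: tabular Q-learning stands in for the high-level DRL agent, a one-step lane choice with lexicographic costs stands in for minimum-violation trajectory planning, and the names `POLICIES`, `min_violation_plan`, `rule_violation`, and `policy_violation` are hypothetical.

```python
import random

POLICIES = ["keep_lane", "change_left", "change_right"]  # assumed high-level policy set
LANES = [0, 1, 2]                                        # toy three-lane road

def rule_violation(lane):
    # Toy traffic rule: the vehicle must stay on the road.
    return 0.0 if lane in LANES else 1.0

def policy_violation(current, target, policy):
    # Degree to which the executed maneuver falls short of the suggested policy.
    desired = {"keep_lane": current,
               "change_left": current - 1,
               "change_right": current + 1}[policy]
    return float(abs(target - desired))

def min_violation_plan(current, policy):
    # Stand-in for minimum-violation planning: among reachable targets,
    # minimize rule violations first, then policy violation.
    candidates = [current - 1, current, current + 1]
    best = min(candidates, key=lambda t: (rule_violation(t),
                                          policy_violation(current, t, policy)))
    return best, rule_violation(best), policy_violation(current, best, policy)

# Tabular Q-learning as a stand-in for the high-level DRL agent.
Q = {(lane, p): 0.0 for lane in LANES for p in POLICIES}
alpha, gamma, eps, goal_lane = 0.1, 0.95, 0.2, 2

for episode in range(500):
    lane = random.choice(LANES)
    for step in range(10):
        # High-level agent suggests a policy (epsilon-greedy over Q).
        policy = (random.choice(POLICIES) if random.random() < eps
                  else max(POLICIES, key=lambda p: Q[(lane, p)]))
        # Low-level planner follows the suggestion as far as the rules allow
        # and reports how much of the policy it had to give up.
        next_lane, rule_viol, pol_viol = min_violation_plan(lane, policy)
        # Reward penalizes rule violations AND unrealized policy suggestions,
        # so the agent learns which policies the environment supports.
        reward = (1.0 if next_lane == goal_lane else 0.0) - rule_viol - pol_viol
        best_next = max(Q[(next_lane, p)] for p in POLICIES)
        Q[(lane, policy)] += alpha * (reward + gamma * best_next - Q[(lane, policy)])
        lane = next_lane
```

In the paper's setting the policy set, the formal specifications, and the trajectory-level planner are far richer, but the feedback path is the same: the planner's reported policy violation enters the DRL reward, teaching the agent which suggestions are realizable.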