Ensuring that robots move safely and adhere to social norms in dynamic human environments is a crucial step towards autonomous robot decision-making. Existing work generally uses two separate modules connected in series to capture spatial and temporal interactions, respectively. However, such methods make it harder to exploit spatio-temporal features fully and to reduce the conservatism of the navigation policy. In light of this, this paper proposes a spatio-temporal Transformer-based policy optimization algorithm that preserves human-robot interactions more effectively. Specifically, a gated embedding mechanism is introduced to fuse the spatial and temporal representations by integrating both modalities at the feature level. A Transformer is then used to encode the spatio-temporal semantic information in order to find the optimal navigation policy. Finally, the combination of the spatio-temporal Transformer and self-adjusting policy entropy significantly reduces the conservatism of the navigation policy. Experimental results demonstrate the superiority of the proposed algorithm over state-of-the-art methods.
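To make the idea of feature-level gated fusion concrete, the following is a minimal sketch of one common form of gating, not the exact mechanism of the paper, whose equations are not given in the abstract. It assumes the spatial and temporal features are equal-length vectors and that a sigmoid gate computed from their concatenation mixes them element-wise. The names `gated_fuse`, `weights`, and `bias` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(spatial, temporal, weights, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    Assumed (illustrative) form:
        gate  = sigmoid(W @ [spatial; temporal] + b)
        fused = gate * spatial + (1 - gate) * temporal

    weights: one row per output feature, each row of length 2 * len(spatial)
    bias:    one value per output feature
    """
    if len(spatial) != len(temporal):
        raise ValueError("spatial and temporal features must have equal length")
    concat = list(spatial) + list(temporal)
    fused = []
    for i, (s, t) in enumerate(zip(spatial, temporal)):
        # Pre-activation of the gate for feature i.
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(z)
        # Convex combination: g near 1 keeps the spatial feature,
        # g near 0 keeps the temporal feature.
        fused.append(g * s + (1.0 - g) * t)
    return fused


# Hypothetical toy inputs: zero weights and bias give a gate of 0.5,
# so the fused vector is the plain average of the two inputs.
spatial = [1.0, 2.0]
temporal = [3.0, 0.0]
zero_weights = [[0.0] * 4 for _ in range(2)]
averaged = gated_fuse(spatial, temporal, zero_weights, [0.0, 0.0])
```

In a learned model the gate parameters would be trained jointly with the policy, so the network can decide per feature how much spatial versus temporal information to pass on to the Transformer encoder.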
Shuhui Yang, Bin Li, Yiming Xu, Zixin Hao, Mingliang Zhang
Changan Chen, Yuejiang Liu, S. Kreiss, Alexandre Alahi
Sunil Srivatsav Samsani, Husna Mutahira, Mannan Saeed Muhammad
Jiamin Shi, Zhuo Qiu, Tangyike Zhang, Shitao Chen, Jingmin Xin, Nanning Zheng