Predicting multi-agent trajectories in complex traffic scenes is essential for autonomous driving. Accurate prediction is challenging because of the unobservable intentions of agents, the constraints imposed by the environmental context, and the potential interactions among multiple agents. In this paper, we propose a novel collaborative trajectory prediction model to address these challenges. Unlike existing approaches that understand scene context through a single global map, we propose a hierarchical map encoding method that uses a vision transformer to learn both global and local map information, providing guidance for trajectory generation. In addition, we incorporate an attention mechanism to capture the spatio-temporal dynamics of agents and a graph convolutional network to model the collaborative interactions among agents in the scene. The proposed approach has been evaluated on the public Argoverse dataset. Experimental results demonstrate that our model achieves higher prediction accuracy than models based on a single rasterized or vectorized global map.
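One building block named above, the graph convolutional network for modeling interactions among agents, can be illustrated with a minimal sketch. The paper does not give its exact formulation, so the code below assumes the standard normalized graph convolution (with self-loops and symmetric degree normalization); the function name `gcn_layer` and the toy adjacency are illustrative, not from the paper.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution step: each agent aggregates features
    from the agents it interacts with.
    H: (N, F) per-agent feature matrix
    A: (N, N) adjacency matrix (1 = interaction between agents)
    W: (F, F_out) learnable weight matrix
    """
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    H_next = d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W
    return np.maximum(H_next, 0.0)                 # ReLU

# Toy scene: three agents; agents 0 and 1 interact, agent 2 is isolated.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
W = np.eye(2)                                      # identity weights for clarity
out = gcn_layer(H, A, W)
```

With identity weights, the isolated agent's features pass through unchanged, while the two interacting agents' features are blended, which is the mechanism the model uses to make each agent's predicted trajectory aware of its neighbors.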
Yuanchen Zhu, Shuaiqi Fu, Yong Wang, Yanan Zhao, Huachun Tan
Shihan Tian, Nianwen Ning, Wei Li, Lin Chen, Yanyu Zhang, Yi Zhou
Weirong Liu, Y. N. Wang, Hongjiang He, Zhiguo Zhang, Qing Xiao, Fu Jiang