Ruiqi Ma, Chunwei Wang, Chi Chen, Yihan Zeng, Bijun Li, Qin Zou, Qingqiu Huang, Xinge Zhu, Hang Xu
Accurately detecting 3D bounding boxes from LiDAR point clouds is crucial for autonomous driving. Recent studies have demonstrated the superior performance of multi-frame 3D detectors, yet eliminating misalignment across frames and effectively aggregating spatiotemporal information remain challenging. In this paper, we present a novel flow-guided feature aggregation scheme for 3D object detection (FIFA3D) to align cross-frame information. FIFA3D first leverages optical flow with supervised signals to model the pixel-to-pixel correlations between sequential frames. Considering the sparse nature of bird’s-eye-view feature maps, an additional classification branch is adopted to provide explicit pixel-wise clues. Meanwhile, we utilize multi-scale feature maps and predict flow in a coarse-to-fine manner. Guided by the estimated flow, historical features can be well aligned to the current situation, and a cascade fusion strategy is introduced to benefit the subsequent detection. Extensive experiments show that FIFA3D surpasses the single-frame baseline by remarkable margins of +10.8% mAPH and +6.8% mAP on the Waymo and nuScenes validation sets, respectively, and performs well compared with state-of-the-art methods.
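As a rough illustration of the flow-guided alignment step mentioned in the abstract, the NumPy sketch below warps a historical bird's-eye-view feature map toward the current frame using a per-pixel flow field and bilinear sampling. The function name `warp_features`, the (C, H, W) layout, and the backward-flow convention are assumptions made for this example. It is not the FIFA3D implementation: the flow here is supplied by hand rather than predicted, and the classification branch, coarse-to-fine prediction, and cascade fusion are not modeled.

```python
import numpy as np


def warp_features(feat, flow):
    """Backward-warp a feature map with a per-pixel flow field.

    feat: array of shape (C, H, W), the historical feature map.
    flow: array of shape (2, H, W); flow[0] is the x offset and flow[1]
          the y offset, in pixels (assumed convention). Output pixel
          (y, x) samples feat at (y + flow[1], x + flow[0]).
    Samples that fall outside the map contribute zero.
    """
    _, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sx = xs + flow[0]                 # source x coordinate per output pixel
    sy = ys + flow[1]                 # source y coordinate per output pixel
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    wx = sx - x0                      # fractional parts act as bilinear weights
    wy = sy - y0

    out = np.zeros_like(feat, dtype=float)
    corners = (
        (0, 0, (1 - wy) * (1 - wx)),
        (0, 1, (1 - wy) * wx),
        (1, 0, wy * (1 - wx)),
        (1, 1, wy * wx),
    )
    for dy, dx, weight in corners:
        yi, xi = y0 + dy, x0 + dx
        valid = (yi >= 0) & (yi < h) & (xi >= 0) & (xi < w)
        yc = np.clip(yi, 0, h - 1)    # clip so indexing is safe; masked below
        xc = np.clip(xi, 0, w - 1)
        out += feat[:, yc, xc] * (weight * valid)
    return out


# Toy example: one active cell at (row 1, col 1) in the previous frame.
prev_feat = np.zeros((1, 4, 4))
prev_feat[0, 1, 1] = 1.0

# Hand-written flow: every current pixel looks one cell to its left.
flow = np.zeros((2, 4, 4))
flow[0] = -1.0

aligned = warp_features(prev_feat, flow)
print(aligned[0])
```

With this flow, the active cell ends up one column to the right in `aligned`, which is where the object would sit in the current frame if it had moved one cell in +x between frames. In a full detector the aligned map would then be fused with the current-frame features before the detection head.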
Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, Yichen Wei
Khurram Azeem Hashmi, Talha Uddin Sheikh, Didier Stricker, Muhammad Zeshan Afzal
Shishir Muralidhara, Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Didier Stricker, Muhammad Zeshan Afzal
Jun Liang, Haosheng Chen, Yan Yan, Yang Lu, Hanzi Wang
Yunzhi Zhuge, Gang Yang, Pingping Zhang, Huchuan Lu