Xuchong Zhang, C Min, Yijie Jia, Liming Chen, Jingmin Zhang, Hongbin Sun
Integrating semantic information can effectively enhance the performance of 3D object detection on lidar point clouds. Most previous studies use camera-lidar fusion to improve detection accuracy for distant or small objects. However, this approach is typically unsuitable for real-time applications because of the large amount of input data. Recently, a lidar-only multi-task framework has emerged as an alternative: it uses a single shared feature extraction backbone with different heads to output detection and semantic segmentation results for lidar point clouds simultaneously. Nonetheless, some previous works have failed to achieve an optimal balance between accuracy and speed. To address this issue, we propose a multi-task framework that leverages the Cartesian pillar and a multi-scale semantic segmentation head to overcome the shortcomings of existing works and improve detection accuracy. We evaluate the proposed method with typical pillar-based and voxel-based detection models on the nuScenes dataset. The experimental results demonstrate that the proposed design achieves better performance than single-task models, especially on small objects. Moreover, the proposed network increases mAP and NDS by 3.1% and 2.5%, respectively, on the nuScenes test set compared with a representative multi-task network.
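As a rough illustration of what a Cartesian pillar representation involves, the sketch below groups lidar points into vertical columns on a regular x-y grid. It is not the authors' implementation: the function name `pillarize`, the grid bounds, and the pillar size are all hypothetical, and the abstract does not state the values the paper actually uses.

```python
import math
from collections import defaultdict


def pillarize(points, x_min, y_min, x_max, y_max, pillar_size):
    """Group (x, y, z) points into pillars on a Cartesian x-y grid.

    Hypothetical helper for illustration only; the bounds and pillar_size
    are assumptions, not values taken from the paper.
    """
    pillars = defaultdict(list)
    for x, y, z in points:
        # Discard points that fall outside the chosen detection range.
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue
        # Integer grid indices of the pillar containing this point.
        ix = math.floor((x - x_min) / pillar_size)
        iy = math.floor((y - y_min) / pillar_size)
        # z is kept as a feature: a pillar spans the full height.
        pillars[(ix, iy)].append((x, y, z))
    return dict(pillars)
```

In a multi-task setup of the kind the abstract describes, per-pillar features built from such groups would feed one shared backbone, with separate detection and segmentation heads reading its output.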
Xu Zhang, Fang Tian, Jiaxing Sun, Yan Liu
Jiachen Wang, Jiaqi Fan, Jiahua Xue, Xu Bai, Junbiao Diao
Ozan Unal, Luc Van Gool, Dengxin Dai
Yiming Zhao, Lin Bai, Xinming Huang
Hemlata Arya, Parul Saxena, Jaimala Jha