Baowen Zhang, Chengzhi Su, Guohua Cao
In projection-driven multi-modal 3D object detection, the data projection step has very high computational complexity, which limits the efficiency of the detection network. In addition, traditional projection interpolation methods have certain limitations. To improve voxel projection efficiency and to explore a projection interpolation method that enhances detection accuracy, we propose an optimized positioning strategy for voxel projection and an independent projection interpolation method, neighborhood-enhanced feature interpolation. We also propose S-FusionNet, a new 3D object detection network based on multi-modal semi-fusion. The optimized positioning strategy raises the inference speed from 6.7 FPS to 10.78 FPS. On top of the optimized positioning strategy, neighborhood-enhanced feature interpolation adds 6.1 ms of network time and improves the detection accuracy for “Pedestrian” at the “moderate” and “hard” difficulty levels by 2.18% and 2.25%, respectively, and for “Car” and “Cyclist” at the “moderate” level by 1.36% and 1.3%, respectively. Robustness experiments further verify the stability and generalization ability of the proposed semi-fusion network S-FusionNet.
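The abstract does not spell out the projection pipeline itself. As background only, here is a minimal NumPy sketch of the conventional step that voxel projection and projection interpolation refer to: projecting voxel centers into the image with a 3x4 projection matrix, then bilinearly sampling the image feature map at the projected (sub-pixel) locations. This is not the paper's optimized positioning strategy or its neighborhood-enhanced feature interpolation, and whether plain bilinear sampling matches the "traditional" methods the authors compare against is an assumption. The function names are hypothetical, and `proj` is assumed to already map voxel coordinates straight to pixels (for example, intrinsics times extrinsics combined into one matrix).

```python
import numpy as np


def project_points(points, proj, eps=1e-6):
    """Project (N, 3) points to (N, 2) pixel coords with a 3x4 matrix.

    Assumption (hypothetical setup): `proj` maps the points' frame
    directly to the image plane. Returns pixel coords and depths.
    """
    homo = np.hstack([points, np.ones((points.shape[0], 1))])  # (N, 4)
    uvw = homo @ proj.T                                          # (N, 3)
    depth = uvw[:, 2]
    # Avoid dividing by ~0; such points are masked out by the caller.
    safe = np.where(depth > eps, depth, 1.0)
    return uvw[:, :2] / safe[:, None], depth


def bilinear_sample(feat, uv):
    """Bilinearly sample an (H, W, C) feature map at (N, 2) coords (u, v).

    Coordinates outside the map receive zero features.
    """
    feat = np.asarray(feat, dtype=float)
    H, W, C = feat.shape
    u, v = uv[:, 0], uv[:, 1]
    valid = (u >= 0) & (u <= W - 1) & (v >= 0) & (v <= H - 1)
    out = np.zeros((uv.shape[0], C))
    u, v = u[valid], v[valid]
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    u1, v1 = np.minimum(u0 + 1, W - 1), np.minimum(v0 + 1, H - 1)
    du, dv = (u - u0)[:, None], (v - v0)[:, None]
    # Interpolate along u on the two neighboring rows, then along v.
    top = feat[v0, u0] * (1 - du) + feat[v0, u1] * du
    bottom = feat[v1, u0] * (1 - du) + feat[v1, u1] * du
    out[valid] = top * (1 - dv) + bottom * dv
    return out


def sample_voxel_features(centers, proj, feat, eps=1e-6):
    """Hypothetical helper: image features for each voxel center.

    Voxels behind the camera (depth <= eps) get zero features.
    """
    uv, depth = project_points(centers, proj, eps)
    out = bilinear_sample(feat, uv)
    out[depth <= eps] = 0.0
    return out
```

With an identity-like projection and a feature map whose value at pixel (u, v) is u + 10v, a voxel projecting to (1.5, 0.5) samples 6.5, while voxels behind the camera or outside the image get zeros. Per-voxel projection and sampling like this is the kind of work whose cost the abstract's positioning strategy targets.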
Liangyu Zuo, Yaochen Li, Mengtao Han, Qiao Li, Yuehu Liu
Yu Sun, Guangqi Liu, Xiaohui Han, Wenbo Zuo, Weihua Liu
Nan Hu, Huimin Ma, Chao Le, Xuehui Shao
Kun Guo, T. Eng Gan, Zhao Ding, Qiang Ling
Yiwen Jin, Rong Zhang, Yisu Hu, Hongliang Luo, Yongqiang Bai