Yingyan Li, Lue Fan, Yang Liu, Zehao Huang, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang
Currently prevalent multi-modal 3D detection methods rely on dense detectors that usually use dense Bird's-Eye-View (BEV) feature maps. However, the cost of such BEV feature maps is quadratic in the detection range, making them unscalable for long-range detection. Recently, LiDAR-only fully sparse architectures have been gaining attention for their high efficiency in long-range perception. In this paper, we study how to develop a multi-modal fully sparse detector. Specifically, our proposed detector integrates well-studied 2D instance segmentation into the camera branch, in parallel with the 3D instance segmentation part of the LiDAR-only baseline. The proposed instance-based fusion framework maintains full sparsity while overcoming the constraints associated with the LiDAR-only fully sparse detector. Our framework achieves state-of-the-art performance on the widely used nuScenes dataset, the Waymo Open Dataset, and the long-range Argoverse 2 dataset. Notably, under the long-range perception setting, the inference speed of our proposed method is 2.7× faster than that of other state-of-the-art multi-modal 3D detection methods.
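The quadratic-cost claim can be sketched with a quick calculation: a BEV feature map is a 2D grid covering the detection area, so its cell count grows with the square of the range. The grid resolution below (0.2 m/cell) is an illustrative assumption, not a value from the paper.

```python
# Sketch: BEV feature-map size grows quadratically with detection range.
# The 0.2 m cell size is an assumed, illustrative resolution.

def bev_cells(detection_range_m: float, cell_size_m: float = 0.2) -> int:
    """Number of cells in a square BEV grid covering [-r, r] x [-r, r]."""
    side = int(2 * detection_range_m / cell_size_m)
    return side * side

short_range = bev_cells(50)   # typical urban-driving range
long_range = bev_cells(200)   # long-range setting (Argoverse 2 scale)
print(short_range, long_range, long_range / short_range)
# 4x the range -> 16x the cells (and memory/compute for dense BEV heads),
# whereas a fully sparse detector's cost tracks the number of LiDAR points.
```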