Xiaona Song, Haonan Zhang, Haichao Liu, Xinxin Wang, Lijun Wang
In recent years, multi-modal 3D object detection algorithms have developed significantly. However, current algorithms focus primarily on designing overall fusion strategies for multi-modal features and neglect finer-grained representations, which reduces detection accuracy for small objects. To address this issue, this paper proposes the Instance-aware Fine-grained feature Enhancement Cross Modal Transformer (IFE-CMT) model. We design an Instance feature Enhancement Module (IE-Module) that accurately extracts object features from multi-modal data and uses them to enhance the overall features, while avoiding view transformations and keeping computational overhead low. Additionally, we design a new point cloud branch network that effectively expands the network’s receptive field, enhancing the model’s semantic expression capabilities while preserving the texture details of objects. Experimental results on the nuScenes dataset demonstrate that, compared to the CMT model, the proposed IFE-CMT model improves mAP and NDS by 2.1% and 0.8%, respectively, on the validation set, and by 1.9% and 0.7%, respectively, on the test set. Notably, for small object categories such as bicycles and motorcycles, mAP improves by 6.6% and 3.7%, respectively, a significant gain in small-object detection accuracy.