ObjectFusion: Multi-modal 3D Object Detection with Object-Centric Fusion

Qi Cai; Yingwei Pan; Ting Yao; Chong‐Wah Ngo; Tao Mei

doi:10.1109/iccv51070.2023.01656

JOURNAL ARTICLE

Year: 2023 Pages: 18021-18030

Abstract

Recent progress in multi-modal 3D object detection has featured BEV (Bird's-Eye-View) based fusion, which unifies LiDAR point clouds and camera images in a shared BEV space. Nevertheless, the camera-to-BEV transformation is non-trivial because per-pixel depth estimation is inherently ambiguous, which results in spatial misalignment between the features of the two modalities. Moreover, the transformation inevitably introduces projection distortion of camera image features in BEV space. In this paper, we propose a novel Object-centric Fusion (ObjectFusion) paradigm, which dispenses with the camera-to-BEV transformation entirely during fusion and instead aligns object-centric features across modalities for 3D object detection. ObjectFusion first learns three kinds of modality-specific feature maps (i.e., voxel, BEV, and image features) from LiDAR point clouds, their BEV projections, and camera images, respectively. A set of 3D object proposals is then produced from the BEV features via a heatmap-based proposal generator. Next, the 3D object proposals are reprojected back into the voxel, BEV, and image spaces, and voxel and RoI pooling are used to generate spatially aligned object-centric features for each modality. The object-centric features of the three modalities are then fused at the object level and fed into the detection heads. Extensive experiments on the nuScenes dataset demonstrate the superiority of ObjectFusion, which achieves 69.8% mAP on the nuScenes validation set and improves on BEVFusion by 1.3%.
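The pipeline described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. It assumes a toy BEV heatmap stored as nested lists, axis-aligned boxes already expressed in the camera frame (yaw and the LiDAR-to-camera extrinsics are omitted), a simple pinhole camera, and plain concatenation as a stand-in for the learned object-level fusion. All function names (`heatmap_peaks`, `box_corners`, `project_to_image_roi`, `fuse_object_features`) are hypothetical.

```python
from itertools import product


def heatmap_peaks(heatmap, k=2, threshold=0.1):
    """Return up to k (score, row, col) strict local maxima of a 2D heatmap.

    Stand-in for a heatmap-based proposal generator: each peak marks a
    candidate object centre in BEV space.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Compare against the 3x3 neighbourhood, clipped at the borders.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(score > n for n in neighbours):
                peaks.append((score, r, c))
    return sorted(peaks, reverse=True)[:k]


def box_corners(center, size):
    """Eight corners of an axis-aligned 3D box (yaw ignored for brevity)."""
    cx, cy, cz = center
    dx, dy, dz = (s / 2.0 for s in size)
    return [
        (cx + sx * dx, cy + sy * dy, cz + sz * dz)
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]


def project_to_image_roi(corners, focal, principal_point):
    """Project camera-frame corners (z forward) with a pinhole model.

    Returns the enclosing 2D RoI (u_min, v_min, u_max, v_max), or None if
    every corner lies behind the camera. Image features would be pooled
    from this RoI, so no camera-to-BEV transformation is needed.
    """
    cu, cv = principal_point
    us, vs = [], []
    for x, y, z in corners:
        if z <= 0:  # corner is behind the image plane
            continue
        us.append(focal * x / z + cu)
        vs.append(focal * y / z + cv)
    if not us:
        return None
    return (min(us), min(vs), max(us), max(vs))


def fuse_object_features(voxel_feat, bev_feat, image_feat):
    """Object-level fusion stand-in: concatenate per-object feature vectors."""
    return list(voxel_feat) + list(bev_feat) + list(image_feat)
```

For example, calling `project_to_image_roi(box_corners((0.0, 0.0, 10.0), (2.0, 2.0, 2.0)), 1000.0, (800.0, 450.0))` gives the image-plane rectangle that encloses a 2 m cube centred 10 m in front of the camera. The key design point this sketch mirrors is that depth comes from the 3D proposal rather than from per-pixel depth estimation.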

Keywords:

Computer vision; Artificial intelligence; Computer science; Voxel; Object detection; Object (grammar); Image fusion; Pixel; Transformation (genetics); Feature (linguistics); Pattern recognition (psychology); Image (mathematics)

Metrics

FWCI (Field Weighted Citation Impact): 6.73

Citation Normalized Percentile: 0.97

Topics

Advanced Neural Network Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Robotics and Sensor-Based Localization

Physical Sciences → Engineering → Aerospace Engineering

Visual Attention and Saliency Detection

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Monocular Dynamic Object Detection with multi-modal fusion

FocalFusion: An object-centric temporal fusion framework for multi-modal 3D detection

Multi-modal Feature Fusion 3D Object Detection

MLF3D: Multi-Level Fusion for Multi-Modal 3D Object Detection

ObjectFusion: Accurate object-level SLAM with neural object priors