Yunfei Zhang, Feipeng Da, Shaoyan Gai
High-precision 3D object detection in autonomous driving requires effective LiDAR-camera fusion. However, the heterogeneous nature of these modalities makes it challenging to fully integrate geometric and semantic information. Existing methods adopt either sparse or dense fusion: sparse fusion retains geometric accuracy but lacks semantic richness, while dense fusion offers richer semantics but suffers from inefficiency and noise sensitivity. To address this, we propose multimodal sparse-dense fusion (MMSDF), a complementary framework that combines both fusion strategies. It comprises (1) a sparse fusion attention (SFA) module that projects non-empty LiDAR voxels onto the image plane to extract local semantic features; (2) a dense bird’s eye view (BEV) feature alignment (BFA) module that uses optical flow and frequency-domain convolutions to align LiDAR and image BEV features; and (3) an RoI point-voxel fusion attention (RPVFA) module that enhances RoI representations via cross-attention between point and multiscale voxel features. Experiments on KITTI show that MMSDF achieves 88.21% and 84.26% accuracy on the validation and test sets, respectively, with ablation studies confirming the effectiveness of each module.
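To make the sparse-fusion step concrete, below is a minimal PyTorch sketch of the mechanism the abstract attributes to SFA: projecting non-empty LiDAR voxel centers onto the image plane and sampling local semantic features there. It assumes a standard pinhole projection via a combined LiDAR-to-image matrix and bilinear sampling; all names (`sample_image_features`, `lidar2img`, etc.) are hypothetical and this is not the authors' implementation.

```python
# Hypothetical sketch of sparse LiDAR-to-image fusion (SFA-style):
# project non-empty voxel centers into the image and bilinearly sample
# per-voxel semantic features. Assumed, not the authors' released code.
import torch
import torch.nn.functional as F

def sample_image_features(voxel_centers, img_feats, lidar2img, img_hw):
    """voxel_centers: (N, 3) xyz of non-empty voxels in the LiDAR frame.
    img_feats: (1, C, H, W) image feature map.
    lidar2img: (4, 4) combined extrinsic + intrinsic projection matrix.
    img_hw: (img_h, img_w) of the image the matrix projects into.
    Returns (N, C) per-voxel semantic features (zeros if off-image)."""
    n = voxel_centers.shape[0]
    # Homogeneous coordinates, then project to the image plane.
    pts = torch.cat([voxel_centers, voxel_centers.new_ones(n, 1)], dim=1)
    cam = pts @ lidar2img.T                      # (N, 4)
    depth = cam[:, 2].clamp(min=1e-5)
    uv = cam[:, :2] / depth.unsqueeze(1)         # pixel coordinates (u, v)
    # Normalize (u, v) to [-1, 1] for grid_sample.
    img_h, img_w = img_hw
    grid = torch.stack([uv[:, 0] / (img_w - 1) * 2 - 1,
                        uv[:, 1] / (img_h - 1) * 2 - 1], dim=1)
    sampled = F.grid_sample(img_feats, grid.view(1, 1, n, 2),
                            align_corners=True)  # (1, C, 1, N)
    feats = sampled.squeeze(0).squeeze(1).T      # (N, C)
    # Zero out voxels behind the camera or outside the image bounds.
    valid = (cam[:, 2] > 0) & (grid.abs() <= 1).all(dim=1)
    return feats * valid.unsqueeze(1)
```

Gathering image features only at non-empty voxels in this way keeps the fusion sparse, which reflects the efficiency and geometric fidelity the abstract ascribes to sparse fusion, while the dense BFA branch handles full-BEV semantic alignment.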