Height-Adaptive Deformable Multi-Modal Fusion for 3D Object Detection

Jiahao Li; Lingshan Chen; Zhen Li

doi:10.1109/access.2025.3553372

JOURNAL ARTICLE

Year: 2025 Journal: IEEE Access Vol: 13 Pages: 52385-52396 Publisher: Institute of Electrical and Electronics Engineers

Abstract

LiDAR-camera fusion has demonstrated remarkable potential in 3D object detection for autonomous vehicles by leveraging complementary information from both modalities. Recent state-of-the-art approaches primarily rely on projection matrices to achieve cross-modal data alignment. However, these methods often perform poorly under sensor misalignment or calibration errors, resulting in suboptimal fusion quality and limited robustness. In this paper, we propose a novel framework for 3D object detection, called Height-Adaptive Deformable Multi-Modal Fusion, which leverages Deformable Attention to enhance the fusion process. Specifically, we introduce a Deformable-based Cross-Modal Spatial Attention that dynamically fuses image features through learnable offsets, allowing for more flexible and precise alignment between the LiDAR and camera modalities. To further improve the fusion quality, we design a Height-Adaptive Aggregation strategy that mitigates the risk of incorrect fusion from background points while emphasizing the aggregation of foreground object features. In addition, we introduce projection noise to simulate misalignment scenarios. To handle such misalignment, an extra supervision loss is added. Extensive experiments on the nuScenes benchmark demonstrate the effectiveness and robustness of our proposed framework. Specifically, our method significantly outperforms the LiDAR-only method and exhibits less precision degradation under sensor misalignment than other fusion-based approaches. Our results validate the potential of the proposed framework for improving 3D object detection accuracy, particularly in real-world, imperfect sensor environments.
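The idea of sampling image features through offsets around a projected point can be illustrated with a minimal sketch. It is not the authors' implementation: the pinhole model, the translation perturbation `t` used as a stand-in for projection noise, the hand-set offsets (learned in the actual method), and all function names are assumptions chosen for illustration only.

```python
import math

def project(point, fx, fy, cx, cy, t=(0.0, 0.0, 0.0)):
    """Pinhole projection of a camera-frame 3D point.
    t is a hypothetical translation error standing in for projection noise."""
    x, y, z = (p + d for p, d in zip(point, t))
    return fx * x / z + cx, fy * y / z + cy

def bilinear(feat, u, v):
    """Bilinearly sample a 2D feature map (list of rows) at pixel (u, v).
    Locations outside the map contribute zero."""
    h, w = len(feat), len(feat[0])
    u0, v0 = math.floor(u), math.floor(v)
    du, dv = u - u0, v - v0

    def at(r, c):
        return feat[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    return ((1 - du) * (1 - dv) * at(v0, u0)
            + du * (1 - dv) * at(v0, u0 + 1)
            + (1 - du) * dv * at(v0 + 1, u0)
            + du * dv * at(v0 + 1, u0 + 1))

def deformable_sample(feat, ref, offsets, logits):
    """Weighted sum of samples taken at ref + offset.
    Weights are softmax(logits); in a trained model both offsets and
    logits would be predicted by the network, here they are hand-set."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return sum(w * bilinear(feat, ref[0] + du, ref[1] + dv)
               for w, (du, dv) in zip(weights, offsets))

# Toy 8x8 feature map with a single active pixel at (u=5, v=4).
feat = [[0.0] * 8 for _ in range(8)]
feat[4][5] = 1.0

point = (0.5, 0.0, 10.0)
clean = project(point, fx=20.0, fy=20.0, cx=4.0, cy=4.0)
noisy = project(point, fx=20.0, fy=20.0, cx=4.0, cy=4.0, t=(1.0, 0.0, 0.0))

# Sampling only at the noisy projection misses the feature entirely.
fixed = bilinear(feat, *noisy)

# Sampling at the noisy projection plus offsets can still reach it.
fused = deformable_sample(feat, noisy,
                          offsets=[(-2.0, 0.0), (0.0, 0.0)],
                          logits=[0.0, 0.0])
```

In this toy case the perturbed projection lands two pixels away from the active feature, so a fixed projection-based lookup returns nothing, while the offset-based lookup recovers part of the response. How the paper predicts offsets, weights heights, and supervises the noisy case is not specified on this page.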

Keywords:

Modal; Computer science; Fusion; Computer vision; Object detection; Sensor fusion; Artificial intelligence; Segmentation

Metrics

FWCI (Field Weighted Citation Impact): 3.61

Citation Normalized Percentile: 0.82

Topics

Industrial Vision Systems and Defect Detection

Physical Sciences → Engineering → Industrial and Manufacturing Engineering

3D Surveying and Cultural Heritage

Physical Sciences → Earth and Planetary Sciences → Geology

Robotics and Sensor-Based Localization

Physical Sciences → Engineering → Aerospace Engineering

Related Documents

Deformable Feature Fusion Network for Multi-Modal 3D Object Detection

Dual-domain deformable feature fusion for multi-modal 3D object detection

Object detection based on multi-modal adaptive fusion using YOLOv3

Learning Adaptive Fusion Bank for Multi-Modal Salient Object Detection

Multi-modal Feature Fusion 3D Object Detection