JOURNAL ARTICLE

BEVTransFusion: LiDAR-Camera Fusion Under Bird’s-Eye-View for 3D Object Detection with Transformers

Abstract

Recently, there has been growing research interest in extracting Bird's-Eye-View (BEV) features from images and LiDAR to improve 3D object detection. However, existing methods mainly combine the features mechanically, which limits how well the BEV features are used. To address this limitation, we draw inspiration from TransFusion and design a two-layer transformer decoder to fuse LiDAR and camera BEV features. This lets us omit the reference point backprojection and feature sampling steps, which yields better correlation between the fused LiDAR and image features and greater robustness to inaccuracies in the calibration matrix. Furthermore, we add 3D position encoding to the BEV features to compensate for the lack of height information, and we propose a length-width-height modulated attention mechanism to incorporate scale information. Finally, we perform comprehensive experiments to verify the effectiveness of our methods.
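
The fusion idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes that LiDAR BEV features act as queries and camera BEV features as keys and values, that the 3D position encoding is a plain sinusoidal encoding of per-cell (x, y, z) coordinates, and that two residual cross-attention layers stand in for the two-layer decoder. Learned projections, multi-head attention, feed-forward layers, and the length-width-height modulation are omitted, and all function names (`sincos_encoding`, `cross_attention`, `fuse_bev`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sincos_encoding(coords, dim):
    """Sinusoidal encoding of (N, 3) coordinates into (N, dim).

    Assumption: dim is divisible by 6 (sin and cos for each of x, y, z).
    """
    n_freq = dim // 6
    freqs = 1.0 / (10000.0 ** (np.arange(n_freq) / n_freq))
    angles = coords[:, :, None] * freqs[None, None, :]            # (N, 3, n_freq)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(coords.shape[0], -1)                       # (N, dim)

def cross_attention(queries, keys, values):
    # Single-head scaled dot-product attention without learned projections.
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d))
    return weights @ values, weights

def fuse_bev(lidar_bev, cam_bev, coords, num_layers=2):
    """Fuse flattened BEV maps of shape (H*W, C) with residual cross-attention."""
    pos = sincos_encoding(coords, lidar_bev.shape[-1])
    fused = lidar_bev
    for _ in range(num_layers):
        # Position encoding is added to queries and keys, not to values.
        attended, _ = cross_attention(fused + pos, cam_bev + pos, cam_bev)
        fused = fused + attended
    return fused

# Toy usage: a 4x4 BEV grid with 12 channels and random features.
rng = np.random.default_rng(0)
H, W, C = 4, 4, 12
xs, ys = np.meshgrid(np.arange(W), np.arange(H))
# Height is a placeholder here (all zeros); a real model would supply it.
coords = np.stack([xs.ravel(), ys.ravel(), np.zeros(H * W)], axis=1).astype(float)
lidar_bev = rng.standard_normal((H * W, C))
cam_bev = rng.standard_normal((H * W, C))
fused = fuse_bev(lidar_bev, cam_bev, coords)
print(fused.shape)
```

Because both modalities already live on the same BEV grid, the attention operates directly on grid cells and needs no projection of reference points into the image plane, which is the step the abstract says can be omitted.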

Keywords:
Lidar, Computer vision, Artificial intelligence, Computer science, Transformer, Object detection, Fusion, Remote sensing, Engineering, Geography, Pattern recognition (psychology), Electrical engineering, Voltage

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 48
Citation Normalized Percentile: 0.18

Topics

Advanced Neural Network Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Industrial Vision Systems and Defect Detection
Physical Sciences → Engineering → Industrial and Manufacturing Engineering
Robotics and Sensor-Based Localization
Physical Sciences → Engineering → Aerospace Engineering