JOURNAL ARTICLE

CL-FusionBEV: A Cross-Attention Based Fusion Model for Camera and LiDAR in Bird’s Eye View Perception

Abstract

In autonomous navigation, the ability to detect 3D objects from a Bird’s-Eye View (BEV) perspective is essential. Nevertheless, many obstacles remain before LiDAR and camera data can be effectively combined. We propose CL-FusionBEV, a novel framework for sensor fusion that enhances Three-dimensional object recognition in the BEV domain. This method structures LiDAR point clouds for improved spatial feature extraction while converting camera data into BEV format via an implicit learning technique. An implicit fusion network and a multi-modal cross-attention mechanism facilitate seamless sensor interaction, ensuring comprehensive feature integration. Additionally, a self-attention mechanism of BEV enhances broad-scale reasoning and data extraction, improving the detection of occluded and distant objects. By efficiently synchronising data from several sensors, the suggested method improves feature uniformity and resolves spatial inconsistencies. It further leverages adaptive feature selection to enhance robustness against sensor noise and varying conditions. We evaluate CL-FusionBEV on the nuScenes dataset, achieving achieved a 73.3% mAP and a 75.5% NDS on the nuScenes benchmark, with vehicle and pedestrian detection accuracies of 89% and 90.7%, respectively. Our model demonstrates superior robustness in challenging conditions such as low visibility and dense urban environments. CL-FusionBEV maintains high efficiency with real-time inference, making it suitable for deployment in autonomous systems. Extensive experiments show our strategy routinely beats cutting-edge techniques, especially in detecting small and distant objects. By addressing key sensor fusion challenges in the BEV domain, CL-FusionBEV offers a notable advancement in Three-dimensional object recognition, ensuring high accuracy, efficiency, and reliability for real-world driving scenarios.

Keywords:
Perception Lidar Computer vision Artificial intelligence Fusion Computer science Geography Optometry Psychology Remote sensing Medicine Philosophy Neuroscience

Metrics

0
Cited By
0.00
FWCI (Field Weighted Citation Impact)
0
Refs
0.02
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Topics

Infrared Target Detection Methodologies
Physical Sciences →  Engineering →  Aerospace Engineering
Advanced Image Fusion Techniques
Physical Sciences →  Engineering →  Media Technology
Visual Attention and Saliency Detection
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
© 2026 ScienceGate Book Chapters — All rights reserved.