JOURNAL ARTICLE

Fusing Bird’s Eye View LIDAR Point Cloud and Front View Camera Image for 3D Object Detection

Abstract

We propose a new method for fusing a LIDAR point cloud and camera-captured images in deep convolutional neural networks (CNNs). The proposed method introduces a new layer, the sparse non-homogeneous pooling layer, to transform features between the bird's eye view and the front view. The sparse point cloud is used to construct the mapping between the two views. The pooling layer allows efficient fusion of the multi-view features at any stage of the network, which is favorable for 3D object detection using camera-LIDAR fusion in autonomous driving. A corresponding one-stage detector is designed and tested on the KITTI bird's eye view object detection dataset; it produces 3D bounding boxes from the bird's eye view map. The fusion method shows significant improvement in both the speed and accuracy of pedestrian detection over other fusion-based object detection networks.
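The view transformation the abstract describes can be pictured as a sparse gather/scatter: front-view features are gathered at the pixels where LIDAR points project, then scattered into the bird's eye view cells those same points occupy. The following is a minimal NumPy sketch of that idea, not the paper's implementation; all function names, parameters, ranges, and the mean-pooling choice are illustrative assumptions.

```python
import numpy as np

def sparse_pool_fv_to_bev(fv_feat, points, fv_proj, bev_grid_shape,
                          x_range=(0.0, 70.0), y_range=(-40.0, 40.0)):
    """Scatter front-view (camera) features into a bird's eye view grid,
    using LIDAR points as the sparse correspondence between the two views.

    fv_feat:        (H, W, C) front-view feature map
    points:         (N, 3) LIDAR points (x forward, y left, z up), meters
    fv_proj:        (N, 2) integer pixel coords (u, v) of each point in the image
    bev_grid_shape: (Hb, Wb) number of BEV cells along x and y
    Returns a (Hb, Wb, C) BEV feature map, mean-pooled per cell.
    """
    H, W, C = fv_feat.shape
    Hb, Wb = bev_grid_shape

    # Keep only points whose image projection lands inside the front view.
    u, v = fv_proj[:, 0], fv_proj[:, 1]
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H)

    # BEV cell index of each point (discretize x and y into the grid).
    gx = ((points[:, 0] - x_range[0]) / (x_range[1] - x_range[0]) * Hb).astype(int)
    gy = ((points[:, 1] - y_range[0]) / (y_range[1] - y_range[0]) * Wb).astype(int)
    valid &= (gx >= 0) & (gx < Hb) & (gy >= 0) & (gy < Wb)
    u, v, gx, gy = u[valid], v[valid], gx[valid], gy[valid]

    bev = np.zeros((Hb, Wb, C), dtype=fv_feat.dtype)
    count = np.zeros((Hb, Wb, 1), dtype=np.float32)
    # Gather features at the projected pixels, scatter-add into BEV cells,
    # then normalize by the per-cell point count (mean pooling).
    np.add.at(bev, (gx, gy), fv_feat[v, u])
    np.add.at(count, (gx, gy), 1.0)
    return bev / np.maximum(count, 1.0)
```

Because the correspondence is stored as sparse index pairs rather than a dense per-pixel warp, the same gather/scatter can be applied to feature maps at any depth of the network, which is what makes fusion possible at intermediate stages.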

Keywords:
Artificial intelligence, Computer vision, Computer science, LIDAR, Object detection, Point cloud, Pooling, Convolutional neural network, Minimum bounding box, Pattern recognition, Remote sensing

Metrics

Cited By: 80
FWCI (Field-Weighted Citation Impact): 6.06
References: 37
Citation Normalized Percentile: 0.96 (in top 1% and top 10%)

Topics

Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Robotics and Sensor-Based Localization
Physical Sciences →  Engineering →  Aerospace Engineering
Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
