JOURNAL ARTICLE

HCPVF: Hierarchical Cascaded Point-Voxel Fusion for 3D Object Detection

Baojie Fan, Kexin Zhang, Jiandong Tian

Year: 2023 | Journal: IEEE Transactions on Circuits and Systems for Video Technology | Vol: 34 (10) | Pages: 8997-9009 | Publisher: Institute of Electrical and Electronics Engineers

Abstract

With the rapid development of 3D sensors, point-cloud-based 3D object detection is attracting increasing attention from both industry and academia, and it is widely applied in fields such as robotics and autonomous driving. However, balancing 3D object detection accuracy against speed remains a challenging problem. In this paper, we study this issue and propose a novel and effective 3D point cloud object detection network based on hierarchical cascaded point-voxel fusion, called HCPVF. First, a novel bird's-eye-view (BEV) attention mechanism with linear complexity is developed to improve the point cloud feature backbone. It is easy to implement, using two cascaded linear layers and two normalization layers to mine point-to-point similarity in the BEV. This operation captures long-range dependencies and mitigates the uneven sampling of sparse BEV features, making the extracted point cloud features more discriminative. Second, the proposed HCPVF module is equipped with a dual-level hierarchical cascaded detection head, consisting of a voxel level followed by a point level. The voxel level combines coarse Region of Interest (RoI) pooling and fine RoI pooling, which cooperate to aggregate voxel features from different grid divisions and predict relatively coarse detection boxes. The subsequent point level is based on a Key Points Transformer. It first encodes the spatial context between the original points and the voxel-level boxes. A novel dual-weighted decoder then enhances the context interaction by weighting the channel and spatial dimensions, yielding more accurate detection results. This design combines the high computational efficiency of voxel-based methods with the more complete spatial information of point-based methods, fusing low-level voxel features and high-level point features through a hierarchical cascaded strategy.
Extensive experiments demonstrate that the proposed HCPVF achieves state-of-the-art 3D detection performance while maintaining computational efficiency on both the Waymo Open Dataset and the highly competitive KITTI benchmark.
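The abstract specifies the BEV attention only as "two cascaded linear layers and two normalization layers" with linear complexity. That description fits the general shape of external attention (Guo et al., 2021), so the sketch below assumes that formulation; the paper may differ in its details. A hypothetical learnable key memory of S slots scores every BEV cell, a softmax over cells is followed by an L1 normalization over slots, and a hypothetical value memory maps the weights back to C channels. The names `external_attention`, `key_mem`, `val_mem` and the slot count S are illustrative, and the pure-Python matrix helpers stand in for a tensor library.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product of an (n x k) matrix and a (k x m) matrix."""
    m = len(b[0])
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(m)]
            for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def external_attention(features: Matrix, mem_k: Matrix, mem_v: Matrix,
                       eps: float = 1e-9) -> Matrix:
    """Hypothetical linear-complexity attention over N BEV cells (assumed design).

    features: N x C, one row per occupied BEV cell
    mem_k:    S x C learnable key memory (first linear layer)
    mem_v:    S x C learnable value memory (second linear layer)
    Cost grows as O(N * S * C) rather than the O(N^2 * C) of self-attention.
    """
    # Linear layer 1: affinity of every cell to each of the S memory slots (N x S).
    logits = matmul(features, transpose(mem_k))
    n, s = len(logits), len(logits[0])

    # Normalization 1: softmax over the N cells, independently per memory slot.
    attn = [[0.0] * s for _ in range(n)]
    for j in range(s):
        col_max = max(logits[i][j] for i in range(n))
        exps = [math.exp(logits[i][j] - col_max) for i in range(n)]
        total = sum(exps)
        for i in range(n):
            attn[i][j] = exps[i] / total

    # Normalization 2: L1-normalize each cell's weights over the S slots.
    for i in range(n):
        row_sum = sum(attn[i]) + eps
        attn[i] = [w / row_sum for w in attn[i]]

    # Linear layer 2: map the slot weights back to C channels (N x C).
    return matmul(attn, mem_v)


# Usage with random inputs: 6 BEV cells, 4 channels, 3 memory slots.
random.seed(0)
N, C, S = 6, 4, 3
feats = [[random.gauss(0, 1) for _ in range(C)] for _ in range(N)]
key_mem = [[random.gauss(0, 1) for _ in range(C)] for _ in range(S)]
val_mem = [[random.gauss(0, 1) for _ in range(C)] for _ in range(S)]
out = external_attention(feats, key_mem, val_mem)
print(len(out), len(out[0]))  # 6 4
```

Because the memory size S is fixed, doubling the number of BEV cells N roughly doubles the cost instead of quadrupling it, which is the sense in which such an attention is linear in the number of cells.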

Keywords:
Computer science, Artificial intelligence, Voxel, Point cloud, Computer vision, Pattern recognition, Normalization, Discriminative model, Object detection, Weighting

Metrics

Cited by: 21
FWCI (Field-Weighted Citation Impact): 3.82
References: 54
Citation Normalized Percentile: 0.92

Topics

Advanced Neural Network Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
3D Surveying and Cultural Heritage
Physical Sciences → Earth and Planetary Sciences → Geology
Remote Sensing and LiDAR Applications
Physical Sciences → Environmental Science → Environmental Engineering

Related Documents

BOOK-CHAPTER

Point-Voxel Fusion with Adaptive Sectorized Points Sampling for 3D Object Detection

Yihui Liu, He Hongwen, Yingjuan Tang

Communications in Computer and Information Science | Year: 2026 | Pages: 147-159

JOURNAL ARTICLE

Dense Voxel Fusion for 3D Object Detection

Anas Mahmoud, Jordan S. K. Hu, Steven L. Waslander

Journal: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) | Year: 2023 | Pages: 663-672