ABSTRACT Accurate perception and understanding of the three-dimensional environment are crucial for autonomous vehicles to navigate efficiently and make sound decisions. However, in complex real-world scenarios, the information obtained by a single-modal sensor is often incomplete, severely degrading the detection accuracy of occluded targets. To address this issue, this paper proposes a novel adaptive multi-scale attention aggregation strategy that efficiently fuses multi-scale feature representations of heterogeneous data to accurately capture the shape details and spatial relationships of targets in three-dimensional space. The strategy utilises learnable sparse keypoints to dynamically align heterogeneous features in a data-driven manner, adaptively modelling the cross-modal mapping between keypoints and their corresponding multi-scale image features. Because accurate three-dimensional shape information is essential for understanding the size and rotation pose of occluded targets, this paper adopts a shape-prior-based constraint method and a data augmentation strategy to guide the model towards perceiving the complete three-dimensional shape and rotation pose of occluded targets more accurately. Experimental results show that the proposed model improves the 3D R40 mAP score by 2.15%, 3.24% and 2.75% at the easy, moderate and hard difficulty levels, respectively, compared with MVXNet, significantly enhancing the detection accuracy and robustness for occluded targets in complex scenarios.