JOURNAL ARTICLE

Multi-Modal Feature Pyramid Transformer for RGB-Infrared Object Detection

Yaohui Zhu, Xiaoyu Sun, Miao Wang, Hua Huang

Year: 2023   Journal: IEEE Transactions on Intelligent Transportation Systems   Vol: 24 (9)   Pages: 9984-9995   Publisher: Institute of Electrical and Electronics Engineers

Abstract

RGB-Infrared multi-modal object detection exploits diverse and complementary information, which offers advantages in the intelligent transportation field. The main challenge in RGB-Infrared object detection is how to fuse the two modalities. Fusion is difficult for two reasons: 1) large visual differences between the modalities make it hard to learn effective complementary features, and 2) some RGB-Infrared image pairs are misaligned, which further complicates fusion. To address this, building on the feature pyramid commonly used in object detection, we propose the Multi-modal Feature Pyramid Transformer (MFPT) to fuse the two modalities. MFPT learns semantic and modal complementary information to enhance the features of each modality through an intra-modal feature pyramid transformer and an inter-modal feature pyramid transformer. The intra-modal feature pyramid transformer lets features interact across space and scales, improving the semantic representation of the features in each modality. The inter-modal feature pyramid transformer performs feature interaction between modalities, enabling each modality to learn complementary features from the other. The inter-modal feature pyramid transformer can also learn distance-independent dependencies between modalities, which are insensitive to misalignment between images. Furthermore, a local attention mechanism operating within windows is introduced into MFPT to efficiently correlate regions of different scales or different modalities. Experimental results on two RGB-Infrared detection datasets demonstrate that the proposed method is superior to state-of-the-art methods.
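
The abstract does not give the attention formulation, so the following is only a minimal sketch of the general idea of inter-modal attention restricted to local windows, not the authors' implementation. It assumes plain scaled dot-product attention with no learned projections, a single pyramid level, and no positional encoding; the function names (`window_partition`, `window_merge`, `cross_modal_window_attention`) and the window size are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_partition(feat, win):
    # Split an (H, W, C) map into non-overlapping win x win windows,
    # returning an array of shape (num_windows, win * win, C).
    H, W, C = feat.shape
    assert H % win == 0 and W % win == 0, "H and W must be divisible by win"
    x = feat.reshape(H // win, win, W // win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, win * win, C)

def window_merge(windows, H, W, win):
    # Inverse of window_partition: back to an (H, W, C) map.
    C = windows.shape[-1]
    x = windows.reshape(H // win, W // win, win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

def cross_modal_window_attention(query_feat, context_feat, win=4):
    # Each position of query_feat attends to every position of the
    # matching window in context_feat (the other modality).
    # Assumption: queries, keys and values are the raw features.
    H, W, C = query_feat.shape
    q = window_partition(query_feat, win)
    k = window_partition(context_feat, win)
    v = k
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(C)
    attn = softmax(scores, axis=-1)
    out = attn @ v
    # Residual connection: the query modality is enhanced, not replaced.
    return query_feat + window_merge(out, H, W, win)

# Toy usage with random stand-ins for RGB and infrared feature maps.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 8, 16))
ir = rng.standard_normal((8, 8, 16))
rgb_enhanced = cross_modal_window_attention(rgb, ir, win=4)
ir_enhanced = cross_modal_window_attention(ir, rgb, win=4)
```

In this simplified form the attention weights depend only on feature similarity, not on where a feature sits inside its window, so shifting the other modality's features within a window leaves the output unchanged. That is one way to read the abstract's claim about distance-independent dependencies; restricting attention to windows also keeps the cost proportional to the number of windows rather than quadratic in the full map size.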

Keywords:
Artificial intelligence; Computer science; Computer vision; RGB color model; Pyramid (geometry); Transformer; Feature (linguistics); Pattern recognition (psychology); Modal; Object detection; Feature extraction; Modalities; Engineering; Mathematics

Metrics

Cited By: 62
FWCI (Field Weighted Citation Impact): 11.28
Refs: 72
Citation Normalized Percentile: 0.98

Topics

Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Visual Attention and Saliency Detection
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Multi-Modal Transformer for RGB-D Salient Object Detection

Peipei Song, Jing Zhang, Piotr Koniusz, Nick Barnes

Journal: 2022 IEEE International Conference on Image Processing (ICIP)   Year: 2022   Pages: 2466-2470

JOURNAL ARTICLE

Multi-modal deep feature learning for RGB-D object detection

Xiangyang Xu, Yuncheng Li, Gangshan Wu, Jiebo Luo

Journal: Pattern Recognition   Year: 2017   Vol: 72   Pages: 300-313

JOURNAL ARTICLE

Specificity-Guided Cross-Modal Feature Reconstruction for RGB-Infrared Object Detection

Xiaoyu Sun, Yaohui Zhu, Hua Huang

Journal: IEEE Transactions on Intelligent Transportation Systems   Year: 2024   Vol: 26 (1)   Pages: 950-961