JOURNAL ARTICLE

Cross-Modal Feature Fusion and Interaction Strategy for CNN-Transformer-Based Object Detection in Visual and Infrared Remote Sensing Imagery

Jinyan Nie, He Sun, Xu Sun, Li Ni, Lianru Gao

Year: 2023
Journal: IEEE Geoscience and Remote Sensing Letters
Vol: 21
Pages: 1-5
Publisher: Institute of Electrical and Electronics Engineers

Abstract

Because visible and infrared images are complementary, fusing the two modalities has become a favored way to improve object detection accuracy in remote sensing. However, several problems remain. Most existing algorithms focus too heavily on local information and ignore long-range information when extracting features from the different modalities. In addition, coarse weighted fusion strategies do not fully exploit the information in each modality, and existing fusion structures overlook the importance of intermodal information exchange. To tackle these problems, we propose a cross-modal feature fusion and interaction strategy for convolutional neural network (CNN)-transformer-based object detection in visual and infrared remote sensing imagery. A parallel structure extracts the features of each modality separately. In both the visual and infrared branches, convolutional layers and transformer encoders are cascaded to extract both local and long-range information. The cross-modal feature fusion and interaction module (CFFIM) uses attention mechanisms to jointly fuse features from the two modalities at the same scale, improving the diversity of the fused features, while the feature interaction enables visible and infrared information to be shared. Experiments on the VEDAI dataset demonstrate the effectiveness of the proposed scheme compared with other state-of-the-art algorithms.
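
The abstract gives no equations for the CFFIM, so the following is only a minimal sketch of the general idea it describes: attention-weighted fusion of same-scale visible and infrared features, followed by an interaction step that shares the fused result with both branches. Everything here is an assumption made for illustration, not the authors' implementation: the function names (`global_avg`, `modality_weights`, `fuse_and_interact`), the use of a softmax over globally pooled channel descriptors as the attention, and the residual form of the interaction. Feature maps are represented as plain lists of channels, each channel a flat list of spatial values.

```python
import math


def global_avg(channel):
    """Global average pooling: mean of one channel's spatial values."""
    return sum(channel) / len(channel)


def modality_weights(vis_channel, ir_channel):
    """Assumed attention: softmax over the two pooled channel descriptors.

    Returns (w_vis, w_ir), which are positive and sum to 1.
    """
    a, b = global_avg(vis_channel), global_avg(ir_channel)
    m = max(a, b)  # subtract the max for numerical stability
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb), eb / (ea + eb)


def fuse_and_interact(vis, ir):
    """Fuse same-scale visible/infrared features and share the result.

    vis, ir: lists of channels with identical shapes.
    Returns (fused, vis_out, ir_out).
    """
    if len(vis) != len(ir):
        raise ValueError("both modalities need the same number of channels")
    fused, vis_out, ir_out = [], [], []
    for v_ch, i_ch in zip(vis, ir):
        if len(v_ch) != len(i_ch):
            raise ValueError("channels must have the same spatial size")
        w_v, w_i = modality_weights(v_ch, i_ch)
        # Attention-weighted fusion of the two modalities, per channel.
        f_ch = [w_v * v + w_i * i for v, i in zip(v_ch, i_ch)]
        fused.append(f_ch)
        # Interaction (assumed form): each branch receives the fused
        # features as a residual, so both see cross-modal information.
        vis_out.append([v + f for v, f in zip(v_ch, f_ch)])
        ir_out.append([i + f for i, f in zip(i_ch, f_ch)])
    return fused, vis_out, ir_out


if __name__ == "__main__":
    vis = [[0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.1, 0.2]]
    ir = [[0.2, 0.1, 0.2, 0.1], [0.7, 0.9, 0.8, 0.6]]
    fused, vis_out, ir_out = fuse_and_interact(vis, ir)
    print(fused)
```

In this sketch the modality with the stronger pooled response in a channel receives the larger weight, which is one simple alternative to a fixed, coarse weighting. A learned attention module in a real detector would replace `modality_weights`.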

Keywords:
Computer science, Artificial intelligence, Fuse (electrical), Modal, Computer vision, Object detection, Feature extraction, Encoder, Visualization, Pattern recognition (psychology), Feature (linguistics), Engineering

Metrics

Cited By: 26
FWCI (Field Weighted Citation Impact): 5.64
Refs: 16
Citation Normalized Percentile: 0.95

Topics

Remote-Sensing Image Classification
Physical Sciences → Engineering → Media Technology
Infrared Target Detection Methodologies
Physical Sciences → Engineering → Aerospace Engineering
Advanced Neural Network Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition