JOURNAL ARTICLE

RGB-Infrared Multi-Modal Remote Sensing Object Detection Using CNN and Transformer Based Feature Fusion

Abstract

Object detection in remote sensing images (RSIs) plays an important role in both civil and military fields. Many current object detection algorithms for RSIs show excellent capability. However, these methods are designed for the single RGB modality and cannot cope with insufficient illumination or foggy scenes. Infrared images measure the temperature of the captured objects, so they are largely unaffected by low illumination and fog. In this paper, we propose a novel RGB-Infrared multi-modal remote sensing object detection method, termed RIFuse, to address these challenges. RIFuse combines convolutional neural networks (CNNs) and a Transformer in a parallel hierarchy, which efficiently extracts the local features of RGB images and the global representations of infrared images. In addition, an adaptive multi-modal feature fusion block (MFF block) is proposed to fuse the features from both branches comprehensively. Extensive experiments demonstrate the superiority of our method for multi-modal object detection on RSIs.
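
The abstract does not specify how the MFF block computes its fusion weights. As a rough illustration of what "adaptive" fusion of two modality branches can mean, the sketch below weights each branch by a softmax over a per-modality score. The function names (softmax, adaptive_fuse) and the scoring rule (mean absolute activation) are assumptions made for this example only, not the design described in the paper.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(
    rgb_feat: Sequence[float], ir_feat: Sequence[float]
) -> Tuple[List[float], Tuple[float, float]]:
    """Fuse two equal-length feature vectors with data-dependent weights.

    Hypothetical scoring rule (an assumption, not taken from the paper):
    each modality is scored by its mean absolute activation, so a branch
    with weak responses (e.g. RGB in a dark scene) contributes less.
    """
    if len(rgb_feat) != len(ir_feat) or len(rgb_feat) == 0:
        raise ValueError("feature vectors must be non-empty and equal in length")

    n = len(rgb_feat)
    scores = [
        sum(abs(x) for x in rgb_feat) / n,
        sum(abs(x) for x in ir_feat) / n,
    ]
    w_rgb, w_ir = softmax(scores)

    # Weighted sum of the two branches; the weights sum to 1.
    fused = [w_rgb * r + w_ir * i for r, i in zip(rgb_feat, ir_feat)]
    return fused, (w_rgb, w_ir)


if __name__ == "__main__":
    # A "dark" RGB branch with no response versus an active infrared branch.
    fused, weights = adaptive_fuse([0.0, 0.0, 0.0], [1.0, 2.0, 3.0])
    print(weights, fused)
```

In a real detector the weights would normally be learned and applied per channel or per spatial location on feature maps; the fixed scoring rule here only makes the weighting behaviour easy to follow.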

Keywords:
Computer science; Artificial intelligence; RGB color model; Computer vision; Remote sensing; Object detection; Convolutional neural network; Block (permutation group theory); Modal; Feature (linguistics); Object (grammar); Pattern recognition (psychology); Geography

Metrics

Cited By: 6
FWCI (Field Weighted Citation Impact): 3.12
Refs: 14
Citation Normalized Percentile: 0.91

Topics

Infrared Target Detection Methodologies: Physical Sciences → Engineering → Aerospace Engineering
Remote-Sensing Image Classification: Physical Sciences → Engineering → Media Technology
Advanced Neural Network Applications: Physical Sciences → Computer Science → Computer Vision and Pattern Recognition