Object detection in remote sensing images (RSIs) plays an important role in both civil and military fields. Many object detection algorithms for RSIs have shown excellent capability. However, these methods are designed for the single RGB modality and cannot cope with insufficient illumination or foggy scenes. Infrared images measure the temperature of the captured objects, so they are far less affected by low illumination and fog. In this paper, we propose a novel RGB-Infrared multi-modal remote sensing object detection method, termed RIFuse, to address these challenges. RIFuse combines convolutional neural networks (CNNs) and a Transformer in a parallel hierarchy, which efficiently extracts the local features of RGB images and the global representations of infrared images. In addition, an adaptive multi-modal feature fusion block (MFF block) is proposed to comprehensively fuse the features from both branches. Extensive experiments demonstrate the superiority of our method for multi-modal object detection on RSIs.
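To make the idea of adaptive fusion of two modality branches concrete, the following is a minimal sketch in plain Python. It is not the MFF block itself, whose internals are not described here. It assumes a simple hypothetical gating rule: for each channel, the two branches are weighted by a softmax over their global-average descriptors. The names `global_average`, `softmax`, and `adaptive_fuse` are illustrative only.

```python
import math

def global_average(channel):
    """Mean over all spatial positions of one channel (a 2-D list)."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)

def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_fuse(rgb_feat, ir_feat):
    """Fuse two feature maps shaped [channels][height][width].

    Assumption (hypothetical gating rule, not taken from the paper):
    per channel, each branch is weighted by the softmax of its
    global-average descriptor, so the stronger response dominates.
    Returns the fused map and the per-channel (w_rgb, w_ir) weights.
    """
    fused, weights = [], []
    for rgb_ch, ir_ch in zip(rgb_feat, ir_feat):
        w_rgb, w_ir = softmax([global_average(rgb_ch), global_average(ir_ch)])
        weights.append((w_rgb, w_ir))
        fused.append([
            [w_rgb * r + w_ir * i for r, i in zip(rgb_row, ir_row)]
            for rgb_row, ir_row in zip(rgb_ch, ir_ch)
        ])
    return fused, weights

# Toy case: a dark RGB channel and a strong infrared channel.
rgb = [[[0.0, 0.0], [0.0, 0.0]]]
ir = [[[2.0, 2.0], [2.0, 2.0]]]
fused, weights = adaptive_fuse(rgb, ir)
```

In this toy case the infrared branch receives the larger weight, mirroring the motivation that infrared responses remain informative when the RGB signal is weak. A learned fusion block would replace the fixed softmax rule with trainable parameters.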
Yaohui Zhu, Xiaoyu Sun, Miao Wang, Hua Huang
Congying Sun, Jing Zhang, Huinan Guo, Wuxia Zhang
Long Gao, Ke Yang, Wanlin Zhao, Yang Zhang, Jiang Yan, Gang He, Yunsong Li
Zhenyu Zhang, Huiyan Chen, Qingzhen Xu, Qiang Chen
Jinyan Nie, He Sun, Xu Sun, Li Ni, Lianru Gao