JOURNAL ARTICLE

Transformer-Based Cross-Modal Integration Network for RGB-T Salient Object Detection

Chengtao Lv, Xiaofei Zhou, Bin Wan, Shuai Wang, Yaoqi Sun, Jiyong Zhang, Chenggang Yan

Year: 2024
Journal: IEEE Transactions on Consumer Electronics
Vol: 70 (2)
Pages: 4741-4755
Publisher: Institute of Electrical and Electronics Engineers

Abstract

Salient object detection (SOD) can be applied in consumer electronics, where it helps identify and locate objects of interest. RGB and RGB-D (depth) salient object detection have made great progress in recent years. However, there is still considerable room for improvement in exploiting the complementarity of the two modalities for RGB-T (thermal) SOD. This paper therefore proposes a Transformer-based Cross-modal Integration Network (TCINet) to detect salient objects in RGB-T images; it fuses the features of the two modalities and interactively aggregates features from two levels. Our method consists of Siamese Swin Transformer-based encoders, a cross-modal feature fusion (CFF) module, and an interaction-based feature decoding (IFD) block. The CFF module fuses the complementary information of the two modalities' features, using collaborative spatial attention to emphasize salient regions and suppress background regions in both. Furthermore, the IFD block aggregates two levels of features, namely the previous-level fused feature and the current-level encoder feature, bridging the large semantic gap between them and reducing noise. Extensive experiments on three RGB-T datasets demonstrate the effectiveness of our method and its superiority over cutting-edge saliency methods. The results and code of our method will be available at https://github.com/lvchengtao/TCINet.
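The abstract describes a data flow (shared-weight encoders for the two modalities, a joint spatial attention that fuses them, and a decoding step that merges a coarser fused feature into the next finer level) without giving layer details. The following is a toy NumPy sketch of that flow under stated assumptions; it is not TCINet's implementation. The function names (`shared_projection`, `cross_modal_fusion`, `interaction_decode`), the channel mean/max statistics feeding the attention, the fixed weights standing in for learned convolutions, the 1x1 projection standing in for a Swin Transformer stage, replicating the thermal channel to three channels, nearest-neighbour upsampling, and passing the fused finer-level feature as the "current-level" input are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def shared_projection(x, weight):
    """Toy stand-in for one encoder stage (hypothetical): ReLU(W @ x) per pixel.

    Calling it with the SAME `weight` for RGB and thermal mimics a Siamese encoder.
    x: (C_in, H, W), weight: (C_out, C_in) -> (C_out, H, W)
    """
    return np.maximum(np.tensordot(weight, x, axes=1), 0.0)


def avgpool2x(x):
    """2x2 average pooling of a (C, H, W) array (H and W must be even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def cross_modal_fusion(f_rgb, f_t, w=(1.0, 1.0, 1.0, 1.0)):
    """Hypothetical CFF-style fusion of two (C, H, W) feature maps.

    One spatial attention map is computed jointly from both modalities
    (channel-wise mean and max of each), then applied to both before summing.
    The fixed weights `w` stand in for a learned convolution (assumption).
    """
    stats = np.stack([f_rgb.mean(axis=0), f_rgb.max(axis=0),
                      f_t.mean(axis=0), f_t.max(axis=0)])   # (4, H, W)
    att = sigmoid(np.tensordot(np.asarray(w), stats, axes=1))  # (H, W), in (0, 1)
    return f_rgb * att + f_t * att


def interaction_decode(prev_fused, cur_feat):
    """Hypothetical IFD-style step merging a coarser fused feature into a finer one.

    The coarse feature is upsampled and turned into a spatial gate that damps
    finer-level responses outside regions it deems salient; the gated feature
    is then added to the upsampled one (assumption, not the paper's block).
    """
    up = upsample2x(prev_fused)
    gate = sigmoid(up.mean(axis=0, keepdims=True))  # (1, H, W)
    return up + cur_feat * gate


# Usage on random toy data: an 8x8 input, 8 channels, two pyramid levels.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((3, 8, 8))
thermal = np.repeat(rng.standard_normal((1, 8, 8)), 3, axis=0)  # 1 -> 3 channels
W = rng.standard_normal((8, 3))  # shared (Siamese) weights

e1_rgb, e1_t = shared_projection(rgb, W), shared_projection(thermal, W)  # (8, 8, 8)
e2_rgb, e2_t = avgpool2x(e1_rgb), avgpool2x(e1_t)                         # (8, 4, 4)

f1 = cross_modal_fusion(e1_rgb, e1_t)
f2 = cross_modal_fusion(e2_rgb, e2_t)
out = interaction_decode(f2, f1)
print(out.shape)  # (8, 8, 8)
```

Sharing `W` between the two calls to `shared_projection` is what makes the toy encoder Siamese: both modalities are mapped into the same feature space, so the mean/max statistics that the joint attention combines are directly comparable across the two branches.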

Keywords:
Computer science, Transformer, Modal, Electronic engineering, Artificial intelligence, Computer vision, Engineering, Voltage, Electrical engineering, Materials science

Metrics

Cited By: 22
FWCI (Field Weighted Citation Impact): 11.66
Refs: 85
Citation Normalized Percentile: 0.98

Topics

Visual Attention and Saliency Detection
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Advanced Image Fusion Techniques
Physical Sciences → Engineering → Media Technology
Infrared Target Detection Methodologies
Physical Sciences → Engineering → Aerospace Engineering