Chengtao Lv, Xiaofei Zhou, Bin Wan, Shuai Wang, Yaoqi Sun, Jiyong Zhang, Chenggang Yan
Salient object detection (SOD) can be applied in the consumer electronics field, where it helps identify and locate objects of interest. RGB and RGB-D (depth) salient object detection have made great progress in recent years. However, there is still considerable room for improvement in exploiting the complementarity of two-modal information for RGB-T (thermal) SOD. Therefore, this paper proposes a Transformer-based Cross-modal Integration Network (TCINet) to detect salient objects in RGB-T images, which properly fuses two-modal features and interactively aggregates two-level features. Our method consists of siamese Swin Transformer-based encoders, a cross-modal feature fusion (CFF) module, and an interaction-based feature decoding (IFD) block. The CFF module is designed to fuse the complementary information of the two modalities, where collaborative spatial attention emphasizes salient regions and suppresses background regions in the two-modal features. Furthermore, we deploy the IFD block to aggregate two-level features, namely the previous-level fused feature and the current-level encoder feature, bridging their large semantic gap and reducing noise. Extensive experiments on three RGB-T datasets clearly demonstrate the superiority and effectiveness of our method compared with cutting-edge saliency methods. The results and code will be available at https://github.com/lvchengtao/TCINet.
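The collaborative spatial attention idea behind the CFF module can be illustrated with a minimal sketch. This is a hypothetical NumPy illustration of the general mechanism, not the authors' implementation: a single spatial attention map is derived jointly from both modalities and used to re-weight each modality's feature map, so that regions both modalities agree are salient get emphasized while background responses are suppressed. The function name, pooling choice, and additive fusion are all assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def collaborative_spatial_attention(f_rgb, f_t):
    """Hypothetical sketch of collaborative spatial attention fusion.

    f_rgb, f_t: (C, H, W) feature maps from the RGB and thermal branches.
    Returns the fused (C, H, W) feature and the shared (H, W) attention map.
    """
    # Pool each modality over channels, then combine into one joint map,
    # so the attention reflects evidence from both modalities.
    joint = f_rgb.mean(axis=0) + f_t.mean(axis=0)   # (H, W)
    attn = sigmoid(joint)                            # values in (0, 1)
    # Re-weight both modalities with the shared map and fuse by addition:
    # high-attention (salient) locations are kept, background is damped.
    fused = f_rgb * attn + f_t * attn                # (C, H, W), broadcast over C
    return fused, attn
```

In a real network the attention map would typically be produced by learned convolutions rather than plain channel pooling, but the re-weight-then-fuse structure is the same.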