JOURNAL ARTICLE

Unified Information Fusion Network for Multi-Modal RGB-D and RGB-T Salient Object Detection

Wei Gao, Guibiao Liao, Siwei Ma, Ge Li, Yongsheng Liang, Weisi Lin

Year: 2021   Journal: IEEE Transactions on Circuits and Systems for Video Technology   Vol: 32 (4)   Pages: 2091-2106   Publisher: Institute of Electrical and Electronics Engineers

Abstract

The use of complementary information, namely depth or thermal information, has shown its benefits to salient object detection (SOD) in recent years. However, the RGB-D and RGB-T SOD problems are currently solved only independently, and most existing methods directly extract and fuse raw features from backbones. Such methods are easily restricted by low-quality modality data and redundant cross-modal features. In this work, a unified end-to-end framework is designed to simultaneously analyze RGB-D and RGB-T SOD tasks. Specifically, to effectively handle multi-modal features, we propose a novel multi-stage and multi-scale fusion network (MMNet), which consists of a cross-modal multi-stage fusion module (CMFM) and a bi-directional multi-scale decoder (BMD). Analogous to the visual color stage doctrine in the human visual system (HVS), the proposed CMFM explores important feature representations in the feature response stage and integrates them into cross-modal features in the adversarial combination stage. Moreover, the proposed BMD learns the combination of multi-level cross-modal fused features to capture both local and global information of salient objects, further boosting multi-modal SOD performance. The proposed unified cross-modality feature analysis framework, based on two-stage and multi-scale information fusion, can be applied to diverse multi-modal SOD tasks. Comprehensive experiments (∼92K image pairs) demonstrate that the proposed method consistently outperforms 21 other state-of-the-art methods on nine benchmark datasets. This validates that the proposed method works well on diverse multi-modal SOD tasks with good generalization and robustness, and provides a good multi-modal SOD benchmark.
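The abstract describes a two-stage idea: first weight each modality's features by their responses, then combine the re-weighted RGB and depth/thermal features into a cross-modal representation. The following is a minimal, hypothetical numpy sketch of that pattern only; the function and variable names are illustrative assumptions, not the paper's actual CMFM implementation.

```python
import numpy as np

def cross_modal_fuse(rgb_feat, aux_feat):
    """Hedged sketch of two-stage cross-modal fusion (illustrative only).

    Stage 1 (response): re-weight each modality's channels by a softmax
    over globally pooled responses, emphasizing informative channels.
    Stage 2 (combination): merge the re-weighted features element-wise.
    Inputs are (C, H, W) feature maps from RGB and an auxiliary modality
    (depth or thermal).
    """
    def channel_gate(feat):
        pooled = feat.mean(axis=(1, 2))                  # (C,) global average pool
        weights = np.exp(pooled) / np.exp(pooled).sum()  # softmax over channels
        return feat * weights[:, None, None]             # broadcast re-weighting

    gated_rgb = channel_gate(rgb_feat)
    gated_aux = channel_gate(aux_feat)
    # Combination stage: additive merge plus a multiplicative interaction term.
    return gated_rgb + gated_aux + gated_rgb * gated_aux

rgb = np.random.rand(8, 16, 16)    # toy RGB backbone features
depth = np.random.rand(8, 16, 16)  # toy depth backbone features
fused = cross_modal_fuse(rgb, depth)
print(fused.shape)  # (8, 16, 16)
```

In the actual MMNet, such fused features from multiple backbone levels would then be combined by the multi-scale decoder (BMD); this sketch covers only the per-level fusion step.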

Keywords:
RGB color model; Computer science; Artificial intelligence; Modal; Feature (linguistics); Benchmark (surveying); Pattern recognition (psychology); Fuse (electrical); Modality (human–computer interaction); Computer vision; Engineering

Metrics

Cited by: 190
FWCI (Field-Weighted Citation Impact): 14.11
References: 92
Citation Normalized Percentile: 0.99 (in top 1%; in top 10%)

Topics

Visual Attention and Saliency Detection
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Olfactory and Sensory Function Studies
Life Sciences →  Neuroscience →  Sensory Systems
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

BMFNet: Bifurcated multi-modal fusion network for RGB-D salient object detection

Chenwang Sun, Qing Zhang, Chenyu Zhuang, Mingqian Zhang

Journal: Image and Vision Computing   Year: 2024   Vol: 147   Pages: 105048
JOURNAL ARTICLE

Modal complementary fusion network for RGB-T salient object detection

Shuai Ma, Kechen Song, Hongwen Dong, Hongkun Tian, Yunhui Yan

Journal: Applied Intelligence   Year: 2022   Vol: 53 (8)   Pages: 9038-9055
JOURNAL ARTICLE

Multi-modal cooperative fusion network for dual-stream RGB-D salient object detection

Jingyu Wu, Fuming Sun, Haojie Li, Mingyu Lu

Journal: Image and Vision Computing   Year: 2025   Vol: 166   Pages: 105835
JOURNAL ARTICLE

Multi-modality information refinement fusion network for RGB-D salient object detection

Hua Bao, Bo Fan

Journal: The Visual Computer   Year: 2023   Vol: 40 (6)   Pages: 4183-4199