HierNet: Hierarchical Transformer U-Shape Network for RGB-D Salient Object Detection

Pengfei Lv; Xiaosheng Yu; Junxiang Wang; Chengdong Wu

doi:10.1109/ccdc58219.2023.10327419

JOURNAL ARTICLE

Year: 2023 Pages: 1807-1811

Abstract

With the growing popularity of depth sensors, research on RGB-D salient object detection (SOD) is thriving. However, because of limitations of the external environment and of the sensor itself, depth information is often unreliable. To address this, existing models often purify the depth information with complex convolution and pooling operations. This discards a large amount of useful information along with the noise and reduces the opportunities for multi-modal interaction between RGB and depth. Moreover, as information is gradually lost, the hidden relationships among features at different levels are ignored. To tackle these problems, we propose a Hierarchical Transformer U-Shape Network (HierNet) with three components: 1) a depth calibration module with a simple structure that provides faithful depth information with minimal information loss, creating the conditions for cross-modality, cross-layer information interaction; 2) a hierarchical transformer embedding module, in which a set of global-view transformer encoders uses multi-head attention to find the potential coherence between the RGB and depth modalities, and several such encoder sets, with shared weights, search for long-range dependencies across levels; 3) a dual-stream U-shape network as the backbone, chosen for the complementary features that a U-shape network provides. Extensive fair experiments on four challenging datasets demonstrate the outstanding performance of the proposed model compared with state-of-the-art models.
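
The abstract's second component, weight-shared multi-head attention over RGB and depth features at several levels, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class name `SharedCrossModalAttention`, the helper `hierarchical_embedding`, the random weight initialisation, the residual connection, and the assumption that every level has already been projected to a common token width `dim` are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class SharedCrossModalAttention:
    """Multi-head self-attention over joint RGB + depth tokens.

    Hypothetical stand-in for one transformer encoder; the paper's
    actual layer layout is not given in the abstract.
    """

    def __init__(self, dim, num_heads, seed=0):
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        rng = np.random.default_rng(seed)
        self.dim, self.num_heads = dim, num_heads
        std = 1.0 / np.sqrt(dim)
        # Random projections (assumption: a trained model would learn these).
        self.w_q = rng.normal(0.0, std, (dim, dim))
        self.w_k = rng.normal(0.0, std, (dim, dim))
        self.w_v = rng.normal(0.0, std, (dim, dim))
        self.w_o = rng.normal(0.0, std, (dim, dim))

    def _split_heads(self, x):
        # (tokens, dim) -> (heads, tokens, head_dim)
        return x.reshape(x.shape[0], self.num_heads, -1).transpose(1, 0, 2)

    def __call__(self, rgb_tokens, depth_tokens):
        # Concatenate both modalities so every token can attend to every
        # other token, within and across modalities (a global view).
        x = np.concatenate([rgb_tokens, depth_tokens], axis=0)
        q = self._split_heads(x @ self.w_q)
        k = self._split_heads(x @ self.w_k)
        v = self._split_heads(x @ self.w_v)
        head_dim = self.dim // self.num_heads
        attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(head_dim))
        merged = (attn @ v).transpose(1, 0, 2).reshape(x.shape[0], self.dim)
        out = x + merged @ self.w_o  # residual connection (assumption)
        n_rgb = rgb_tokens.shape[0]
        return out[:n_rgb], out[n_rgb:], attn


def hierarchical_embedding(rgb_levels, depth_levels, block):
    # Weight sharing: the same block (same matrices) is reused at every level.
    return [block(r, d)[:2] for r, d in zip(rgb_levels, depth_levels)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    block = SharedCrossModalAttention(dim=8, num_heads=2)
    # Two feature levels with 16 and 4 tokens each (hypothetical sizes).
    rgb = [rng.normal(size=(n, 8)) for n in (16, 4)]
    depth = [rng.normal(size=(n, 8)) for n in (16, 4)]
    for level, (r, d) in enumerate(hierarchical_embedding(rgb, depth, block)):
        print(level, r.shape, d.shape)
```

Because the attention matrix spans the concatenated token sequence, its off-diagonal blocks hold the RGB-to-depth and depth-to-RGB weights, which is one simple way to expose the cross-modal interaction the abstract describes. Reusing a single `block` across levels is the sketch's reading of "weight sharing"; the paper may share weights differently.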

Keywords: Computer science; Encoder; RGB color model; Artificial intelligence; Transformer; Embedding; Salient; Pooling; Computer vision; Pattern recognition (psychology); Data mining; Engineering

Metrics

FWCI (Field Weighted Citation Impact): 0.18

Citation Normalized Percentile: 0.45

Topics

Visual Attention and Saliency Detection

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Gaze Tracking and Assistive Technology

Physical Sciences → Computer Science → Human-Computer Interaction

Image and Video Quality Assessment

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

HEFT: Hierarchical Enhanced Fusion Transformer for RGB-D Salient Object Detection

GroupTransNet: Group transformer network for RGB-D salient object detection

Hierarchical U-Shape Attention Network for Salient Object Detection

Hierarchical Alternate Interaction Network for RGB-D Salient Object Detection

Hierarchical Dynamic Filtering Network for RGB-D Salient Object Detection