Dingyao Min, Chao Zhang, Yukang Lu, Keren Fu, Qijun Zhao
Video salient object detection (VSOD) aims at locating the most visually attractive objects in video sequences by exploiting spatial and temporal cues. Previous methods mainly utilize convolutional neural networks (CNNs) to fuse or complement RGB and optical flow cues via simple strategies. To take full advantage of CNNs and recently emerged Transformers, this letter proposes a novel mutual-guidance Transformer-embedding network, called MGT-Net, where a mutual-guidance multi-head attention mechanism (MGMA) explores more sophisticated long-range cross-modal interactions. This mechanism is built into a new mutual-guidance Transformer (MGTrans) module that propagates long-range contextual dependencies within one modality, guided by information from the other. To the best of our knowledge, MGT-Net is the first VSOD model that embeds Transformers as modules into CNNs for improved performance. Prior to MGTrans, we also propose and deploy a feature purification module (FPM) to purify noisy backbone features. Experimental results on five benchmark datasets demonstrate the state-of-the-art performance of MGT-Net.
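The mutual-guidance idea can be illustrated with a minimal single-head cross-attention sketch in NumPy: queries come from one modality while keys and values come from the other, so each modality's long-range aggregation is steered by the other's content. All names, shapes, and the single-head simplification are illustrative assumptions; the paper's MGMA is a learned multi-head mechanism inside the MGTrans module.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, d_k):
    # Scaled dot-product attention where Q comes from one modality
    # and K/V from the other: the output for each query token is a
    # mixture of the OTHER modality's tokens (mutual guidance).
    scores = queries @ keys_values.T / np.sqrt(d_k)   # (n, n) affinities
    return softmax(scores) @ keys_values              # (n, d) guided features

rng = np.random.default_rng(0)
n_tokens, d = 16, 32                      # hypothetical token count / channels
rgb_tokens = rng.standard_normal((n_tokens, d))    # appearance (RGB) features
flow_tokens = rng.standard_normal((n_tokens, d))   # motion (optical flow) features

# Mutual guidance: each modality attends over the other one.
rgb_guided = cross_attention(flow_tokens, rgb_tokens, d)   # flow queries RGB
flow_guided = cross_attention(rgb_tokens, flow_tokens, d)  # RGB queries flow
```

In a multi-head version, the tokens would first pass through learned query/key/value projections per head; the sketch above keeps only the core cross-modal routing.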