Multi-Scale Transformer Network for Saliency Prediction on 360-Degree Images

Xu Lin; Chunmei Qing; Junpeng Tan; Xiangmin Xu

doi:10.1109/icip49359.2023.10222683

JOURNAL ARTICLE

Year: 2023 Pages: 1700-1704

Abstract

Recent methods for saliency prediction on 360° images show that better results can be obtained by using equirectangular projection (ERP) images as input. Because of their limited receptive field, existing convolution-based networks cannot capture long-range information in complex 360° images. Although the transformer can inherently capture long-range correlations through self-attention, its need for large datasets limits its application to saliency prediction on 360° images. In this paper, we present a novel Multi-scale Transformer framework for Saliency prediction on 360° images (MTSal360). The network uses a Multi-scale Transformer Module (MTM) to aggregate long-range contextual information. The MTM includes a Convolutional Positional Encoder (CPE), which lets the model train and test on the cubic and ERP formats separately and thereby addresses the shortage of training data. Experiments on two public datasets show that MTSal360 achieves better results than state-of-the-art methods.

Keywords:

Computer science; Transformer; Encoder; Artificial intelligence; Pattern recognition (psychology); Convolutional neural network; Computer vision; Data mining; Engineering

Metrics

FWCI (Field Weighted Citation Impact): 0.18

Citation Normalized Percentile: 0.42

Topics

Visual Attention and Saliency Detection

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Image and Video Quality Assessment

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Image Fusion Techniques

Physical Sciences → Engineering → Media Technology

Related Documents

Scanpath and saliency prediction on 360 degree images

Visual Saliency Prediction on 360 Degree Images With CNN

Saliency Prediction for 360-degree Video

Transformer-Based Multi-Scale Feature Integration Network for Video Saliency Prediction

SaltiNet: Scan-path Prediction on 360 Degree Images using Saliency Volumes