JOURNAL ARTICLE

SVGC-AVA: 360-Degree Video Saliency Prediction With Spherical Vector-Based Graph Convolution and Audio-Visual Attention

Qin YangYuqi LiChenglin LiHao WangSa YanWei LiWenrui DaiJunni ZouHongkai XiongPascal Frossard

Year: 2023 Journal:   IEEE Transactions on Multimedia Vol: 26 Pages: 3061-3076   Publisher: Institute of Electrical and Electronics Engineers

Abstract

Viewers of 360-degree videos are provided with both visual modality to characterize their surrounding views and audio modality to indicate the sound direction. Though both modalities are important for saliency prediction, little work has been done by jointly exploiting them, which is mainly due to the lack of audio-visual saliency datasets and insufficient exploitation of the multi-modality. In this article, we first construct an audio-visual saliency dataset with 57 360-degree videos watched by 63 viewers. Through a deep analysis of the constructed dataset, we find that the human gaze can be attracted by the auditory cues, resulting in a more concentrated saliency map if the sound source's location is further provided. To jointly exploit the visual and audio features and their correlation, we further design a saliency prediction network for 360-degree videos (SVGC-AVA) based on spherical vector-based graph convolution and audio-visual attention. The proposed spherical vector-based graph convolution can process visual and audio features directly in the sphere domain, thus avoiding projection distortion incurred by traditional CNN-based predictors. In addition, the audio-visual attention scheme explores self-modal and cross-modal correlation for both modalities, which are further hierarchically processed with the U-Net's multi-scale structure of SVGC-AVA. Evaluations on both our and public datasets validate that SVGC-AVA can achieve higher prediction accuracy, both qualitatively and subjectively.

Keywords:
Computer science Convolution (computer science) Artificial intelligence Graph Degree (music) Computer vision Speech recognition Pattern recognition (psychology) Theoretical computer science Artificial neural network Acoustics

Metrics

8
Cited By
1.46
FWCI (Field Weighted Citation Impact)
56
Refs
0.79
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Visual Attention and Saliency Detection
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Olfactory and Sensory Function Studies
Life Sciences →  Neuroscience →  Sensory Systems
Image and Video Quality Assessment
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
© 2026 ScienceGate Book Chapters — All rights reserved.