JOURNAL ARTICLE

A Spherical Convolution Approach for Learning Long Term Viewport Prediction in 360 Immersive Video

Chenglei WuRui-Xiao ZhangZhi WangLifeng Sun

Year: 2020 Journal:   Proceedings of the AAAI Conference on Artificial Intelligence Vol: 34 (01)Pages: 14003-14040   Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Viewport prediction for 360 video forecasts a viewer’s viewport when he/she watches a 360 video with a head-mounted display, which benefits many VR/AR applications such as 360 video streaming and mobile cloud VR. Existing studies based on planar convolutional neural network (CNN) suffer from the image distortion and split caused by the sphere-to-plane projection. In this paper, we start by proposing a spherical convolution based feature extraction network to distill spatial-temporal 360 information. We provide a solution for training such a network without a dedicated 360 image or video classification dataset. We differ with previous methods, which base their predictions on image pixel-level information, and propose a semantic content and preference based viewport prediction scheme. In this paper, we adopt a recurrent neural network (RNN) network to extract a user's personal preference of 360 video content from minutes of embedded viewing histories. We utilize this semantic preference as spatial attention to help network find the "interested'' regions on a future video. We further design a tailored mixture density network (MDN) based viewport prediction scheme, including viewport modeling, tailored loss function, etc, to improve efficiency and accuracy. Our extensive experiments demonstrate the rationality and performance of our method, which outperforms state-of-the-art methods, especially in long-term prediction.

Keywords:
Viewport Computer science Convolutional neural network Artificial intelligence Projection (relational algebra) Convolution (computer science) Feature (linguistics) Computer vision Artificial neural network Algorithm

Metrics

42
Cited By
2.04
FWCI (Field Weighted Citation Impact)
34
Refs
0.90
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Image and Video Quality Assessment
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Visual Attention and Saliency Detection
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Vision and Imaging
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
© 2026 ScienceGate Book Chapters — All rights reserved.