Immersive multimedia content delivery is becoming increasingly popular with the spread of head-mounted displays (HMDs). In particular, omnidirectional video streaming is gaining ground among video delivery platforms. Delivering 360° video over the Internet requires far more bandwidth than classic 2D video. To reduce bandwidth consumption, the tiling technique splits the video into smaller portions (tiles), so that tiles falling outside the user's viewport are encoded at a low resolution while those inside the viewport are encoded at a higher resolution. This is possible only when the user's future viewports are known in advance, so a reliable prediction of future viewports is necessary. In this work, we show that users tend to explore the environment at the beginning of the video and then focus on one of the regions that attract the most attention (Points of Interest). This insight is helpful when designing viewport-adaptive streaming techniques. On this basis, we propose a viewport prediction approach that combines Long Short-Term Memory (LSTM) networks with the classic naive technique. Preliminary simulation tests show promising results.
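To make the interplay between viewport prediction and tiling concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes that the naive technique means reusing the last observed viewport as the prediction, and it omits the LSTM component entirely. The function names, the 4x8 tile grid, the 100° by 90° field of view, and the "high"/"low" labels are all illustrative assumptions.

```python
from typing import List, Tuple

Viewport = Tuple[float, float]  # (yaw, pitch) of the viewport center, in degrees


def naive_predict(history: List[Viewport]) -> Viewport:
    """Naive baseline (assumed): the future viewport equals the last observed one."""
    return history[-1]


def yaw_distance(a: float, b: float) -> float:
    """Angular distance between two yaw angles, accounting for 360-degree wraparound."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def tile_qualities(center: Viewport,
                   rows: int = 4,
                   cols: int = 8,
                   fov: Tuple[float, float] = (100.0, 90.0)) -> List[List[str]]:
    """Label each tile of an equirectangular grid as "high" or "low" quality.

    A tile is considered inside the viewport when its center lies within
    half the (assumed) field of view of the predicted viewport center.
    """
    yaw_c, pitch_c = center
    grid = []
    for r in range(rows):
        # Tile-center pitch, from +90 (top) down to -90 (bottom).
        tile_pitch = 90.0 - (r + 0.5) * 180.0 / rows
        row = []
        for c in range(cols):
            # Tile-center yaw, from -180 (left) to +180 (right).
            tile_yaw = -180.0 + (c + 0.5) * 360.0 / cols
            inside = (yaw_distance(tile_yaw, yaw_c) <= fov[0] / 2
                      and abs(tile_pitch - pitch_c) <= fov[1] / 2)
            row.append("high" if inside else "low")
        grid.append(row)
    return grid


# Hypothetical head-orientation trace; the last sample is near the +/-180 seam.
history = [(0.0, 0.0), (10.0, 0.0), (170.0, 0.0)]
predicted = naive_predict(history)
grid = tile_qualities(predicted)
```

In this sketch, a predicted viewport near the ±180° seam selects tiles on both the left and right edges of the frame, which is why the yaw comparison has to wrap around. A learned predictor could be substituted for `naive_predict` without changing the tile selection step.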
Saba Ahsan, Sujeet Mate, Igor D. D. Curcio, Alireza Aminlou, Yu You, Emre Aksu, Miska M. Hannuksela
Haiping He, Feng Zhao, ZhaoYu Su, Hu Zhou, Yiming Zhang, JiaWei Xiong
Chenglei Wu, Rui-Xiao Zhang, Zhi Wang, Lifeng Sun