JOURNAL ARTICLE

Subjective and Objective Audio-Visual Quality Assessment for Omnidirectional Videos

Xilei Zhu, Huiyu Duan, Yuqin Cao, Yucheng Zhu, Yuxin Zhu, Jing Liu, Xiongkuo Min, Guangtao Zhai, Patrick Le Callet

Year: 2025
Journal: IEEE Transactions on Image Processing
Vol: 34
Pages: 6506-6523
Publisher: Institute of Electrical and Electronics Engineers

Abstract

Virtual Reality (VR) has attracted widespread attention in recent years due to its capability to create immersive experiences by presenting multi-modal information to users. Omnidirectional videos (ODVs), as a prominent component of VR content, are essential across diverse applications. Service providers therefore need to monitor and optimize the quality of ODVs throughout the filming, encoding, decoding, and transmission stages to ensure a high-quality viewing experience. However, most existing Quality of Experience (QoE) studies for ODVs focus only on visual quality and overlook the impact of the audio modality on perceptual quality. This paper presents a comprehensive study of omnidirectional audio-visual quality assessment (OD-AVQA) from both subjective and objective perspectives. Specifically, we first establish a large-scale audio-visual quality assessment database for ODVs, named OAVQAD+, which includes 625 distorted omnidirectional audio-visual sequences derived from 25 pristine ODVs, together with the mean opinion scores (MOSs) collected for the QoE of these ODVs. This constitutes the largest database for assessing the audio-visual quality of ODVs. To advance the field of objective OD-AVQA, we construct a benchmark comprising three types of models. Type I and Type II models integrate well-known video quality assessment (VQA) and audio quality assessment (AQA) methods using support vector regression (SVR) and multi-layer perceptron (MLP), respectively, while Type III consists of AVQA models specifically designed for traditional 2D audio-visual sequences. We also propose a novel Omnidirectional Audio-Visual quality assessment Network (OmniAVNet) that integrates quality-aware audio, visual, and motion features to predict the overall audio-visual quality of ODVs and supports both full-reference (FR) and no-reference (NR) assessment. Extensive experimental results demonstrate that OmniAVNet outperforms the aforementioned benchmark models on two OD-AVQA databases and performs well on one omnidirectional VQA database. The database and code are available at https://github.com/IntMeGroup/OmniAVNet.
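
As a rough illustration of the database figures quoted in the abstract, the following Python sketch enumerates 625 distorted sequences from 25 pristine ODVs and computes a mean opinion score as the arithmetic mean of subject ratings. The abstract gives only the totals, so the even split of 25 distorted versions per pristine ODV, the 1-5 rating scale, the identifiers, and the rating values are all assumptions made here for illustration; none of them is taken from OAVQAD+ or the OmniAVNet repository.

```python
from statistics import mean

# Totals quoted in the abstract. The even split below is an assumption:
# the abstract does not say how the 625 sequences are distributed.
NUM_PRISTINE = 25
NUM_DISTORTED = 625
DISTORTED_PER_PRISTINE = NUM_DISTORTED // NUM_PRISTINE  # 25 under an even split


def sequence_ids():
    """Enumerate hypothetical (pristine_index, distortion_index) identifiers."""
    return [
        (p, d)
        for p in range(NUM_PRISTINE)
        for d in range(DISTORTED_PER_PRISTINE)
    ]


def mean_opinion_score(ratings):
    """Return the MOS of one sequence: the arithmetic mean of its ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return mean(ratings)


# Made-up ratings on an assumed 1-5 scale; not data from the paper.
raw_ratings = {
    (0, 0): [4, 5, 4, 3, 4],
    (0, 1): [2, 1, 2, 3, 2],
}

# One MOS per rated sequence.
mos = {seq: mean_opinion_score(r) for seq, r in raw_ratings.items()}
```

In practice, subjective studies usually apply outlier screening and score normalization before averaging; this sketch omits those steps because the abstract does not describe them.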

Topics

Image and Video Quality Assessment
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition