Da Ai, Mingyue Lu, Jiahao Wang, Ying Liu
With the growing popularity of short-video applications, shooting and sharing user-generated content (UGC) videos on mobile devices has become increasingly common. Video quality assessment (VQA) is critical to guaranteeing the end-user viewing experience. UGC-VQA is challenging because UGC videos exhibit complex and varied distortion types and have no reference videos. To improve the consistency between UGC-VQA results and human subjective ratings, we propose a UGC-VQA method based on spatiotemporal visual perception (STVP). First, a hierarchical feature fusion module is added to the feature extraction network to fuse low-level visual features with high-level semantic features, yielding quality-aware features rich in visual information. Then, self-attention weights the individual frames according to their importance. A long short-term memory (LSTM) network and temporal pooling model long-term dependencies and temporal memory effects. Experimental results on UGC-VQA datasets show that the proposed method improves performance by nearly 2% and that its predictions are more consistent with human visual perception.
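The frame-weighting and temporal-pooling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the identity query/key projections, the "average attention received" importance measure, and the `memory_pool` rule (with its hypothetical `window` and `gamma` parameters) are all assumptions chosen for illustration. The LSTM stage is omitted entirely.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def frame_importance(features):
    """Scaled dot-product self-attention across frames.

    Assumption: queries and keys are the raw frame features (identity
    projections). Returns one weight per frame, namely the average
    attention that frame receives from all frames. The weights sum to 1.
    """
    n = len(features)
    d = len(features[0])
    rows = []
    for q in features:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in features]
        rows.append(softmax(logits))
    # Column mean: how much attention frame j receives on average.
    return [sum(rows[i][j] for i in range(n)) / n for j in range(n)]


def memory_pool(scores, window=3, gamma=0.5):
    """Hypothetical temporal pooling with a memory effect.

    Each frame score is blended with the worst score among the preceding
    `window` frames, so a quality drop keeps lowering later frames.
    The blended scores are then averaged into one video-level score.
    """
    pooled = []
    for t, s in enumerate(scores):
        past = scores[max(0, t - window):t]
        memory = min(past) if past else s
        pooled.append(gamma * memory + (1 - gamma) * s)
    return sum(pooled) / len(pooled)


if __name__ == "__main__":
    # Three toy frame feature vectors (made-up values).
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(frame_importance(feats))
    # One bad frame (score 1.0) among good ones (score 4.0).
    print(memory_pool([4.0, 1.0, 4.0, 4.0], window=2, gamma=0.5))
```

In this toy rule, a single low-quality frame pulls the pooled score below the plain average, which is one simple way to mimic the lingering effect of a bad moment on a viewer's overall impression.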
Yaya Tan, Guangqian Kong, Xun Duan, Yun Wu, Huiyun Long
Hanwei Zhu, Baoliang Chen, Lingyu Zhu, Shiqi Wang
Jari Korhonen, Xuanzheng Wen, Jun Cheng, Xu Wang
Parimala Kancharla, Sumohana S. Channappayya