Video super-resolution aims to reconstruct a high-resolution video from its degraded low-resolution observation. Existing methods model temporal information primarily through explicit alignment strategies, such as optical flow and motion compensation, or implicit alignment strategies, such as deformable convolution and non-local attention. However, alignment strategies based on motion estimation and motion compensation (MEMC) inevitably introduce inaccurate inter-frame information, which substantially degrades reconstruction performance. To avoid such erroneous compensation information, we propose to model temporal information from the perspective of inter-frame self-similarity and design a multi-frame correlated representation network (MCRNet) for video super-resolution. To address the temporal information distortion caused by large-scale pixel displacement and object occlusion, MCRNet extracts temporal information from similar regions across multiple frames, which are aggregated with allocated weights for information compensation. Moreover, we design a multi-scale non-local information fusion module that performs non-local correlation matching of spatio-temporal features in a multi-scale space, thereby maintaining the scale consistency of spatio-temporal features. Experimental results on different datasets show that MCRNet achieves promising gains over competing methods that employ explicit or implicit alignment strategies.
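The self-similarity-based aggregation described above can be illustrated with a generic non-local attention sketch: each feature vector of the reference frame is matched against all positions in the neighboring frames, and the matches are combined with softmax-allocated weights. This is a minimal NumPy illustration of cross-frame non-local aggregation in general; the function name, the dot-product similarity, and the scaling are illustrative assumptions, not MCRNet's actual formulation.

```python
import numpy as np

def nonlocal_cross_frame_aggregate(ref, neighbors, temperature=1.0):
    """Match each reference-frame feature vector against every position
    of the neighboring frames and return a softmax-weighted sum.

    ref:       (H, W, C) feature map of the frame being reconstructed
    neighbors: (T, H, W, C) feature maps of T supporting frames
    """
    H, W, C = ref.shape
    q = ref.reshape(-1, C)                      # (HW, C) queries
    k = neighbors.reshape(-1, C)                # (T*HW, C) keys, reused as values
    sim = q @ k.T / (np.sqrt(C) * temperature)  # (HW, T*HW) similarity scores
    sim -= sim.max(axis=1, keepdims=True)       # subtract row max for stability
    w = np.exp(sim)
    w /= w.sum(axis=1, keepdims=True)           # softmax: allocated weights per query
    out = w @ k                                 # weighted aggregation of neighbor features
    return out.reshape(H, W, C)

# Tiny usage example with random features
rng = np.random.default_rng(0)
ref = rng.standard_normal((4, 4, 8))
neighbors = rng.standard_normal((2, 4, 4, 8))
fused = nonlocal_cross_frame_aggregate(ref, neighbors)
print(fused.shape)
```

Because matching is performed over whole regions rather than along a single estimated motion vector, this style of aggregation avoids committing to one (possibly wrong) correspondence per pixel, which is the failure mode of MEMC under large displacement and occlusion.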
Sen Wang, Yang Zhu, Yinhui Zhang, Qingjian Wang, Zifen He