Video super-resolution is the task of converting low-resolution video into high-resolution video. Existing methods with strong visual quality are mainly based on convolutional neural networks (CNNs), but their architectures are heavy, resulting in slow inference. To address this problem, this paper proposes a real-time video super-resolution transformer (RVSRT), which completes the super-resolution task quickly while preserving the visual fluency of video frame transitions. Unlike traditional CNN-based methods, this paper does not process video frames separately with different network modules along the temporal dimension; instead, it batches adjacent frames through a single end-to-end Transformer network with a UNet-style structure. Moreover, this paper introduces two-stage interpolation sampling before and after the end-to-end network to exploit the strengths of traditional computer-vision algorithms. Experimental results show that, compared with the state-of-the-art TMNet, RVSRT achieves comparable performance with only 50% of the parameters (6.1M vs. 12.3M) and 80% faster inference (26.2 fps vs. 14.3 fps at a frame size of 720×576).
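The two-stage interpolation idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the nearest-neighbor interpolation (standing in for whatever classical interpolation the method uses), the scale factors, and the placeholder network are all assumptions introduced here for clarity.

```python
import numpy as np

def upsample_nn(frames, scale):
    # Nearest-neighbor upsampling along the spatial axes; a real
    # pipeline would typically use bicubic interpolation instead.
    return frames.repeat(scale, axis=-2).repeat(scale, axis=-1)

def two_stage_sr_pipeline(frames, net, pre_scale=2, post_scale=2):
    # Stage 1: classical interpolation BEFORE the learned network.
    x = upsample_nn(frames, pre_scale)
    # The end-to-end network processes the batched adjacent frames
    # as one clip (placeholder callable here, hypothetical).
    x = net(x)
    # Stage 2: classical interpolation AFTER the network completes
    # the remaining portion of the overall upscaling factor.
    return upsample_nn(x, post_scale)

# A clip of 7 adjacent RGB frames, shape (T, C, H, W).
clip = np.zeros((7, 3, 45, 36), dtype=np.float32)
out = two_stage_sr_pipeline(clip, net=lambda x: x)  # identity stand-in
# Overall upscaling factor is pre_scale * post_scale = 4.
```

The point of splitting the interpolation into two stages is that the learned network only has to model the residual detail at an intermediate resolution, while cheap classical interpolation handles the rest of the magnification at both ends.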