Implicit neural representation (INR) has recently shown a promising ability to represent images at arbitrary resolutions. In this paper, we present the Local Implicit Transformer (LIT), which integrates an attention mechanism and a frequency encoding technique into a local implicit image function. We design a cross-scale local attention block to effectively aggregate local features, and a local frequency encoding block that combines positional encoding with Fourier-domain information for constructing high-resolution images. To further improve representational power, we propose the Cascaded LIT (CLIT), which exploits multi-scale features, along with a cumulative training strategy that gradually increases the upsampling scales during training. We conduct extensive experiments to validate the effectiveness of these components and analyze various training strategies. The qualitative and quantitative results demonstrate that LIT and CLIT achieve favorable results and outperform prior works in arbitrary-scale super-resolution tasks.
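To make the two core ingredients concrete, below is a minimal sketch, not the authors' implementation: a Fourier-style encoding of the relative coordinate between a high-resolution query pixel and its nearest low-resolution feature position (one plausible reading of the local frequency encoding block), plus a hypothetical linear schedule for the cumulative training strategy. The class name, `num_freqs`, `max_scale`, and `warmup_epochs` are all illustrative assumptions, not names from the paper.

```python
import torch
import torch.nn as nn


class LocalFrequencyEncoding(nn.Module):
    """Hypothetical sketch: encode a relative (dy, dx) coordinate with
    sin/cos at log-spaced frequencies, giving the decoder Fourier-domain
    position information alongside the aggregated local features."""

    def __init__(self, num_freqs: int = 8):
        super().__init__()
        # Fixed log-spaced frequencies; the paper may learn these instead.
        self.register_buffer("freqs", 2.0 ** torch.arange(num_freqs))

    def forward(self, rel_coord: torch.Tensor) -> torch.Tensor:
        # rel_coord: (..., 2) relative offsets, assumed normalized to [-1, 1].
        proj = rel_coord.unsqueeze(-1) * self.freqs          # (..., 2, F)
        enc = torch.cat([torch.sin(torch.pi * proj),
                         torch.cos(torch.pi * proj)], dim=-1)  # (..., 2, 2F)
        return enc.flatten(-2)                               # (..., 4F)


def cumulative_scale_schedule(epoch: int, max_scale: float = 4.0,
                              warmup_epochs: int = 100) -> float:
    """Hypothetical cumulative training schedule: the maximum sampled
    upsampling scale grows linearly from 1x to max_scale, so early
    epochs see small scales and later epochs see the full range."""
    return 1.0 + (max_scale - 1.0) * min(epoch / warmup_epochs, 1.0)


# Usage: encode a batch of relative coordinates for HR query pixels.
enc = LocalFrequencyEncoding(num_freqs=8)
rel = torch.rand(4, 64, 64, 2) * 2 - 1      # (B, H, W, 2) in [-1, 1]
print(enc(rel).shape)                       # torch.Size([4, 64, 64, 32])
print(cumulative_scale_schedule(epoch=50))  # 2.5
```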