Single Image Super-Resolution (SISR) has long been a foundational task in low-level vision. Recently, Transformer-based architectures have demonstrated outstanding performance on SISR. However, attribution analysis indicates that Transformer-based networks tend to underutilize surrounding pixels compared with other algorithms. We propose a novel architecture, the dropout multi-head attention transformer (DMAT), which exploits more input pixels for super-resolution. DMAT enhances the attention mechanism by selectively obscuring key segments of windowed multi-head self-attention during training, which encourages a more uniform distribution of attention over pixels. Furthermore, to improve multi-head attention learning and to integrate diverse attention maps, we introduce a head attention module (HAM) in DMAT that learns a weight for each attention head. Experimental results show that our model outperforms prevailing state-of-the-art approaches across diverse test sets, especially in terms of contour structure and texture details.
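The two ideas in the abstract — randomly obscuring key positions in windowed multi-head self-attention during training, and reweighting individual attention heads — can be sketched as below. This is a minimal NumPy illustration, not the authors' implementation: the learned Q/K/V projections are replaced by identity mappings for brevity, and `head_weights` is a hypothetical stand-in for the HAM output.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def windowed_mhsa_with_key_dropout(x, num_heads, head_weights,
                                   drop_p=0.1, training=True, rng=None):
    """Toy windowed multi-head self-attention with key dropout.

    x: (num_windows, tokens_per_window, dim)
    head_weights: (num_heads,) per-head mixing weights (hypothetical HAM output)
    During training, a fraction `drop_p` of key positions in each attention
    map is masked out, pushing attention toward a more uniform use of the
    remaining (surrounding) pixels.
    """
    rng = rng or np.random.default_rng(0)
    nw, n, d = x.shape
    hd = d // num_heads
    # Identity projections for simplicity; a real model learns Wq, Wk, Wv.
    q = k = v = x.reshape(nw, n, num_heads, hd).transpose(0, 2, 1, 3)
    logits = q @ k.transpose(0, 1, 3, 2) / np.sqrt(hd)  # (nw, heads, n, n)
    if training and drop_p > 0:
        mask = rng.random(logits.shape) < drop_p
        logits = np.where(mask, -1e9, logits)  # obscure random key positions
    attn = softmax(logits, axis=-1)
    out = attn @ v                                      # (nw, heads, n, hd)
    out = out * head_weights.reshape(1, -1, 1, 1)       # per-head reweighting
    return out.transpose(0, 2, 1, 3).reshape(nw, n, d)
```

With `training=False` the masking is disabled, so inference is deterministic; the per-head weights are applied before the heads are merged back into the channel dimension.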
Liangliang Zhao, Junyu Gao, Deng Dong-hu, Xuelong Li
Guanxing Li, Zhaotong Cui, Meng Li, Yu Han, Tianping Li
Huapeng Wu, Zhengxia Zou, Jie Gui, Wenjun Zeng, Jieping Ye, Jun Zhang, Hongyi Liu, Zhihui Wei
Yan Wang, Yusen Li, Gang Wang, Xiaoguang Liu