Scene text image super-resolution has significantly improved the accuracy of scene text recognition. However, many existing methods emphasize performance over efficiency and ignore the practical need for lightweight solutions in deployment scenarios. To address this issue, we propose an efficient framework called SGENet that facilitates deployment on resource-limited platforms. SGENet contains two branches: a super-resolution branch and a semantic guidance branch. We apply a lightweight pre-trained recognizer as a semantic extractor to enhance the understanding of text information. In addition, we design a visual-semantic alignment module that performs bidirectional alignment between image features and semantics, producing high-quality prior guidance. We conduct extensive experiments on a benchmark dataset, and the proposed SGENet achieves excellent performance at a lower computational cost.
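To make the idea of bidirectional alignment between image features and semantics concrete, the following is a minimal sketch rather than the SGENet module itself. It assumes that alignment is done with plain scaled dot-product cross-attention in both directions, with residual connections and no learned projections. The function names (`cross_attention`, `bidirectional_align`), the feature shapes, and the use of NumPy are all illustrative assumptions that do not come from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values):
    """Scaled dot-product attention (hypothetical helper).

    Each row of `queries` attends over the rows of `keys_values`,
    which serve as both keys and values (no learned projections).
    """
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ keys_values


def bidirectional_align(image_feats, semantic_feats):
    """Align both feature sets with each other (assumed design, not the paper's).

    image_feats:    (num_positions, dim) flattened image features
    semantic_feats: (num_tokens, dim) features from a text recognizer
    """
    # Image -> semantics: every image position gathers semantic context.
    img_aligned = image_feats + cross_attention(image_feats, semantic_feats)
    # Semantics -> image: every semantic token gathers visual context.
    sem_aligned = semantic_feats + cross_attention(semantic_feats, image_feats)
    return img_aligned, sem_aligned


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image_feats = rng.standard_normal((64, 32))     # illustrative sizes only
    semantic_feats = rng.standard_normal((26, 32))
    img_aligned, sem_aligned = bidirectional_align(image_feats, semantic_feats)
    print(img_aligned.shape, sem_aligned.shape)
```

Both outputs keep the shape of their inputs, so under this assumption the aligned semantic features could be passed on as prior guidance without changing the rest of a pipeline. A trained module would normally add learned query, key, and value projections.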
Yogesh Surapaneni, Chakravarthy Bhagvati
Mo Zhou, W. Liu, Jin Wan, Delong Han, Min Li, Gang Li
Chengyue Shi, Wenbo Shi, Jintong Hu, Wenming Yang
Renchao Zhu, Yuezhong Chu, Xuefeng Zhang, Xiaolong Liu
Cairong Zhao, Rui Shu, Shuyang Feng, Zhu Liang, Xuekuan Wang