Text Gestalt: Stroke-Aware Scene Text Image Super-resolution

Jingye Chen; Haiyang Yu; Jianqi Ma; Bin Li; Xiangyang Xue

doi:10.1609/aaai.v36i1.19904

JOURNAL ARTICLE

Year: 2022; Journal: Proceedings of the AAAI Conference on Artificial Intelligence; Vol: 36 (1); Pages: 285-293; Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Over the last decade, the rise of deep learning has driven rapid progress in scene text recognition. However, the recognition of low-resolution scene text images remains a challenge. Although several super-resolution methods have been proposed to tackle this problem, they usually treat text images as general images and ignore the fact that the visual quality of strokes (the atomic units of text) plays an essential role in text recognition. According to Gestalt psychology, humans can, guided by prior knowledge, compose partial details into the most similar objects. Likewise, when humans observe a low-resolution text image, they inherently use partial stroke-level details to recover the appearance of whole characters. Inspired by Gestalt psychology, we propose a Stroke-Aware Scene Text Image Super-Resolution method with a Stroke-Focused Module (SFM) that concentrates on the stroke-level internal structures of characters in text images. Specifically, we design rules for decomposing English characters and digits at the stroke level, then pre-train a text recognizer to provide stroke-level attention maps as positional clues, with the aim of controlling the consistency between the generated super-resolution image and the high-resolution ground truth. Extensive experimental results validate that the proposed method generates more distinguishable images on TextZoom and on the manually constructed Chinese character dataset Degraded-IC13. Furthermore, since the proposed SFM is used only to provide stroke-level guidance during training, it adds no time overhead in the test phase. Code is available at https://github.com/FudanVI/FudanOCR/tree/main/text-gestalt.
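The training-only guidance described in the abstract can be illustrated with a minimal sketch. It assumes that a pre-trained recognizer yields one attention map per decoded stroke for both the super-resolved (SR) and high-resolution (HR) images, and that consistency is measured as a mean absolute difference. The stroke table, the function names, and the weighting factor `lam` are hypothetical choices made for illustration; they are not taken from the paper or its released code.

```python
from typing import Dict, List, Sequence

# Hypothetical toy stroke table. The actual decomposition rules for English
# characters and digits are defined in the paper, not here.
TOY_STROKES: Dict[str, List[str]] = {
    "L": ["vertical", "horizontal"],
    "T": ["horizontal", "vertical"],
    "1": ["vertical"],
}

Map2D = Sequence[Sequence[float]]


def to_stroke_sequence(text: str) -> List[str]:
    """Flatten a word into a stroke-level label sequence using the toy table."""
    strokes: List[str] = []
    for ch in text:
        # Raises KeyError for characters outside the toy table.
        strokes.extend(TOY_STROKES[ch])
    return strokes


def mean_abs_diff(a: Map2D, b: Map2D) -> float:
    """Mean absolute difference between two equally sized 2-D maps."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += abs(x - y)
            count += 1
    return total / count


def stroke_aware_loss(
    sr: Map2D,
    hr: Map2D,
    sr_attn: Sequence[Map2D],
    hr_attn: Sequence[Map2D],
    lam: float = 1.0,
) -> float:
    """Pixel term plus a stroke-attention consistency term (assumed form)."""
    # Assumption: both terms use an L1-style distance; the paper may differ.
    pixel_term = mean_abs_diff(sr, hr)
    attn_term = sum(
        mean_abs_diff(a, b) for a, b in zip(sr_attn, hr_attn)
    ) / len(sr_attn)
    return pixel_term + lam * attn_term
```

Because the attention term is computed only inside the loss, a model trained this way needs neither the recognizer nor the attention maps at inference, which is consistent with the abstract's statement that the SFM adds no test-time overhead.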

Keywords:

Computer science; Gestalt psychology; Artificial intelligence; Code (set theory); Consistency (knowledge bases); Image (mathematics); Resolution (logic); Natural language processing; Computer vision; Pattern recognition (psychology); Information retrieval; Psychology; Perception; Set (abstract data type)

Metrics

FWCI (Field Weighted Citation Impact): 4.35

Citation Normalized Percentile: 0.95

Topics

Advanced Image Processing Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Image Processing Techniques and Applications

Physical Sciences → Engineering → Media Technology

Image and Video Stabilization

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Text Feature-Aware Network for Scene Text Image Super-Resolution

Scene text image super-resolution with semantic-aware interaction

Scene Text Telescope: Text-Focused Scene Image Super-Resolution

Text Prior Guided Scene Text Image Super-Resolution

Skeleton-aware Text Image Super-Resolution