Better Skeleton Better Readability: Scene Text Image Super-Resolution via Skeleton-Aware Diffusion Model

Shrey Singh; Prateek Keserwani; Partha Pratim Roy; Rajkumar Saini

doi:10.1109/access.2024.3510136

JOURNAL ARTICLE

Year: 2024; Journal: IEEE Access; Vol: 12; Pages: 187640-187651; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Scene text image super-resolution (STISR) aims to enhance the resolution of text images while simultaneously improving their readability by reducing noise, blur, and other degradations. Existing diffusion-based approaches for STISR primarily rely on text-prior information but often overlook the importance of explicitly modeling the visual structure of the text. In this paper, we propose a novel Skeleton-Aware Diffusion Method (SADM) for STISR, which introduces text skeletons as structural guidance to the diffusion process. The text skeleton serves as a critical visual cue, helping the model to better restore the fine details of text, even in severely degraded low-resolution images. Generating high-quality skeletons from low-resolution scene text is a challenging task due to the inherent blurring and noise present in such images. To tackle this, we introduce a diffusion-based Skeleton Correction Network (SCN), which refines the initial skeletons produced by a convolutional neural network-based skeletonization model. The SCN effectively improves the accuracy of the skeletons, allowing for more precise structural guidance during the diffusion process. Our extensive experiments demonstrate the significant benefits of incorporating skeleton information into the STISR pipeline. The proposed SADM achieves state-of-the-art performance on the TextZoom dataset: with the ASTER text recognizer, it reaches accuracies of 81.4%, 64.9%, and 49.6% on the easy, medium, and hard subsets, respectively, surpassing the previous best results. Through detailed analysis, we also show that improving the quality of skeletons from low-resolution images leads to better super-resolution outcomes and enhances the performance of text recognizers.
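To make the notion of a "text skeleton" concrete, the sketch below thins a binary stroke to a one-pixel-wide centerline using the classical Zhang-Suen thinning algorithm. This is only an illustration of what a skeleton is, not the method of the paper: the abstract states that SADM obtains skeletons from a CNN-based skeletonization model refined by the diffusion-based SCN. The function name `zhang_suen_thinning` and the toy `bar` image are assumptions made for this example, and the input is assumed to be a list of rows of 0/1 values with a background border of at least one pixel.

```python
def zhang_suen_thinning(image):
    """Thin a binary image (list of rows of 0/1) with Zhang-Suen thinning.

    Illustrative only: border pixels are never examined, so the input is
    assumed to have at least a one-pixel background border.
    """
    img = [row[:] for row in image]  # work on a copy
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9: the 8 neighbours, clockwise, starting directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                img[r][c - 1], img[r - 1][c - 1]]

    def transitions(n):
        # Number of 0 -> 1 steps in the circular sequence P2, ..., P9, P2.
        return sum(a == 0 and b == 1 for a, b in zip(n, n[1:] + n[:1]))

    changed = True
    while changed:
        changed = False
        for step in (0, 1):  # the two sub-iterations of each pass
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    n = neighbours(r, c)
                    p2, _, p4, _, p6, _, p8, _ = n
                    if not 2 <= sum(n) <= 6:
                        continue
                    if transitions(n) != 1:
                        continue
                    if step == 0:
                        removable = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        removable = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if removable:
                        to_clear.append((r, c))
            # Pixels are cleared together, after the whole sub-iteration.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img


# Toy "stroke": a 3-pixel-thick horizontal bar on a 7 x 12 background.
bar = [[1 if 2 <= r <= 4 and 2 <= c <= 9 else 0 for c in range(12)]
       for r in range(7)]
skeleton = zhang_suen_thinning(bar)
for row in skeleton:
    print("".join("#" if v else "." for v in row))
```

On clean, high-resolution strokes a classical thinning pass like this is usually sufficient. The abstract's point is that blur and noise in low-resolution scene text make skeleton extraction unreliable, which is the motivation it gives for a learned skeletonization model followed by a correction network.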

Keywords:

Skeleton (computer programming); Readability; Computer science; Computer vision; Artificial intelligence; Image (mathematics); Resolution (logic); Image resolution; Computer graphics (images); Pattern recognition (psychology)

Metrics

FWCI (Field Weighted Citation Impact): 1.23

Citation Normalized Percentile: 0.77

Topics

Image Processing Techniques and Applications

Physical Sciences → Engineering → Media Technology

Related Documents

Skeleton-aware Text Image Super-Resolution

Text Gestalt: Stroke-Aware Scene Text Image Super-resolution

DCDM: Diffusion-Conditioned-Diffusion Model for Scene Text Image Super-Resolution

HiREN: Towards higher supervision quality for better scene text image super-resolution

T‐Skeleton: Accurate scene text detection via instance‐aware skeleton embedding