Towards Robust Scene Text Recognition: A Dual Correction Mechanism with Deformable Alignment

Yanxu Feng; Changlu Li

doi:10.3390/electronics14193968

JOURNAL ARTICLE

Year: 2025; Journal: Electronics; Vol: 14 (19); Pages: 3968-3968; Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Scene Text Recognition (STR) faces significant challenges under complex degradation conditions such as distortion, occlusion, and semantic ambiguity. Most existing methods rely heavily on language priors for correction, but constructing effective language rules remains a difficult problem. This paper addresses two key challenges: (1) language models tend to over-correct, particularly on semantically deficient input, which can cause both recognition errors and loss of critical information; and (2) characters are misaligned in the visual features, which reduces recognition accuracy. To address these problems, we propose a Deformable-Alignment-based Dual Correction Mechanism (DADCM) for STR. The method has three key components: (1) a visually guided, language-assisted correction strategy in which a dynamic confidence threshold controls the degree of language model intervention; (2) a visual backbone network, SCRTNet, which enhances key text regions through a channel attention module (SENet) and applies deformable convolution (DCNv4) in its deep layers to better model distorted or curved text; and (3) a deformable alignment module (DAM), which combines Gumbel-Softmax-based anchor sampling with geometry-aware self-attention to improve character alignment. Experiments on multiple benchmark datasets demonstrate the superiority of our approach. In particular, on the Union14M-Benchmark, recognition accuracy surpasses previous methods by 1.1%, 1.6%, 3.0%, and 1.3% on the Curved, Multi-Oriented, Contextless, and General subsets, respectively.
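
The abstract names Gumbel-Softmax-based anchor sampling but does not describe how DAM applies it. As a rough illustration of the general technique only, the sketch below draws a relaxed one-hot sample over candidate anchor positions using the standard Gumbel-Softmax formulation. The function names `gumbel_softmax` and `sample_anchor`, the `tau` temperature parameter, and the idea of one score per candidate position are assumptions made for this example, not details taken from the paper.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw one relaxed one-hot sample from unnormalized log-probabilities.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    eps = 1e-12  # clamp u away from 0 and 1 so both logs stay finite
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), eps), 1.0 - eps)
        g = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + g) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def sample_anchor(logits, tau=1.0, rng=random):
    """Return (hard index, soft weights) for one anchor choice.

    Assumes `logits` holds one score per candidate anchor position.
    """
    weights = gumbel_softmax(logits, tau, rng)
    index = max(range(len(weights)), key=weights.__getitem__)
    return index, weights


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    index, weights = sample_anchor([0.2, 1.5, 0.1, 0.9], tau=0.5, rng=rng)
    print(index, [round(w, 3) for w in weights])
```

A low `tau` pushes the soft weights towards a one-hot vector, while a high `tau` flattens them towards uniform. In a trainable model the soft weights would typically carry gradients while the hard index selects the anchor; that straight-through arrangement is likewise an assumption here.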

Topics

Handwritten Text Recognition Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Image Processing and 3D Reconstruction

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence
