Yunfei Mu, Mieradilijiang Maimaiti, Miaomiao Xu, Wenkai Li, Wushour Silamu
Scene text recognition has significant application value in autonomous driving, smart retail, and assistive devices. However, owing to challenges such as multi-scale variation, distortion, and complex backgrounds, existing methods such as CRNN, ViT, and PARSeq, despite their strong performance, still leave room for improvement in feature extraction and semantic modeling. To address these issues, this paper proposes a novel scene text recognition model, the Encoder–Decoder Interactive Model (EDIM). Built on an encoder–decoder framework, EDIM introduces a Multi-scale Dilated Fusion Attention (MSFA) module in the encoder to enhance multi-scale feature representation, and a Sequential Encoder–Decoder Context Fusion (SeqEDCF) mechanism in the decoder to enable efficient semantic interaction between the encoder and decoder. The effectiveness of the proposed method is validated on six regular and irregular benchmark test sets as well as several subsets of the Union14M-L dataset. Experimental results demonstrate that EDIM outperforms state-of-the-art (SOTA) methods on multiple metrics, with especially notable gains in recognizing irregular and distorted text.
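The abstract does not specify how the multi-scale dilated fusion is implemented; the following is only a minimal, hypothetical numpy sketch of the general idea behind such a module — several dilated-convolution branches capturing different receptive fields, fused by softmax attention weights. The function names (`dilated_conv1d`, `msfa_fuse`) and the branch-scoring scheme are illustrative assumptions, not the paper's actual MSFA design.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """1-D dilated cross-correlation with zero padding (same-length output).
    Illustrative stand-in for one multi-scale branch; not the paper's layer."""
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, pad)
    return np.array([
        sum(kernel[j] * xp[i + j * dilation] for j in range(k))
        for i in range(len(x))
    ])

def msfa_fuse(x, kernels, dilations):
    """Fuse several dilated branches with softmax attention over branches.
    Assumed scoring: each branch is weighted by its mean activation."""
    branches = np.stack([
        dilated_conv1d(x, k, d) for k, d in zip(kernels, dilations)
    ])                                  # shape: (num_branches, len(x))
    scores = branches.mean(axis=1)      # one scalar score per branch
    w = np.exp(scores - scores.max())   # numerically stable softmax
    w /= w.sum()
    return (w[:, None] * branches).sum(axis=0)

# Usage: fuse three averaging branches with dilations 1, 2, 3
x = np.arange(8.0)
avg = np.array([1.0, 1.0, 1.0]) / 3.0
fused = msfa_fuse(x, [avg, avg, avg], [1, 2, 3])
```

Larger dilations widen the receptive field without adding parameters, which is one common way to handle the multi-scale text sizes mentioned above; the attention weights let the fused feature favor whichever scale responds most strongly.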