JOURNAL ARTICLE

Adaptive embedding gate for attention-based scene text recognition

Abstract

Scene text recognition has attracted considerable research interest because it is a challenging problem with a wide range of applications. The leading methods are attentional encoder-decoder frameworks that learn the alignment between the input image and the output sequence. In particular, the decoder outputs predictions recurrently, using the prediction from the previous step as guidance at every time step. In this study, we point out that the inappropriate use of previous predictions in existing attentional decoders restricts recognition performance and introduces instability. To address this problem, we propose a novel module, the adaptive embedding gate (AEG). AEG introduces high-order character language models into attentional decoders by controlling the information transmission between adjacent characters. It is a flexible module that can be easily integrated into state-of-the-art attentional decoders for scene text recognition. We evaluate its effectiveness and robustness on a number of standard benchmarks, including the IIIT5K, SVT, SVT-P, CUTE80, and ICDAR datasets. Experimental results demonstrate that AEG can significantly boost recognition performance and improve robustness.
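
The abstract does not state the AEG equations, so the following is only a minimal sketch of one generic way a gate could control how much of the previous character's embedding reaches the next decoding step. It assumes a sigmoid gate computed from the previous decoder hidden state and the previous character embedding, applied element-wise to that embedding. The names `gated_embedding`, `w_gate`, and `b_gate` are hypothetical, and this should not be read as the authors' exact formulation.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def gated_embedding(prev_hidden, prev_embedding, w_gate, b_gate):
    """Scale the previous character embedding by a learned gate.

    ASSUMPTION: gate = sigmoid(W [h_{t-1}; e_{t-1}] + b), an illustrative
    gating form, not taken from the paper.
    Returns (gated_embedding, gate), both with the embedding's length.
    """
    gate_input = prev_hidden + prev_embedding  # list concatenation
    pre_activation = matvec(w_gate, gate_input)
    gate = [sigmoid(z + b) for z, b in zip(pre_activation, b_gate)]
    gated = [g * e for g, e in zip(gate, prev_embedding)]
    return gated, gate


if __name__ == "__main__":
    rng = random.Random(0)
    hidden_dim, embed_dim = 4, 3
    h_prev = [rng.uniform(-1, 1) for _ in range(hidden_dim)]
    e_prev = [rng.uniform(-1, 1) for _ in range(embed_dim)]
    # Randomly initialised parameters stand in for trained weights.
    w = [[rng.uniform(-1, 1) for _ in range(hidden_dim + embed_dim)]
         for _ in range(embed_dim)]
    b = [0.0] * embed_dim
    gated, gate = gated_embedding(h_prev, e_prev, w, b)
    print("gate:", gate)
    print("gated embedding:", gated)
```

In a sketch like this, a gate value near 0 suppresses the previous prediction's influence on the next step, while a value near 1 passes it through unchanged; the gated vector would then replace the raw embedding as the decoder input.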

Keywords:
Computer science; Robustness (evolution); Embedding; Encoder; Artificial intelligence; Speech recognition; Pattern recognition (psychology)

Metrics

Cited By: 42
FWCI (Field Weighted Citation Impact): 3.10
Refs: 114
Citation Normalized Percentile: 0.93

Topics

Handwritten Text Recognition Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Natural Language Processing Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Image Retrieval and Classification Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

SAME-net: scene text recognition method based on soft attention mask embedding

Weida Chen, Linfei Wang, Dapeng Tao

Journal: Journal of Image and Graphics, Year: 2024, Vol: 29 (5), Pages: 1381-1391
JOURNAL ARTICLE

SLOAN: Scale-Adaptive Orientation Attention Network for Scene Text Recognition

Pengwen Dai, Hua Zhang, Xiaochun Cao

Journal: IEEE Transactions on Image Processing, Year: 2020, Vol: 30, Pages: 1687-1701
JOURNAL ARTICLE

2D Positional Embedding-based Transformer for Scene Text Recognition

Zobeir Raisi, Mohamed A. Naiel, Paul Fieguth, Steven Wardell, John Zelek

Journal: Journal of Computational Vision and Imaging Systems, Year: 2021, Vol: 6 (1), Pages: 1-4
JOURNAL ARTICLE

Flexible scene text recognition based on dual attention mechanism

Zhiqiang Tian, Chunhui Wang, Youzi Xiao, Yuping Lin

Journal: Concurrency and Computation: Practice and Experience, Year: 2020, Vol: 33 (22)