JOURNAL ARTICLE

Convolutional Attention Networks for Scene Text Recognition

Hongtao Xie, Shancheng Fang, Zheng-Jun Zha, Yating Yang, Yan Li, Yongdong Zhang

Year: 2019 | Journal: ACM Transactions on Multimedia Computing, Communications, and Applications | Vol: 15 (1s) | Pages: 1-17 | Publisher: Association for Computing Machinery

Abstract

In this article, we present Convolutional Attention Networks (CAN) for unconstrained scene text recognition. Recent dominant approaches for scene text recognition are mainly based on Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), where the CNN encodes images and the RNN generates character sequences. CAN differs from these methods: it is built entirely on CNNs and includes an attention mechanism. The distinctive characteristics of our method are as follows: (i) CAN follows an encoder-decoder architecture, in which the encoder is a deep two-dimensional CNN and the decoder is a one-dimensional CNN; (ii) the attention mechanism is applied in every convolutional layer of the decoder, and we propose a novel spatial attention method using average pooling; and (iii) position embeddings are added to both the spatial encoder and the sequence decoder to give our networks a sense of location. We conduct experiments on standard datasets for scene text recognition, including the Street View Text, IIIT5K, and ICDAR datasets. The experimental results validate the effectiveness of the different components and show that our convolution-based method achieves state-of-the-art or competitive performance compared with prior works, even without the use of an RNN.
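The attention step in (ii) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say where the average pooling is applied, so the sketch assumes it smooths the encoder feature map that serves as the attention values, and it uses plain scaled dot-product scoring. The names `avg_pool_2d` and `spatial_attention` are hypothetical, and position embeddings are omitted for brevity.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def avg_pool_2d(fmap, k=3):
    """Average-pool an H x W x C map with a k x k window, stride 1.

    Windows are clipped at the borders, so the output keeps the H x W shape.
    """
    H, W, C = len(fmap), len(fmap[0]), len(fmap[0][0])
    r = k // 2
    out = []
    for i in range(H):
        row = []
        for j in range(W):
            cells = [fmap[a][b]
                     for a in range(max(0, i - r), min(H, i + r + 1))
                     for b in range(max(0, j - r), min(W, j + r + 1))]
            row.append([sum(c[ch] for c in cells) / len(cells)
                        for ch in range(C)])
        out.append(row)
    return out


def spatial_attention(query, keys, values):
    """Attend from one decoder step (a C-vector) over H x W x C maps.

    ASSUMPTION: scaled dot-product scores; the paper's exact scoring
    function is not given in the abstract.
    """
    C = len(query)
    flat_keys = [cell for row in keys for cell in row]
    flat_values = [cell for row in values for cell in row]
    scores = [sum(q * k for q, k in zip(query, cell)) / math.sqrt(C)
              for cell in flat_keys]
    weights = softmax(scores)
    context = [sum(w * v[ch] for w, v in zip(weights, flat_values))
               for ch in range(C)]
    return weights, context


# Toy 2 x 3 feature map with 2 channels, standing in for 2D CNN encoder output.
feature_map = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
               [[0.5, 0.5], [0.0, 0.0], [2.0, 0.0]]]
query = [1.0, 0.0]  # one decoder state, standing in for 1D CNN decoder output

# ASSUMPTION: raw features act as keys, average-pooled features as values.
weights, context = spatial_attention(query, feature_map,
                                     avg_pool_2d(feature_map))
```

Here `weights` holds one attention weight per spatial location of the flattened feature map, and `context` is the weighted sum of the pooled features that a decoder layer would consume. In a model following the abstract, this step would be repeated in every convolutional layer of the decoder.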

Keywords:
Computer science, Pooling, Convolutional neural network, Artificial intelligence, Encoder, Pattern recognition (psychology), Recurrent neural network, Deep learning, Convolutional code, Layer (electronics), Sequence (biology), Decoding methods, Artificial neural network, Algorithm

Metrics

Cited By: 85
FWCI (Field Weighted Citation Impact): 7.16
Refs: 34
Citation Normalized Percentile: 0.98
Is in top 1%
Is in top 10%

Topics

Handwritten Text Recognition Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Image Processing and 3D Reconstruction
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Vehicle License Plate Recognition
Physical Sciences →  Engineering →  Media Technology