JOURNAL ARTICLE

Multi-Scale Channel Attention for Chinese Scene Text Recognition

Abstract

Scene text recognition has proven highly effective in solving various computer vision tasks. Recently, numerous recognition algorithms based on the encoder-decoder framework have been proposed to handle scene text with perspective distortion and curved shapes. Nevertheless, most of these methods consider only single-scale features and do not take multi-scale features into account. Moreover, existing text recognition methods mainly target English text and largely overlook Chinese text, despite its importance. In this paper, we propose an end-to-end method that integrates multi-scale features for Chinese scene text recognition (CSTR). Specifically, we adapt Dense Atrous Spatial Pyramid Pooling (DenseASPP) and incorporate it into our backbone network to capture multi-scale features of the input image while simultaneously enlarging the receptive field. In addition, we add Squeeze-and-Excitation (SE) blocks to capture attentional features with global information and further improve CSTR performance. Experimental results on Chinese scene text datasets demonstrate that the proposed method effectively mitigates the loss of contextual information caused by variation in text scale and outperforms state-of-the-art approaches.
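
The two building blocks named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the abstract gives no layer sizes or dilation rates, so the function names, the weight shapes, and any dilation rates passed in are assumptions made for illustration. The first two helpers show how stacking atrous (dilated) convolutions at stride 1 enlarges the receptive field. The third follows the standard Squeeze-and-Excitation recipe (global average pooling, a bottleneck fully connected layer with ReLU, a second fully connected layer with sigmoid, then channel-wise rescaling), with biases omitted for brevity.

```python
import math


def atrous_receptive_field(kernel_size, dilation):
    """Receptive field (in pixels, per axis) of one atrous convolution layer."""
    return (kernel_size - 1) * dilation + 1


def stacked_receptive_field(kernel_size, dilations):
    """Receptive field of atrous layers applied one after another at stride 1.

    The dilation rates are supplied by the caller; the abstract does not
    state which rates the paper uses.
    """
    fields = [atrous_receptive_field(kernel_size, d) for d in dilations]
    return sum(fields) - (len(fields) - 1)


def se_channel_attention(feature_map, w1, w2):
    """Squeeze-and-Excitation over a feature map stored as [C][H][W] lists.

    w1 has shape [C_reduced][C] and w2 has shape [C][C_reduced]
    (hypothetical weights; biases are omitted in this sketch).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1
    ]
    # Excitation, step 2: fully connected layer followed by sigmoid.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates
```

Because the gates are computed from globally pooled descriptors, each channel is reweighted using information from the whole image, which is the sense in which SE contributes "attentional features with global information". In a DenseASPP-style design the atrous layers are additionally densely connected, so intermediate receptive-field sizes are available alongside the largest one; the helper above computes only the largest.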

Keywords:
Computer science; Pooling; Artificial intelligence; Pyramid (geometry); Scale (ratio); Distortion (music); Perspective (graphical); Text recognition; Encoder; Channel (broadcasting); Feature (linguistics); Pattern recognition (psychology); Perspective distortion; Image (mathematics); Computer vision; Speech recognition

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 44
Citation Normalized Percentile: 0.17

Topics

Handwritten Text Recognition Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Image Retrieval and Classification Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Vehicle License Plate Recognition
Physical Sciences →  Engineering →  Media Technology
