JOURNAL ARTICLE

ESIR: End-To-End Scene Text Recognition via Iterative Image Rectification

Abstract

Automated recognition of text in scenes has been a research challenge for years, largely because text appearance varies arbitrarily in perspective distortion, text line curvature, text style, and imaging artifacts. Recent deep networks can learn representations that are robust to imaging artifacts and text style changes, but they still struggle with scene text that has perspective and curvature distortions. This paper presents an end-to-end trainable scene text recognition system (ESIR) that iteratively removes perspective distortion and text line curvature, driven by improved scene text recognition performance. It develops a novel rectification network in which a line-fitting transformation estimates the pose of text lines in scenes, together with an iterative rectification framework that corrects scene text distortions step by step towards a fronto-parallel view. ESIR is also robust to parameter initialization and easy to train: training needs only scene text images and word-level annotations, as required by most scene text recognition systems. Extensive experiments on a number of public datasets show that ESIR rectifies scene text distortions accurately and achieves superior recognition performance on both normal scene text images and those suffering from perspective and curvature distortions.
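
The abstract does not spell out the rectification network, so the following is only a toy geometric analogue of the idea of fitting a text line and flattening it over several passes. It is not the ESIR implementation: the least-squares polynomial fit stands in for the learned line-fitting transformation, and the function names, the `strength` parameter, and the sample points are all hypothetical.

```python
import numpy as np

def fit_center_line(xs, ys, degree=2):
    """Least-squares polynomial fit of a text center line.

    Assumption: a plain polynomial fit is used here as a stand-in for
    the learned line-fitting transformation named in the abstract.
    """
    return np.polyfit(xs, ys, degree)

def rectify_step(xs, ys, degree=2, strength=0.5):
    """Remove a fraction (`strength`) of the fitted curvature from the points."""
    coeffs = fit_center_line(xs, ys, degree)
    curve = np.polyval(coeffs, xs)
    baseline = curve.mean()  # keep the line at its average height
    return ys - strength * (curve - baseline)

def iterative_rectify(xs, ys, iterations=5, degree=2, strength=0.5):
    """Apply several partial corrections and track the remaining vertical spread."""
    history = [float(np.ptp(ys))]  # peak-to-peak height before any correction
    for _ in range(iterations):
        ys = rectify_step(xs, ys, degree, strength)
        history.append(float(np.ptp(ys)))
    return ys, history

# Hypothetical curved and tilted text line, sampled at 21 points.
xs = np.linspace(-1.0, 1.0, 21)
ys = 0.4 * xs**2 + 0.1 * xs

flat, history = iterative_rectify(xs, ys)
print(history)  # vertical spread of the line after each pass
```

Each pass removes only part of the estimated distortion, so the line approaches a flat (fronto-parallel-like) layout gradually rather than in a single step, which is the intuition behind iterating.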

Keywords:
Perspective distortion; Initialization; Computer science; Artificial intelligence; Perspective (graphical); Distortion (music); Image rectification; Curvature; Computer vision; Rectification; Line (geometry); Pattern recognition (psychology); Image (mathematics); Mathematics; Geometry

Metrics

Cited By: 327
FWCI (Field Weighted Citation Impact): 25.55
Refs: 60
Citation Normalized Percentile: 1.00
Is in top 1%
Is in top 10%

Topics

Handwritten Text Recognition Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Image Processing and 3D Reconstruction
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Vehicle License Plate Recognition
Physical Sciences → Engineering → Media Technology

Related Documents

BOOK-CHAPTER

End-to-End Scene Text Recognition Network with Adaptable Text Rectification

Yi Zhang, Zhiwen Li, Lei Guo, Wenbi Rao

Lecture Notes on Data Engineering and Communications Technologies Year: 2021 Pages: 175-184

JOURNAL ARTICLE

Transformer-based end-to-end scene text recognition

Xinghao Zhu, Zhi Zhang

Year: 2021 Vol: 19 Pages: 1691-1695

JOURNAL ARTICLE

An End-to-End Scene Text Recognition for Bilingual Text

Bayan M. Albalawi, Amani Jamal, Lama Al Khuzayem, Olaa A. Alsaedi

Journal: Big Data and Cognitive Computing Year: 2024 Vol: 8 (9) Pages: 117-117