Scene Text Recognition (STR) has long been considered an important yet challenging task in the field of computer vision. Recent works have demonstrated that leveraging language information is effective for visually difficult images, such as those with occlusion or blurring. However, the use of language information sometimes leads to an over-correction problem. On out-of-vocabulary samples (e.g., "hou" and "0x4a"), some methods become biased toward the language side and over-correct (e.g., rewriting "hou" as "hot"). This imbalance between vision and language limits the use of such models in practical scenarios, yet it rarely occurs for humans. To address this issue, we rethink the human recognition process and propose a model that behaves in the order of "Read, Spell and Repeat", refining the recognition result cyclically with both vision and language information. With this mechanism, our model integrates vision and language information more effectively, achieving higher accuracy with fewer parameters than the baseline and performance competitive with state-of-the-art methods on standard benchmarks.
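The "Read, Spell and Repeat" idea can be illustrated with a toy sketch: a language-side correction step that only touches characters the vision side is unsure about, applied iteratively. All names, the lexicon, and the confidence threshold below are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of a "Read, Spell and Repeat" loop.
# The lexicon, threshold, and matching rule are toy assumptions.

VOCAB = {"hot", "house", "read"}  # toy lexicon standing in for the language model

def language_spell(word, conf, threshold=0.5):
    """Toy 'Spell' step: only correct when the vision side is unsure
    somewhere, so confident out-of-vocabulary strings like 'hou' or
    '0x4a' are left alone instead of being over-corrected."""
    if all(c >= threshold for c in conf):
        return word  # vision is confident at every position: keep as-is
    # Otherwise pick a lexicon word of the same length within edit distance 1.
    for cand in sorted(VOCAB):
        if len(cand) == len(word) and sum(a != b for a, b in zip(cand, word)) <= 1:
            return cand
    return word

def read_spell_repeat(word, conf, rounds=2):
    """Toy 'Repeat' step: re-apply the correction for a few rounds,
    taking the vision reading (word, per-char confidences) as input."""
    for _ in range(rounds):
        word = language_spell(word, conf)
    return word
```

For example, a confidently read "hou" stays "hou", while the same string with a low-confidence final character is corrected to the lexicon word "hot"; the real model performs this refinement with learned vision and language branches rather than a fixed lexicon.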
Shaocong Tian, Rize Jin, Joon-Young Paik, Chen Fu