Visual-Semantic Refinement Network: Towards Exploring the Capabilities of Decoder in Scene Text Recognition

Yingtao Tan; Yingying Chen; Jinqiao Wang

doi:10.1109/ieir59294.2023.10391251

Year: 2023 Pages: 1-8

Abstract

Traditional scene text recognition (STR) is usually treated as a unimodal visual recognition task, on which encoder-decoder frameworks have made some progress. Introducing a language model (LM) that exploits semantic contextual relationships has substantially advanced the task from the language modality. However, in existing works the LM relies heavily on the output of the decoder in the vision model (VM), and the vision decoder itself lacks semantic and global context awareness. In this paper, we explore the capability of the vision decoder, which previous works have generally overlooked. We propose a Visual-Semantic Refinement Network (VSRN) that provides contextual and semantic guidance to the decoder so that its recognition capability is fully exploited. Through the semantic refinement module, the LM's recognition results are fed back into the VM, which supplies semantic information and further strengthens the fusion of the two modalities. In the visual refinement module, we propose an adaptive mask strategy and exploit the global contextual relationships among visual features to further assist the VM. These two complementary cues jointly strengthen the VM and iteratively improve recognition performance. Experimental results on several scene text recognition benchmarks show that the proposed method is effective and achieves state-of-the-art performance.
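
The feedback loop described in the abstract, where the LM's output is returned to the vision decoder and the two are refined over several rounds, can be illustrated with a toy sketch. This is not the authors' VSRN: the per-position probability tables standing in for the vision model, the lexicon lookup standing in for the LM, the fusion weight `alpha`, and all function names are hypothetical, and the adaptive mask strategy and visual refinement module are not modeled at all.

```python
"""Toy sketch of an iterative vision-language refinement loop.

All components are hypothetical stand-ins, not the VSRN architecture:
the 'vision model' is a list of per-position character distributions,
and the 'language model' is a nearest-word lookup in a tiny lexicon.
"""
from typing import Dict, List, Optional

CharDist = Dict[str, float]  # hypothetical per-position character distribution


def vision_decode(visual_probs: List[CharDist],
                  semantic_hint: Optional[str] = None,
                  alpha: float = 0.5) -> str:
    """Greedy per-position decoding, optionally fused with an LM hint.

    alpha (hypothetical) weights the one-hot LM hint against visual evidence.
    """
    chars = []
    for t, dist in enumerate(visual_probs):
        fused = dict(dist)
        if semantic_hint is not None and t < len(semantic_hint):
            hint = semantic_hint[t]
            # Linear mix of visual probabilities and a one-hot vector on the hint.
            fused = {c: (1 - alpha) * p for c, p in dist.items()}
            fused[hint] = fused.get(hint, 0.0) + alpha
        chars.append(max(fused, key=fused.get))
    return "".join(chars)


def toy_language_model(text: str, lexicon: List[str]) -> str:
    """Stand-in LM: same-length lexicon word with the fewest mismatched chars."""
    candidates = [w for w in lexicon if len(w) == len(text)]
    if not candidates:
        return text  # no usable word: pass the visual prediction through
    return min(candidates, key=lambda w: sum(a != b for a, b in zip(w, text)))


def iterative_refine(visual_probs: List[CharDist], lexicon: List[str],
                     num_iters: int = 2, alpha: float = 0.5) -> str:
    """Alternate VM decoding and LM correction, feeding LM output back to the VM."""
    text = vision_decode(visual_probs)
    for _ in range(num_iters):
        hint = toy_language_model(text, lexicon)
        new_text = vision_decode(visual_probs, semantic_hint=hint, alpha=alpha)
        if new_text == text:
            break  # converged: another round would not change the prediction
        text = new_text
    return text


# Hypothetical example: position 1 is visually ambiguous (digit 0 vs letter O).
EXAMPLE_PROBS: List[CharDist] = [
    {"C": 0.9, "G": 0.1},
    {"0": 0.55, "O": 0.45},
    {"F": 0.8, "E": 0.2},
    {"F": 0.85, "P": 0.15},
    {"E": 0.9, "F": 0.1},
    {"E": 0.95, "B": 0.05},
]
EXAMPLE_LEXICON = ["COFFEE", "TOFFEE", "CHEESE"]

if __name__ == "__main__":
    print(vision_decode(EXAMPLE_PROBS))                       # C0FFEE
    print(iterative_refine(EXAMPLE_PROBS, EXAMPLE_LEXICON))   # COFFEE
```

Vision-only decoding picks the visually likelier digit "0"; once the toy LM's correction is fed back and mixed with the visual evidence, the decoder switches to "O". A real system would fuse learned features rather than a hard one-hot hint, so the linear mix here only shows the control flow.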

Keywords:

Computer science; Artificial intelligence; Natural language processing; Text recognition; Information retrieval; Image (mathematics)

Topics

Handwritten Text Recognition Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Image Retrieval and Classification Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Image Processing and 3D Reconstruction

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
