Video text spotting (VTS) aims to extract text from videos by performing text detection, tracking, and recognition simultaneously. Several existing methods tackle VTS, but they tend to ignore the underlying semantic relationships among texts within a frame. We observe that the texts in a frame usually share similar semantics, which suggests that, even if one text is predicted incorrectly by a text recognizer, it can still be corrected via semantic reasoning. In this paper, we propose an accurate video text spotter, VLSpotter, that reads texts visually, linguistically, and semantically. For ‘visually’, we propose a plug-and-play text-focused super-resolution module that alleviates motion blur and enhances video quality. For ‘linguistically’, a language model captures intra-text context to mitigate misspelled text predictions. For ‘semantically’, we propose a text-wise semantic reasoning module that models inter-text semantic relationships and reasons toward better results. Experimental results on multiple VTS benchmarks demonstrate that VLSpotter outperforms existing state-of-the-art methods in end-to-end video text spotting.
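To make the described architecture concrete, the following is a minimal PyTorch sketch of two of the ideas in the abstract: a text-focused super-resolution block and inter-text semantic reasoning via self-attention over all text instances in a frame. This is not the authors' implementation; every module name, layer size, and the choice of self-attention for the reasoning step are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class TextSuperResolution(nn.Module):
    """Hypothetical text-focused super-resolution block: upsamples
    blurred text crops before recognition (the 'visually' component)."""

    def __init__(self, channels: int = 3, scale: int = 2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, channels * scale * scale, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),  # sub-pixel upsampling to scale x resolution
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)


class SemanticReasoner(nn.Module):
    """Hypothetical inter-text reasoning (the 'semantically' component):
    self-attention across the embeddings of all text instances in one
    frame, so each prediction can be refined using its neighbours'
    semantics."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_embeds: torch.Tensor) -> torch.Tensor:
        # text_embeds: (batch, num_texts_in_frame, dim)
        refined, _ = self.attn(text_embeds, text_embeds, text_embeds)
        return self.norm(text_embeds + refined)


if __name__ == "__main__":
    crops = torch.randn(8, 3, 16, 64)        # 8 low-resolution text crops
    sr = TextSuperResolution()
    print(sr(crops).shape)                   # -> torch.Size([8, 3, 32, 128])

    frame_texts = torch.randn(1, 8, 256)     # one frame with 8 text embeddings
    reasoner = SemanticReasoner()
    print(reasoner(frame_texts).shape)       # -> torch.Size([1, 8, 256])
```

In this sketch, the residual connection in SemanticReasoner reflects the abstract's intuition: a recognizer's prediction is kept but can be adjusted when the attended frame-level semantics disagree with it.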