Outline Generation Transformer for Bilingual Scene Text Recognition

Jui-Teng Ho; Gee-Sern Hsu; Svetlana Yanushkevich; Marina L. Gavrilova

doi:10.23919/mva57639.2023.10216107

JOURNAL ARTICLE

Year: 2023 Pages: 1-5

Abstract

We propose the Outline Generation Transformer (OGT) for bilingual Scene Text Recognition (STR). Most STR approaches focus on English; we consider both English and Chinese, because Chinese is also a major language and the two languages commonly appear together in scenes in many areas and countries. The OGT consists of an Outline Generator (OG) and a transformer with an embedded language model. The OG detects the character outlines of the text and embeds the outline features into the transformer through an outline-query cross-attention layer, which helps locate each character and improves text recognition performance. OGT is trained in two phases: first on synthetic data, where the text outline masks are available, and then on real data, where the text outline masks can only be estimated. The proposed OGT is evaluated on several benchmark datasets and compared with state-of-the-art methods.
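
The outline-query cross-attention mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the layer's exact form, so the code below is generic single-head scaled dot-product cross-attention, not the authors' implementation. The names `cross_attention`, `queries`, and `outline_features` are hypothetical, and the learned projections a real transformer layer would apply are omitted for brevity (keys and values are the outline features themselves).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, outline_features):
    """Single-head scaled dot-product cross-attention (illustrative only).

    queries: one vector per character position (hypothetical character queries).
    outline_features: one vector per location of an outline feature map.
    Assumption: no learned projections, so keys = values = outline_features.
    Returns (outputs, weights), where weights[i][j] is how strongly
    query i attends to outline location j.
    """
    dim = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity between this query and every outline location.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in outline_features
        ]
        attn = softmax(scores)
        weights.append(attn)
        # Attention-weighted sum of the outline features.
        outputs.append(
            [sum(w * v[d] for w, v in zip(attn, outline_features)) for d in range(dim)]
        )
    return outputs, weights


# Toy data: 2 character queries, 3 outline locations, feature dimension 2.
queries = [[1.0, 0.0], [0.0, 1.0]]
outline_features = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
outputs, weights = cross_attention(queries, outline_features)
```

In this toy setup each query places most of its attention on the outline location whose feature vector it matches best, which is the sense in which outline features could help a query locate its character.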

Keywords:

Transformer; Computer science; Natural language processing; Artificial intelligence; Benchmark (surveying); Speech recognition; Engineering

Topics

Handwritten Text Recognition Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Image Processing and 3D Reconstruction

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Display-Semantic Transformer for Scene Text Recognition

Lightweight Scene Text Recognition Based on Transformer

Compressed Vision Transformer for Scene Text Recognition

An End-to-End Scene Text Recognition for Bilingual Text

Improving transformer for scene text and handwritten text recognition