JOURNAL ARTICLE

Fine-Grained Language Identification in Scene Text Images

Abstract

Identifying the language of text in scene images is crucial for various applications. Existing studies focus on identifying the script, that is, the set of letters used to write a given language, in scene text images. However, these works do not distinguish between different languages written in the same script and therefore cannot meet the needs of many applications. To address this challenge, we study a novel task: fine-grained language identification in scene text images, which aims to distinguish languages that share the same script. We construct datasets containing samples in seven languages: Dutch, English, French, Italian, German, Spanish, and Portuguese. Furthermore, we propose end-to-end trainable neural networks for fine-grained language identification, in which semantic information about the text is mined and used to assist language identification. We train the networks on the synthetic dataset and evaluate them on the collected real dataset. The experimental results demonstrate that the proposed frameworks are effective.

Keywords:
Computer science, Natural language processing, Artificial intelligence, Identification (biology), Task (project management), Focus (optics), Set (abstract data type), German, Language identification, Portuguese, Natural language, Linguistics, Programming language

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 0.51
Refs: 47
Citation Normalized Percentile: 0.66

Topics

Handwritten Text Recognition Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Authorship Attribution and Profiling
Physical Sciences →  Computer Science →  Artificial Intelligence