JOURNAL ARTICLE

APViT: ViT with adaptive patches for scene text recognition

Abstract

Text in natural scenes appears in varied colors, and color is a strong distinguishing feature that helps suppress background interference. In this study, color clustering is used as a prior to group patches and strengthen their spatial relationships. Patch sizes are adjusted adaptively during training to balance speed and accuracy, and unimportant tokens and blocks in the model are pruned. We propose APViT, which adapts the ViT model to the requirements of scene text recognition. It consists of three components: Sparse Patches Selection (SPS), ViT-STR, and Token Code (TC). First, SPS segments images into appropriately sized patches and clusters similar ones, exploring diverse local patches adaptively. Second, ViT-STR is a ViT model enhanced specifically for scene text recognition. Finally, TC prunes non-essential parts of the network based on the self-attention mechanism to speed up the model. The proposed APViT model outperforms state-of-the-art methods on several datasets, demonstrating its effectiveness.
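
The abstract names two ideas that a short sketch can make concrete: grouping patches by color, and pruning tokens by attention. The Python below is only an illustration under stated assumptions, not the authors' implementation. The abstract does not say which clustering algorithm SPS uses or how TC scores tokens, so plain k-means over mean patch colors and a top-k cut on a per-token attention score are stand-ins chosen here. All function names and parameters (`patch_mean_colors`, `kmeans_labels`, `prune_tokens`, `keep_ratio`) are hypothetical.

```python
import numpy as np


def patch_mean_colors(image, patch):
    """Split an H x W x C image into non-overlapping patch x patch tiles
    and return the mean color of each tile, in row-major tile order."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    # Crop to a whole number of tiles, then expose the tile grid as axes.
    tiles = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, c)
    return tiles.mean(axis=(1, 3)).reshape(gh * gw, c)


def kmeans_labels(points, k, iters=10, seed=0):
    """Assumed stand-in for color clustering: plain k-means on patch colors."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Distance from every point to every center, shape (N, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return labels


def prune_tokens(tokens, attn, keep_ratio=0.5):
    """Assumed stand-in for attention-based pruning: keep the tokens with the
    highest attention score, preserving their original order.

    tokens: (N, D) array of token embeddings.
    attn:   (N,) array with one attention score per token.
    """
    n_keep = max(1, int(round(len(tokens) * keep_ratio)))
    keep = np.sort(np.argsort(attn)[::-1][:n_keep])
    return tokens[keep], keep
```

In this sketch, patches that share a cluster label would be treated as one color group, and `prune_tokens` would be applied between transformer blocks using whatever attention statistic the model exposes. Both choices are assumptions made for illustration.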

Keywords:
Computer science; Artificial intelligence; Computer vision; Pattern recognition (psychology); Speech recognition

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 4.77
Refs: 30
Citation Normalized Percentile: 0.82

Topics

Handwritten Text Recognition Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Image Processing and 3D Reconstruction
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
