Ning Zhang, Ce Li, Zongshun Wang, Jialin Ma, Zhiqiang Feng
Abstract: Scene text in nature exhibits varied colors, a distinguishing feature that helps suppress background interference. In this study, color clustering is used as a prior to group patches, enhancing their spatial relationships. In addition, patch sizes are adjusted adaptively during training to balance speed and accuracy, while unimportant tokens and blocks are pruned from the model. We propose APViT, which adapts the Vision Transformer (ViT) to the requirements of scene text recognition. It consists of three components: Sparse Patches Selection (SPS), ViT-STR, and Token Code (TC). First, SPS segments images into appropriately sized patches and clusters similar ones to adaptively explore diverse local patches. Second, ViT-STR enhances the ViT model specifically for scene text recognition. Finally, TC prunes non-essential parts of the network based on self-attention to accelerate inference. Across several benchmark datasets, the proposed APViT outperforms state-of-the-art methods, demonstrating its effectiveness.
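The abstract does not specify how SPS groups patches by color. A minimal illustrative sketch, assuming each patch is represented by its mean RGB value and grouped with a small k-means-style clustering (the function name, patch size, and farthest-point initialization are assumptions for illustration, not the authors' method):

```python
import numpy as np

def cluster_patches_by_color(image, patch=8, k=3, iters=10):
    """Group image patches by mean color with a tiny k-means sketch.

    image: (H, W, 3) float array; H and W must be divisible by `patch`.
    Returns a (H // patch, W // patch) grid of cluster labels.
    """
    H, W, _ = image.shape
    gh, gw = H // patch, W // patch
    # Mean RGB per patch -> (gh * gw, 3) feature matrix
    feats = image.reshape(gh, patch, gw, patch, 3).mean(axis=(1, 3)).reshape(-1, 3)

    # Deterministic farthest-point initialization of the k color centers
    centers = [feats[0]]
    for _ in range(k - 1):
        d = np.linalg.norm(feats[:, None] - np.array(centers)[None], axis=2).min(axis=1)
        centers.append(feats[d.argmax()])
    centers = np.array(centers)

    labels = np.zeros(len(feats), dtype=int)
    for _ in range(iters):
        # Assign each patch to its nearest color center, then update centers
        d = np.linalg.norm(feats[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = feats[labels == c].mean(axis=0)
    return labels.reshape(gh, gw)
```

Patches that land in the same cluster share similar colors, so text strokes (usually one dominant color) tend to group together while background patches fall into other clusters — the kind of spatial grouping the abstract describes.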