JOURNAL ARTICLE

A Sequence-selective Fine-grained Image Recognition Strategy Using Vision Transformer

Abstract

Fine-grained image recognition aims at precise sub-category classification of images, so it requires algorithms with a strong ability to extract subtle features. Recently, the Transformer architecture has been successfully applied to vision tasks, offering a new way to improve the feature extraction performance of fine-grained image recognition algorithms. However, fine-grained image datasets are usually small, which is unfavorable for training data-hungry Transformers. To increase the amount of data available for training, in this paper we first introduce a stochastic image data augmentation method for the Vision Transformer (ViT), which uses a Dense-DETR model to extract feature regions and performs random insertion and removal on the transformed patch sequence. To select the most informative sequence elements during forward propagation, we implement a feature patch selection strategy by adding a convolutional network structure to the ViT encoders. Inspired by active learning, we also introduce a contrastive loss that uses the posterior information of paired images as a penalty term added to ViT's cross-entropy loss objective. Together, these strategies enable the ViT to extract the most discriminative feature information from its input. Extensive experiments show that the proposed sequence-selective Vision Transformer achieves the highest recognition accuracies on several commonly used fine-grained image datasets.
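The abstract describes the training objective as ViT's cross-entropy loss plus a contrastive penalty computed from the posteriors of paired images. The sketch below illustrates how such a combined objective could be assembled in plain Python. It rests on assumptions not stated in the abstract: the margin-based contrastive form, the Euclidean distance between softmax posteriors, and the names `paired_loss`, `lam`, and `margin` are hypothetical choices for illustration, not the paper's exact formulation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns logits into a posterior distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Standard cross-entropy loss for one sample with an integer class label."""
    return -math.log(softmax(logits)[label])


def posterior_contrastive(p_a: Sequence[float], p_b: Sequence[float],
                          same_class: bool, margin: float = 0.5) -> float:
    """Hypothetical margin-based contrastive penalty on two posterior vectors.

    Same-class pairs are penalized by their squared Euclidean distance;
    different-class pairs are penalized only when closer than `margin`.
    """
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(p_a, p_b)))
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


def paired_loss(logits_a: Sequence[float], label_a: int,
                logits_b: Sequence[float], label_b: int,
                lam: float = 0.1, margin: float = 0.5) -> float:
    """Mean cross-entropy of the pair plus a weighted contrastive penalty.

    `lam` (penalty weight) and `margin` are illustrative assumptions.
    """
    ce = 0.5 * (cross_entropy(logits_a, label_a) + cross_entropy(logits_b, label_b))
    penalty = posterior_contrastive(softmax(logits_a), softmax(logits_b),
                                    label_a == label_b, margin)
    return ce + lam * penalty


if __name__ == "__main__":
    # Two images of the same sub-category with slightly different predictions.
    print(paired_loss([2.0, 0.5, -1.0], 0, [1.5, 0.8, -0.5], 0))
```

Setting `lam=0` recovers plain cross-entropy, which makes the penalty's contribution easy to isolate when tuning its weight.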

Keywords:
Computer science, Artificial intelligence, Discriminative model, Feature extraction, Pattern recognition, Transformer, Encoder, Convolutional neural network, Feature learning, Entropy, Computer vision, Engineering
