A Sequence-selective Fine-grained Image Recognition Strategy Using Vision Transformer

Yulin Cai; Haoqian Wang; Xingzheng Wang

doi:10.1109/ist55454.2022.9827667

JOURNAL ARTICLE

Year: 2022 Pages: 1-6

Abstract

Fine-grained image recognition aims at precise sub-category classification of images and therefore requires algorithms with a strong ability to extract subtle features. Recently, the Transformer architecture has been successfully applied to vision tasks, offering a new way to improve the feature extraction performance of fine-grained image recognition algorithms. However, fine-grained image datasets are usually quite limited in size, which is unfavorable for the data-hungry training process of Transformers. To increase the amount of data available for training, in this paper we first introduce a stochastic image data augmentation method for the Vision Transformer (ViT), which uses a Dense-DETR model to extract feature regions and performs random insertion and removal on the transformed patch sequence. To select the most informative sequence elements during forward propagation, we implement a feature patch selection strategy by adding a convolutional network structure to the ViT encoders. Inspired by active learning, a contrastive loss that uses the posterior information of paired images is also introduced as a penalty term in ViT's cross-entropy loss objective. These strategies help the ViT extract the most discriminative feature information from its input. Extensive experiments show that the proposed sequence-selective Vision Transformer reaches the highest recognition accuracies on several frequently used fine-grained image datasets.
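
The abstract does not give the form of the contrastive penalty, so the following is only a minimal sketch of how a pairwise contrastive term could be added to a cross-entropy objective. It assumes a classic margin-based contrastive loss computed on the softmax posteriors of two paired images. The function names, the `weight` and `margin` parameters, and the choice of Euclidean distance are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Turn raw class scores into posterior probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Standard cross-entropy for a single image."""
    return -math.log(softmax(logits)[label])


def contrastive_penalty(post_a, post_b, same_class, margin=1.0):
    """Margin-based contrastive term on two posterior vectors (assumed form).

    Same-class pairs are pulled together; different-class pairs are
    pushed apart until their distance reaches `margin`.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(post_a, post_b)))
    if same_class:
        return dist ** 2
    return max(0.0, margin - dist) ** 2


def paired_loss(logits_a, label_a, logits_b, label_b, weight=0.5, margin=1.0):
    """Cross-entropy of both images plus a weighted contrastive penalty."""
    ce = cross_entropy(logits_a, label_a) + cross_entropy(logits_b, label_b)
    penalty = contrastive_penalty(
        softmax(logits_a), softmax(logits_b), label_a == label_b, margin
    )
    return ce + weight * penalty
```

In this sketch, two images of different classes that receive identical posteriors incur the full `margin ** 2` penalty, while a same-class pair with identical posteriors incurs none, so the penalty only changes the objective when the paired predictions disagree with the pair's labels.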

Keywords:

Computer science; Artificial intelligence; Discriminative model; Feature extraction; Pattern recognition (psychology); Transformer; Encoder; Convolutional neural network; Feature learning; Entropy (arrow of time); Computer vision; Engineering

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.06

Topics

Advanced Neural Network Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Medical Image Segmentation Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Token-Selective Vision Transformer for fine-grained image recognition of marine organisms

Multi-Exit Vision Transformer with Custom Fine-Tuning for Fine-Grained Image Recognition

Token Adaptive Vision Transformer with Efficient Deployment for Fine-Grained Image Recognition

Hybrid Granularities Transformer for Fine-Grained Image Recognition