JOURNAL ARTICLE

Multi-Exit Vision Transformer with Custom Fine-Tuning for Fine-Grained Image Recognition

Abstract

Capturing subtle visual differences between subordinate categories is crucial for improving the performance of Fine-grained Visual Classification (FGVC). Recent works have proposed deep learning models based on the Vision Transformer (ViT), exploiting its self-attention mechanism to locate important object regions and extract global information. However, the large number of self-attention layers in these models incurs substantial computational cost, making them impractical to deploy on resource-constrained hardware such as Internet of Things (IoT) devices. In this work, we propose a novel Multi-exit Vision Transformer architecture (MEViT) for early exiting based on ViT, together with a fine-tuning strategy involving self-distillation that improves the accuracy of the early exit branches on the FGVC task compared to the baseline ViT model. Experiments on two standard FGVC benchmarks show that our proposed model provides superior accuracy-efficiency trade-offs compared to the state-of-the-art (SOTA) ViT-based model, and demonstrate that many subcategories can be classified accurately with significantly less computation.
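The two ideas the abstract combines, confidence-based early exiting and self-distillation from the final classifier to the early branches, can be sketched as follows. This is a minimal illustration, not the paper's actual method: the toy `blocks`, `heads`, the confidence threshold, and the distillation temperature are all assumptions introduced here for clarity.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def early_exit_predict(x, blocks, heads, threshold=0.9):
    """Run transformer blocks in order; each block is followed by a
    lightweight classifier head (an assumption mirroring the multi-exit
    idea). Return (predicted_class, exit_index) at the first head whose
    top softmax confidence reaches `threshold`, falling back to the
    final head if none does."""
    for i, (block, head) in enumerate(zip(blocks, heads)):
        x = block(x)
        probs = softmax(head(x))
        conf = max(probs)
        if conf >= threshold or i == len(blocks) - 1:
            return probs.index(conf), i

def self_distill_loss(student_logits, teacher_logits, T=2.0):
    """Toy self-distillation objective: KL divergence between the
    temperature-softened final-head (teacher) and early-branch (student)
    distributions. Temperature T=2.0 is an illustrative choice."""
    p = softmax([l / T for l in teacher_logits])
    q = softmax([l / T for l in student_logits])
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy demo: each "block" shifts the feature, so head confidence grows
# with depth and the exit point depends on the threshold.
blocks = [lambda x: [v + 1.0 for v in x] for _ in range(3)]
heads = [lambda x: [sum(x), 0.0]] * 3
print(early_exit_predict([0.0], blocks, heads, threshold=0.9))   # exits at the last block
print(early_exit_predict([0.0], blocks, heads, threshold=0.85))  # exits one block earlier
```

In this sketch a lower threshold trades accuracy for compute by exiting earlier, which is the accuracy-efficiency trade-off the abstract refers to.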

Keywords:
Computer science; Transformer; Artificial intelligence; Deep learning; Reinforcement learning; Architecture; Machine learning; Computer vision; Engineering; Voltage

Metrics

- Cited by: 1
- FWCI (Field-Weighted Citation Impact): 0.17
- References: 25
- Citation Normalized Percentile: 0.45

Topics

- CCD and CMOS Imaging Sensors (Physical Sciences → Engineering → Electrical and Electronic Engineering)
- Advanced Neural Network Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
- Visual Attention and Saliency Detection (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)