Dual-Dependency Attention Transformer for Fine-Grained Visual Classification

Shiyan Cui; Bin Hui

doi:10.3390/s24072337

JOURNAL ARTICLE

Year: 2024; Journal: Sensors; Vol: 24 (7); Pages: 2337; Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Visual transformers (ViTs) are widely used in visual tasks such as fine-grained visual classification (FGVC). However, self-attention, the core module of ViTs, has computational and memory complexity that is quadratic in the number of tokens. The sparse-attention and local-attention approaches that most researchers currently use are not well suited to FGVC, which requires dense feature extraction and global dependency modeling. To address this challenge, we propose a dual-dependency attention transformer that decouples global token interactions into two paths: a position-dependency attention pathway based on the intersection of two types of grouped attention, and a semantic-dependency attention pathway based on dynamic central aggregation. This design improves the semantic modeling of discriminative cues while reducing the computational complexity to linear. In addition, we develop discriminative enhancement strategies that increase the sensitivity of high-confidence discriminative cue tracking with a knowledge-based representation approach. Experiments on three datasets, NABIRDS, CUB, and DOGS, show that the method is suitable for fine-grained image classification and balances computational cost against performance.
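The quadratic-versus-linear contrast in the abstract can be illustrated with a minimal sketch. This is not the paper's dual-dependency attention. It is a generic comparison between full self-attention and attention routed through a small, fixed number of centers, written under several assumptions: the function names are hypothetical, learned query/key/value projections are omitted, and the first K tokens stand in for centers that a real model would learn or compute dynamically.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def full_attention(tokens):
    """Plain self-attention (no projections): N * N score evaluations."""
    scale = math.sqrt(len(tokens[0]))
    out = []
    for q in tokens:
        weights = softmax([dot(q, k) / scale for k in tokens])
        out.append(weighted_sum(weights, tokens))
    return out, len(tokens) ** 2


def center_attention(tokens, centers):
    """Attention routed through K centers: 2 * N * K score evaluations."""
    scale = math.sqrt(len(tokens[0]))
    # Step 1: each center aggregates all N tokens (K * N scores).
    updated = []
    for c in centers:
        weights = softmax([dot(c, t) / scale for t in tokens])
        updated.append(weighted_sum(weights, tokens))
    # Step 2: each token attends to the K updated centers (N * K scores).
    out = []
    for q in tokens:
        weights = softmax([dot(q, c) / scale for c in updated])
        out.append(weighted_sum(weights, updated))
    return out, 2 * len(tokens) * len(centers)


# Toy input: 16 tokens of dimension 4; the first 2 tokens act as centers
# (an illustrative assumption, not the paper's choice).
tokens = [[math.sin(i + j) for j in range(4)] for i in range(16)]
_, full_cost = full_attention(tokens)
_, center_cost = center_attention(tokens, centers=tokens[:2])
print(full_cost, center_cost)  # 256 64
```

With K held fixed, the center-based cost grows as 2NK, so doubling the number of tokens doubles the work, whereas the full-attention cost quadruples. Every token still receives information from every other token through the centers, which is the property that local and sparse attention give up.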

Keywords:

Discriminative model; Computer science; Artificial intelligence; Transformer; Computational complexity theory; Machine learning; Pattern recognition (psychology); Algorithm

Metrics

FWCI (Field Weighted Citation Impact): 0.53

Citation Normalized Percentile: 0.50

Topics

Video Surveillance and Tracking Methods

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Domain Adaptation and Few-Shot Learning

Physical Sciences → Computer Science → Artificial Intelligence

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Hierarchical attention vision transformer for fine-grained visual classification

Fine-Grained Visual Classification via Adaptive Attention Quantization Transformer

Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification

Havt: Hierarchical Attention Vision Transformer for Fine-Grained Visual Classification

A Transformer Architecture with Adaptive Attention for Fine-Grained Visual Classification