JOURNAL ARTICLE

Dual-Dependency Attention Transformer for Fine-Grained Visual Classification

Shiyan Cui, Bin Hui

Year: 2024 Journal:   Sensors Vol: 24 (7) Pages: 2337-2337   Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Visual transformers (ViTs) are widely used in various visual tasks, such as fine-grained visual classification (FGVC). However, the self-attention mechanism, which is the core module of visual transformers, leads to quadratic computational and memory complexity. The sparse-attention and local-attention approaches currently used by most researchers are not suitable for FGVC tasks. These tasks require dense feature extraction and global dependency modeling. To address this challenge, we propose a dual-dependency attention transformer model. It decouples global token interactions into two paths. The first is a position-dependency attention pathway based on the intersection of two types of grouped attention. The second is a semantic dependency attention pathway based on dynamic central aggregation. This approach enhances the high-quality semantic modeling of discriminative cues while reducing the computational cost to linear computational complexity. In addition, we develop discriminative enhancement strategies. These strategies increase the sensitivity of high-confidence discriminative cue tracking with a knowledge-based representation approach. Experiments on three datasets, NABIRDS, CUB, and DOGS, show that the method is suitable for fine-grained image classification. It finds a balance between computational cost and performance.
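
To make the complexity claim concrete, the sketch below counts query-key score computations for full self-attention, for attention restricted to fixed-size groups, and for attention against a small set of aggregated centers. The abstract does not state group sizes, the number of centers, or how the two pathways are combined, so the function name `attention_pair_counts` and the parameters `group_size` and `num_centers` are illustrative assumptions and not the authors' implementation.

```python
def attention_pair_counts(num_tokens: int, group_size: int, num_centers: int) -> dict:
    """Count query-key score computations per head for three attention layouts.

    All three layouts are simplified stand-ins (assumptions), used only to
    compare how the cost grows with the number of tokens.
    """
    if num_tokens % group_size != 0:
        raise ValueError("num_tokens must be divisible by group_size")

    # Full self-attention: every token scores against every token, O(N^2).
    full = num_tokens * num_tokens

    # Grouped attention: tokens score only within their own group.
    # (N / g) groups, each costing g * g, which is N * g in total.
    num_groups = num_tokens // group_size
    grouped = num_groups * group_size * group_size

    # Center-based attention: every token scores against k aggregated
    # centers, which is N * k in total.
    centers = num_tokens * num_centers

    return {"full": full, "grouped": grouped, "centers": centers}


# Hypothetical setting: a 448x448 image split into 16x16 patches gives
# 28 * 28 = 784 tokens; group size 49 and 16 centers are arbitrary choices.
counts = attention_pair_counts(num_tokens=784, group_size=49, num_centers=16)
print(counts)
```

With `group_size` and `num_centers` held fixed, the grouped and center-based counts grow linearly in the number of tokens, while the full count grows quadratically.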

Keywords:
Discriminative model, Computer science, Artificial intelligence, Transformer, Computational complexity theory, Machine learning, Pattern recognition (psychology), Algorithm

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.53
Refs: 71
Citation Normalized Percentile: 0.50

Topics

Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Hierarchical attention vision transformer for fine-grained visual classification

Xiaobin Hu, Shining Zhu, Taile Peng

Journal:   Journal of Visual Communication and Image Representation Year: 2023 Vol: 91 Pages: 103755-103755
JOURNAL ARTICLE

Fine-Grained Visual Classification via Adaptive Attention Quantization Transformer

Shishi Qiao, S. H. Li, Haiyong Zheng

Journal:   IEEE Transactions on Neural Networks and Learning Systems Year: 2025 Vol: PP Pages: 1-15
JOURNAL ARTICLE

Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification

Ruyi Ji, Jiaying Li, Libo Zhang, Jing Liu, Yanjun Wu

Journal:   IEEE Transactions on Circuits and Systems for Video Technology Year: 2023 Vol: 33 (9) Pages: 5009-5021
JOURNAL ARTICLE

A Transformer Architecture with Adaptive Attention for Fine-Grained Visual Classification

Changli Cai, Tiankui Zhang, Zhewei Weng, Chunyan Feng, Yapeng Wang

Journal:   2021 7th International Conference on Computer and Communications (ICCC) Year: 2021 Pages: 863-867