JOURNAL ARTICLE

Fine-Grained Visual Classification via Adaptive Attention Quantization Transformer

Shishi Qiao, S. H. Li, Haiyong Zheng

Year: 2025  Journal: IEEE Transactions on Neural Networks and Learning Systems  Vol: PP  Pages: 1-15  Publisher: Institute of Electrical and Electronics Engineers

Abstract

Vision transformers (ViTs) have recently demonstrated remarkable performance in fine-grained visual classification (FGVC). However, most existing ViT-based methods overlook the varied focus of different attention heads: heads that attend to nondiscriminative regions dilute the discriminative signal crucial for FGVC. To address this issue, we propose a novel adaptive attention quantization transformer (A2QTrans) for FGVC that selects key discriminative features by analyzing the heads' attention. A2QTrans comprises three key modules: the adaptive quantization selection (AQS) module, the background elimination (BE) module, and the dynamic hybrid optimization (DHO) module. Specifically, the AQS module dynamically selects the most discriminative features in a data-driven manner by quantizing the attention scores across multiple attention heads with a global, learnable threshold. This process effectively filters out irrelevant information from nondiscriminative tokens, concentrating attention on important regions. To address the nondifferentiability inherent in updating this threshold during binarization, the AQS module employs a straight-through estimator (STE) for discrete optimization, enabling end-to-end gradient backpropagation. In addition, we exploit the prior that background regions usually contain little meaningful information, and design the BE module to further calibrate the attention heads' focus onto the main objects in images. Finally, the DHO module adaptively optimizes and integrates the attentive results of the AQS and BE modules to achieve optimal classification performance. Extensive experiments on four challenging FGVC benchmark datasets and three ViT variants demonstrate A2QTrans's superior performance, achieving state-of-the-art (SOTA) results. The source code is available at https://github.com/Lishixian0817/A2QTrans.
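The threshold-based token selection with a straight-through estimator described in the abstract can be sketched in plain Python. Note this is a minimal illustrative sketch, not the authors' implementation: the function names and the toy gradient routine are assumptions, and a real ViT would apply this per attention head inside an autograd framework.

```python
def ste_binarize(scores, threshold):
    """Forward pass: hard 0/1 mask over attention scores.

    Tokens whose attention score clears the learnable global threshold
    are kept; the rest are quantized away as nondiscriminative.
    """
    return [1.0 if s >= threshold else 0.0 for s in scores]

def ste_backward(upstream_grads):
    """Backward pass under the straight-through estimator (STE).

    The hard step function has zero gradient almost everywhere, so the
    STE treats it as the identity: upstream gradients flow to the scores
    unchanged, and the threshold receives the negated sum (raising the
    threshold turns tokens off, hence the sign flip). This is what lets
    a binarizing selection step remain trainable end to end.
    """
    grad_scores = list(upstream_grads)
    grad_threshold = -sum(upstream_grads)
    return grad_scores, grad_threshold

# Toy example: four tokens' head-averaged attention scores, threshold 0.5.
mask = ste_binarize([0.1, 0.6, 0.3, 0.9], 0.5)   # -> [0.0, 1.0, 0.0, 1.0]
grad_scores, grad_threshold = ste_backward([1.0, 1.0, 1.0, 1.0])
```

In the forward pass only the two tokens above the threshold survive; in the backward pass the STE supplies a surrogate gradient so both the scores and the threshold itself can be updated by ordinary backpropagation.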


Related Documents

CONFERENCE PAPER

A Transformer Architecture with Adaptive Attention for Fine-Grained Visual Classification

Changli Cai, Tiankui Zhang, Zhewei Weng, Chunyan Feng, Yapeng Wang

Conference: 2021 7th International Conference on Computer and Communications (ICCC)  Year: 2021  Pages: 863-867
JOURNAL ARTICLE

Hierarchical attention vision transformer for fine-grained visual classification

Xiaobin Hu, Shining Zhu, Taile Peng

Journal: Journal of Visual Communication and Image Representation  Year: 2023  Vol: 91  Pages: 103755
JOURNAL ARTICLE

Dual-Dependency Attention Transformer for Fine-Grained Visual Classification

Shiyan Cui, Bin Hui

Journal: Sensors  Year: 2024  Vol: 24 (7)  Pages: 2337