JOURNAL ARTICLE

DATran: Dual Attention Transformer for Multi-Label Image Classification

Wei Zhou, Zhijie Zheng, Tao Su, Haifeng Hu

Year: 2023   Journal: IEEE Transactions on Circuits and Systems for Video Technology   Vol: 34 (1)   Pages: 342-356   Publisher: Institute of Electrical and Electronics Engineers

Abstract

Multi-label image classification is a fundamental yet challenging task that aims to predict the labels associated with a given image. Most previous methods directly exploit the high-level features from the last layer of a convolutional neural network for classification. However, these methods cannot obtain global features due to the limited size of convolutional kernels, and they fail to extract the multi-scale features needed to recognize small-scale objects in images. Recent studies exploit graph convolutional networks to model label correlations and boost classification performance. Despite substantial progress, these methods rely on manually pre-defined graph structures. Moreover, they ignore the associations between semantic labels and image regions and do not fully explore the spatial context of images. To address the above issues, we propose a novel Dual Attention Transformer (DATran) model, which adopts a dual-stream architecture that simultaneously learns spatial and channel correlations from multi-label images. First, to address the difficulty current methods have in recognizing small-scale objects, we develop a new multi-scale feature fusion (MSFF) module that generates multi-scale feature representations by jointly integrating high-level semantics and low-level details. Second, we design a prior-enhanced spatial attention (PSA) module that learns long-range correlations between objects at different spatial positions in images to enhance model performance. Third, we devise a prior-enhanced channel attention (PCA) module that captures the inter-dependencies between channel maps, effectively strengthening the correlation between semantic categories. Notably, the PSA and PCA modules complement and promote each other to further augment the feature representations. Finally, the outputs of the two attention modules are fused to obtain the final features for classification.
Performance evaluation experiments on the MS-COCO 2014, PASCAL VOC 2007, and VG-500 datasets demonstrate that the DATran model outperforms current state-of-the-art models.
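The dual-stream idea in the abstract can be illustrated with a minimal sketch. This is my own simplified rendering of generic spatial and channel self-attention fused by summation, not the paper's exact PSA/PCA formulation (which also injects prior information); all function names here are hypothetical. Features are treated as a C x N matrix, with C channels and N = H*W flattened spatial positions.

```python
import math

def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax_rows(a):
    """Row-wise softmax, so each row of affinities sums to 1."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

def spatial_attention(x):
    """N x N position affinities re-weight every channel's positions."""
    attn = softmax_rows(matmul(transpose(x), x))  # (N, N)
    return matmul(x, transpose(attn))             # (C, N)

def channel_attention(x):
    """C x C channel affinities re-weight the channel maps."""
    attn = softmax_rows(matmul(x, transpose(x)))  # (C, C)
    return matmul(attn, x)                        # (C, N)

def dual_attention(x):
    """Fuse the two branches by element-wise sum, as the abstract describes."""
    s, c = spatial_attention(x), channel_attention(x)
    return [[a + b for a, b in zip(rs, rc)] for rs, rc in zip(s, c)]

# Toy 2-channel feature map over 3 spatial positions.
feats = [[1.0, 0.0, 2.0],
         [0.5, 1.0, 0.0]]
fused = dual_attention(feats)  # still a 2 x 3 feature map
```

The spatial branch compares all position pairs (capturing long-range context between image regions), while the channel branch compares all channel-map pairs (capturing category-level correlations); summing the two outputs is one simple fusion choice.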

Keywords:
Computer science, Artificial intelligence, Pattern recognition, Convolutional neural network, Spatial contextual awareness, Graph, Feature extraction, Contextual image classification, Machine learning, Theoretical computer science

Metrics

Cited By: 17
FWCI (Field Weighted Citation Impact): 4.34
Refs: 87
Citation Normalized Percentile: 0.93 (in top 1% and top 10%)

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Graph Attention Transformer Network for Multi-label Image Classification

Jin Yuan, Shikai Chen, Yao Zhang, Zhongchao Shi, Xin Geng, Jianping Fan, Yong Rui

Journal: ACM Transactions on Multimedia Computing, Communications and Applications   Year: 2022   Vol: 19 (4)   Pages: 1-16
BOOK-CHAPTER

Multi-stage Semantic Attention with Transformer for Multi-label Image Classification

Qizhen Du, Ying Ma, Jianmin Li

Atlantis Highlights in Computer Sciences   Year: 2023   Pages: 1193-1199
JOURNAL ARTICLE

Double Attention for Multi-Label Image Classification

Haiying Zhao, Wei Zhou, Xiaogang Hou, Hui Zhu

Journal: IEEE Access   Year: 2020   Vol: 8   Pages: 225539-225550