JOURNAL ARTICLE

DATran: Dual Attention Transformer for Multi-Label Image Classification

Wei Zhou, Zhijie Zheng, Tao Su, Haifeng Hu

Year: 2023   Journal: IEEE Transactions on Circuits and Systems for Video Technology   Vol: 34 (1)   Pages: 342-356   Publisher: Institute of Electrical and Electronics Engineers

Abstract

Multi-label image classification is a fundamental yet challenging task that aims to predict the labels associated with a given image. Most previous methods directly exploit the high-level features from the last layer of a convolutional neural network for classification. However, these methods cannot obtain global features due to the limited size of convolutional kernels, and they fail to extract the multi-scale features needed to recognize small-scale objects in images. Recent studies exploit graph convolutional networks to model label correlations and boost classification performance. Despite substantial progress, these methods rely on manually pre-defined graph structures. Moreover, they ignore the associations between semantic labels and image regions and do not fully explore the spatial context of images. To address the above issues, we propose a novel Dual Attention Transformer (DATran) model, which adopts a dual-stream architecture that simultaneously learns spatial and channel correlations from multi-label images. First, to address the difficulty current methods have in recognizing small-scale objects, we develop a new multi-scale feature fusion (MSFF) module that generates multi-scale feature representations by jointly integrating high-level semantics and low-level details. Second, we design a prior-enhanced spatial attention (PSA) module that learns long-range correlations between objects at different spatial positions in images to enhance model performance. Third, we devise a prior-enhanced channel attention (PCA) module that captures the inter-dependencies between channel maps, effectively strengthening the correlation between semantic categories. Notably, the PSA and PCA modules complement and promote each other to further augment the feature representations. Finally, the outputs of the two attention modules are fused to obtain the final features for classification.
Performance evaluation experiments on the MS-COCO 2014, PASCAL VOC 2007, and VG-500 datasets demonstrate that the DATran model outperforms current state-of-the-art models.
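The dual-stream idea in the abstract can be illustrated with a minimal sketch. This is my own simplified rendering of generic spatial and channel self-attention fused by summation, not the paper's exact PSA/PCA formulation (which also injects prior information); all function names here are hypothetical. Features are treated as a C x N matrix, with C channels and N = H*W flattened spatial positions.

```python
import math

def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax_rows(a):
    """Row-wise softmax, so each row of affinities sums to 1."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

def spatial_attention(x):
    """N x N position affinities re-weight every channel's positions."""
    attn = softmax_rows(matmul(transpose(x), x))  # (N, N)
    return matmul(x, transpose(attn))             # (C, N)

def channel_attention(x):
    """C x C channel affinities re-weight the channel maps."""
    attn = softmax_rows(matmul(x, transpose(x)))  # (C, C)
    return matmul(attn, x)                        # (C, N)

def dual_attention(x):
    """Fuse the two branches by element-wise sum, as the abstract describes."""
    s, c = spatial_attention(x), channel_attention(x)
    return [[a + b for a, b in zip(rs, rc)] for rs, rc in zip(s, c)]

# Toy 2-channel feature map over 3 spatial positions.
feats = [[1.0, 0.0, 2.0],
         [0.5, 1.0, 0.0]]
fused = dual_attention(feats)  # still a 2 x 3 feature map
```

The spatial branch compares all position pairs (capturing long-range context between image regions), while the channel branch compares all channel-map pairs (capturing category-level correlations); summing the two outputs is one simple fusion choice.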

Keywords:
Computer science, Artificial intelligence, Pattern recognition, Convolutional neural network, Spatial contextual awareness, Graph, Feature extraction, Contextual image classification, Machine learning, Theoretical computer science

Metrics

Cited By: 17
FWCI (Field Weighted Citation Impact): 4.34
Refs: 87
Citation Normalized Percentile: 0.93 (in top 1% and top 10%)

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Graph Attention Transformer Network for Multi-label Image Classification

Jin Yuan, Shikai Chen, Yao Zhang, Zhongchao Shi, Xin Geng, Jianping Fan, Yong Rui

Journal: ACM Transactions on Multimedia Computing, Communications and Applications   Year: 2022   Vol: 19 (4)   Pages: 1-16
BOOK-CHAPTER

Multi-stage Semantic Attention with Transformer for Multi-label Image Classification

Qizhen Du, Ying Ma, Jianmin Li

Atlantis Highlights in Computer Sciences   Year: 2023   Pages: 1193-1199
JOURNAL ARTICLE

Double Attention for Multi-Label Image Classification

Haiying Zhao, Wei Zhou, Xiaogang Hou, Hui Zhu

Journal: IEEE Access   Year: 2020   Vol: 8   Pages: 225539-225550