Multi-label image classification is a fundamental task that seeks to assign an image all of its applicable labels. In recent years, many deep convolutional neural network (CNN)-based approaches have been proposed that discover label semantics and learn semantic image representations by modeling label correlations. However, because of the limited representational capability of convolutional kernels, some small and visually similar objects still cannot be predicted accurately. To address this problem, this paper introduces the Twins transformer. Because the image representations at different stages of this model capture features at different levels and scales and have different discriminative capacities, we design a Multi-Stage semantic Attention with Transformer (MSAT) framework that exploits this built-in multi-stage mechanism to learn semantic image representations, while employing a three-layer standard transformer decoder as an effective feature-fusion component. Experiments conducted on the VOC 2007 dataset show that MSAT achieves better experimental results and improves the performance of multi-label image classification to some extent.
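The described design can be illustrated with a minimal PyTorch sketch (this is an illustrative assumption, not the authors' implementation): multi-stage backbone features are concatenated into a memory sequence, and a set of learnable per-label queries attends to them through a three-layer standard transformer decoder, producing one logit per label. All module and parameter names below are hypothetical.

```python
# Hypothetical sketch of multi-stage feature fusion with a 3-layer
# transformer decoder; NOT the authors' code.
import torch
import torch.nn as nn

class MultiStageFusion(nn.Module):
    def __init__(self, dim=256, num_labels=20, nhead=8):
        super().__init__()
        # one learnable query per class (VOC 2007 has 20 classes)
        self.label_queries = nn.Parameter(torch.randn(num_labels, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=nhead,
                                           batch_first=True)
        # three-layer standard transformer decoder, as in the abstract
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.classifier = nn.Linear(dim, 1)

    def forward(self, stage_feats):
        # stage_feats: list of (B, N_i, dim) token sequences, one per
        # backbone stage, capturing different levels/scales of features
        memory = torch.cat(stage_feats, dim=1)
        b = memory.size(0)
        queries = self.label_queries.unsqueeze(0).expand(b, -1, -1)
        fused = self.decoder(queries, memory)         # (B, num_labels, dim)
        return self.classifier(fused).squeeze(-1)     # per-label logits

# toy usage: four stages of 49 tokens each, batch of 2
feats = [torch.randn(2, 49, 256) for _ in range(4)]
logits = MultiStageFusion()(feats)
print(logits.shape)
```

A sigmoid over these logits with a binary cross-entropy loss would give the usual multi-label training setup; the actual MSAT attention design may differ.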