Channel and spatial attention mechanism for fashion image captioning

Bao T. Nguyen; Son T. Nguyen; Anh H. Vo

doi:10.11591/ijece.v13i5.pp5833-5842


JOURNAL ARTICLE


Year: 2023; Journal: International Journal of Power Electronics and Drive Systems/International Journal of Electrical and Computer Engineering; Vol: 13 (5); Pages: 5833-5842; Publisher: Institute of Advanced Engineering and Science (IAES)



Abstract

Image captioning aims to automatically generate one or more descriptive sentences for a given input image. Most existing captioning methods use an encoder-decoder model that mainly focuses on recognizing the objects in the input image and capturing the relationships between them. However, when generating captions for fashion images, it is important not only to describe the items and their relationships, but also to mention the attribute features of the clothes (shape, texture, style, fabric, and more). In this study, a novel model is proposed for the fashion image captioning task that captures not only the items and their relationships, but also their attribute features. Two different attention mechanisms (spatial attention and channel-wise attention) are incorporated into the traditional encoder-decoder model, which dynamically interprets the caption sentence in the multi-layer feature map in addition to the depth dimension of the feature map. We evaluate our proposed architecture on Fashion-Gen using three metrics (CIDEr, ROUGE-L, and BLEU-1) and achieve scores of 89.7, 50.6, and 45.6, respectively. In our experiments, the proposed method shows a significant performance improvement on the fashion image captioning task and outperforms other state-of-the-art image captioning methods.
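To make the two attention types named in the abstract concrete, the following is a minimal NumPy sketch of channel-wise and spatial attention over a single convolutional feature map, conditioned on a decoder hidden state. It is not the authors' implementation: the additive (tanh) scoring form, the channel-then-spatial ordering, the single-layer setting, and all names (`channel_attention`, `spatial_attention`, `attend`, `W_c`, `W_s`, `W_h`, `w`, `K`) are assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def channel_attention(feats, hidden, W_c, W_h, w):
    """Weight each channel of feats (C, H, W) given hidden (D,).

    Assumed shapes: W_c (K,), W_h (K, D), w (K,). Returns weights of shape (C,).
    """
    C = feats.shape[0]
    pooled = feats.reshape(C, -1).mean(axis=1)           # (C,) mean-pooled channel descriptor
    scores = np.tanh(np.outer(pooled, W_c) + W_h @ hidden)  # (C, K)
    return softmax(scores @ w)                           # (C,), sums to 1

def spatial_attention(feats, hidden, W_s, W_h, w):
    """Weight each spatial location of feats (C, H, W) given hidden (D,).

    Assumed shapes: W_s (K, C), W_h (K, D), w (K,). Returns weights of shape (H*W,).
    """
    C, H, W = feats.shape
    regions = feats.reshape(C, H * W).T                  # (H*W, C) one row per location
    scores = np.tanh(regions @ W_s.T + W_h @ hidden)     # (H*W, K)
    return softmax(scores @ w)                           # (H*W,), sums to 1

def attend(feats, hidden, params):
    # Assumed ordering: re-weight channels first, then pool over locations.
    C, H, W = feats.shape
    beta = channel_attention(feats, hidden, *params["channel"])
    weighted = feats * beta[:, None, None]               # scale each channel
    alpha = spatial_attention(weighted, hidden, *params["spatial"])
    context = weighted.reshape(C, H * W) @ alpha         # (C,) context vector for the decoder
    return context, beta, alpha

# Toy example with random (untrained) parameters; sizes are arbitrary.
rng = np.random.default_rng(0)
C, H, W, D, K = 8, 4, 4, 16, 32
feats = rng.standard_normal((C, H, W))
hidden = rng.standard_normal(D)
params = {
    "channel": (rng.standard_normal(K), rng.standard_normal((K, D)), rng.standard_normal(K)),
    "spatial": (rng.standard_normal((K, C)), rng.standard_normal((K, D)), rng.standard_normal(K)),
}
context, beta, alpha = attend(feats, hidden, params)
print(context.shape, beta.shape, alpha.shape)
```

In this sketch, `beta` plays the role of the channel-wise weights (which feature detectors matter, for example texture or fabric) and `alpha` the spatial weights (where in the image to look). A decoder would recompute both at every word-generation step from its current hidden state.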

Keywords:

Closed captioning; Computer science; Feature (linguistics); Image (mathematics); Artificial intelligence; Sentence; Task (project management); Encoder; Class (philosophy); Channel (broadcasting); Focus (optics); Pattern recognition (psychology); Speech recognition; Natural language processing; Linguistics; Telecommunications

Metrics

FWCI (Field Weighted Citation Impact): 0.18

Citation Normalized Percentile: 0.39


Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Visual Attention and Saliency Detection

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Image Retrieval and Classification Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition


Related Documents

Attention Mechanism for Fashion Image Captioning

Spatial-Temporal Attention for Image Captioning

CM-SC: Cross-modal spatial-channel attention network for image captioning

CSA Mamba: A Channel-Spatial Attention Mamba Network for Image Captioning

DSCJA-Captioner: Dual-Branch Spatial and Channel Joint Attention for Image Captioning