Xi Tian, Xiaobao Yang, Sugang Ma, Bohui Song, Ziqing He
Transformer-based image captioning models have been widely adopted in recent years, but most existing attention mechanisms are designed to capture only spatial dependencies, which is insufficient for image captioning: performance also depends heavily on the categories and attributes of objects. Moreover, during decoding, textual and visual information are typically combined by simple concatenation, so the two modalities are never fully fused and the visual information is underutilized, which limits the representation capability of the model. To remedy these limitations, we propose a Dual-branch Spatial and Channel Joint Attention for image captioning, which captures both spatial and channel information to strengthen the model's representations. In addition, a Cross Pre-Fusion module in the decoder explores the deep relationship between textual and visual information to improve the quality of the generated sentences. The complete model is abbreviated as DSCJA-captioner. Extensive experiments on the MS COCO dataset validate the effectiveness of our method; compared with state-of-the-art models, our model is competitive.
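To make the joint-attention idea concrete, below is a minimal PyTorch sketch of a dual-branch spatial-and-channel attention. It is an illustration under stated assumptions, not the paper's implementation: we assume the spatial branch is standard multi-head self-attention over region tokens, the channel branch is single-head attention computed over the feature channels, and the two branches are merged with a learned sigmoid gate. The class name DualBranchAttention and all shapes and hyperparameters here are hypothetical.

```python
import torch
import torch.nn as nn


class DualBranchAttention(nn.Module):
    """Hypothetical sketch of a spatial-and-channel joint attention.

    Assumptions (not taken from the paper): the spatial branch is
    token-wise multi-head self-attention, the channel branch attends
    over the d feature channels, and a sigmoid gate fuses the branches.
    """

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.spatial = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Channel-branch projections; after transposing, the "sequence"
        # axis is the channel dimension, so attention weights are (d, d).
        self.channel_q = nn.Linear(d_model, d_model)
        self.channel_k = nn.Linear(d_model, d_model)
        self.channel_v = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, d) grid/region features from a visual backbone.
        spatial_out, _ = self.spatial(x, x, x)            # (B, N, d)

        # Channel attention: similarity between channels, (B, d, d),
        # then re-weight the channels of the value projection.
        q = self.channel_q(x).transpose(1, 2)             # (B, d, N)
        k = self.channel_k(x).transpose(1, 2)             # (B, d, N)
        v = self.channel_v(x).transpose(1, 2)             # (B, d, N)
        attn = torch.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
        channel_out = (attn @ v).transpose(1, 2)          # (B, N, d)

        # Gated fusion decides, per position, how much each branch contributes.
        g = torch.sigmoid(self.gate(torch.cat([spatial_out, channel_out], dim=-1)))
        return g * spatial_out + (1 - g) * channel_out


if __name__ == "__main__":
    feats = torch.randn(2, 49, 512)    # e.g. 7x7 grid features, d = 512
    out = DualBranchAttention(512)(feats)
    print(out.shape)                   # torch.Size([2, 49, 512])
```

A similar gated interaction between the text embedding and the attended visual features could stand in for the Cross Pre-Fusion step in the decoder, replacing the simple concatenation the abstract criticizes; the paper's actual fusion scheme may differ.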