JOURNAL ARTICLE

DSCJA-Captioner: Dual-Branch Spatial and Channel Joint Attention for Image Captioning

Abstract

Transformer-based image captioning models have been widely adopted in recent years, but most existing attention mechanisms are designed to capture only spatial dependencies. This is inadequate for image captioning, because captioning performance also depends heavily on the categories and attributes of the objects. In addition, when text and visual information are fused during decoding, existing models rely on simple concatenation. This neither fully fuses the two modalities nor fully exploits the visual information, which limits the representation capability of the model. To remedy these limitations, we propose a Dual-branch Spatial and Channel Joint Attention for the image captioning task, which captures both spatial and channel information to improve the representation capability of the model. We further introduce a Cross Pre-Fusion module in the decoder that explores the deep relationship between text and visual information to improve the quality of the generated sentences. We refer to the entire model as DSCJA-Captioner. Extensive experiments on the MS COCO dataset validate the effectiveness of our method, and our model is competitive with state-of-the-art models.
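The abstract does not say how the two attention branches are computed or combined, or how Cross Pre-Fusion is built. The following is a minimal pure-Python sketch of the general idea under several assumptions, and it is not the authors' implementation:

- The spatial branch is plain scaled dot-product self-attention over the N image regions.
- The channel branch applies the same operation over the C feature channels.
- The two branch outputs are summed with a residual connection.
- `cross_pre_fusion` is a hypothetical stand-in in which word features attend over the visual features and the result is added to them rather than concatenated.

Learned projections, multiple heads and normalization are omitted. All function names are illustrative.

```python
import math

Matrix = list[list[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def softmax_rows(a: Matrix) -> Matrix:
    # Numerically stable row-wise softmax: each output row sums to 1.
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def scaled_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    # softmax(Q K^T / sqrt(d)) V, with d the length of each query vector.
    d = len(q[0])
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


def spatial_branch(x: Matrix) -> Matrix:
    # x is N regions x C channels; weights are N x N (region-to-region).
    return scaled_attention(x, x, x)


def channel_branch(x: Matrix) -> Matrix:
    # Treat each of the C channels as a token of length N, so weights are
    # C x C (channel-to-channel); transpose back to N x C afterwards.
    xt = transpose(x)
    return transpose(scaled_attention(xt, xt, xt))


def dual_branch_joint_attention(x: Matrix) -> Matrix:
    # Assumed combination: residual input + spatial output + channel output.
    s, c = spatial_branch(x), channel_branch(x)
    return [[xi + si + ci for xi, si, ci in zip(xr, sr, cr)]
            for xr, sr, cr in zip(x, s, c)]


def cross_pre_fusion(words: Matrix, visual: Matrix) -> Matrix:
    # Hypothetical fusion: T word vectors attend over N visual vectors
    # (both of width C); the attended context is added, not concatenated.
    context = scaled_attention(words, visual, visual)
    return [[w + c for w, c in zip(wr, cr)] for wr, cr in zip(words, context)]


if __name__ == "__main__":
    regions = [[1.0, 0.0, 2.0], [0.5, 1.0, -1.0]]  # toy N=2, C=3 features
    enhanced = dual_branch_joint_attention(regions)
    fused = cross_pre_fusion([[0.0, 1.0, 0.0]], enhanced)
    print(len(enhanced), len(enhanced[0]), len(fused), len(fused[0]))
```

In this sketch, the channel branch lets every channel reweight every other channel. That is one plausible way to model the "categories and attributes" signal the abstract says spatial-only attention misses. Adding the cross-attended context in `cross_pre_fusion`, instead of concatenating it, keeps the word and visual features in a shared width C.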
