Image captioning has become an important technology for intelligent robots to understand image content. Extracting image information effectively is key to generating accurate and reliable captions. In this paper, we propose a dual self-attention network (DSAN) for image captioning. Specifically, we design a Dual Self-Attention Module (DSAM), embedded in an encoder-decoder architecture, that captures contextual information in the image and adaptively integrates local features with their global dependencies. By modeling rich contextual dependencies over local features, the DSAM significantly improves caption quality. Experimental results on the MS COCO dataset show that the proposed DSAN outperforms existing methods.
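The abstract does not spell out DSAM's internal structure. A common way to realize "dual" self-attention over convolutional features (popularized by dual-attention networks such as DANet) is to combine a position-attention branch, where every spatial location attends to all others, with a channel-attention branch, where every channel attends to all others, and fuse both with the input. The sketch below illustrates that general pattern under those assumptions; the function names and the residual fusion weights are hypothetical, not the authors' exact design.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(feats):
    # feats: (C, N) feature map with N = H*W flattened spatial positions.
    # Each position attends to every other position (global spatial context).
    energy = feats.T @ feats            # (N, N) position affinity
    attn = softmax(energy, axis=-1)     # row-normalized attention weights
    return feats @ attn.T               # (C, N) context-reweighted features

def channel_attention(feats):
    # Each channel attends to every other channel (inter-channel dependencies).
    energy = feats @ feats.T            # (C, C) channel affinity
    attn = softmax(energy, axis=-1)
    return attn @ feats                 # (C, N)

def dual_self_attention(feats, alpha=1.0, beta=1.0):
    # Hypothetical fusion: residual sum of the two attention branches.
    return feats + alpha * position_attention(feats) + beta * channel_attention(feats)

# Toy feature map: 4 channels over a 3x3 grid flattened to 9 positions.
x = np.random.rand(4, 9)
y = dual_self_attention(x)
print(y.shape)  # (4, 9)
```

In a full captioning pipeline, features like `y` would be reshaped back to (C, H, W) and passed to the decoder, so the language model conditions on globally contextualized rather than purely local features.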
Boyang Wan, Wenhui Jiang, Yuming Fang, Wenying Wen, Hantao Liu