Graph Self-Attention Network for Image Captioning

Qitong Zheng; Yu‐Ping Wang

doi:10.1109/aiccsa50499.2020.9316518

JOURNAL ARTICLE

Year: 2020 Pages: 1-8

Abstract

Most state-of-the-art methods for image captioning depend heavily on an attention mechanism over object regions within the encoder-decoder framework. Existing attention models are generally based on simple addition or multiplication operations and may not fully discover the complex relationships between the visual features and the target words. In this paper, we propose a novel attention model, named graph self-attention (GSA), that combines graph networks and self-attention for image captioning. GSA constructs a star-graph model to dynamically assign weights to the detected object regions as the words are generated step by step. The central node is represented by the semantic feature, and the visual features of the object regions are used as edge nodes. By propagating messages between the center and edge nodes, GSA explicitly captures the relationships between the current target word and the image features. To generate conjunctions and attributives, which are not directly related to visual information, GSA introduces self-attention so that such words can focus more on the semantic information. Moreover, the GSA model is generic and can be applied to other tasks that require attention over multiple features. Experiments demonstrate the effectiveness and potential of the proposed GSA.
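The star-graph idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the scoring function or update rule, so the sketch assumes scaled dot-product scores, no learned projections, and a single leaf-to-center propagation step. The names `star_graph_attention`, `semantic`, and `regions` are hypothetical. The center node also attends to itself, which stands in for the self-attention that lets non-visual words lean on the semantic feature.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def star_graph_attention(center, leaves):
    """One leaf-to-center message-passing step on a star graph.

    center: semantic feature of the current decoding step (assumed input).
    leaves: visual features of the detected object regions.
    Returns (weights, context). weights[0] is the self-loop weight on the
    center node; weights[1:] are the weights on the region nodes.
    """
    d = len(center)
    nodes = [center] + leaves  # index 0 is the center's self-loop
    # Assumption: scaled dot-product scoring between center and each node.
    scores = [dot(center, n) / math.sqrt(d) for n in nodes]
    weights = softmax(scores)
    # The updated center is the weighted sum of all incoming messages.
    context = [sum(w * n[j] for w, n in zip(weights, nodes)) for j in range(d)]
    return weights, context


# Toy example with made-up 4-dimensional features.
semantic = [1.0, 0.0, 0.5, 0.0]
regions = [
    [0.9, 0.1, 0.4, 0.0],  # closely aligned with the semantic feature
    [0.0, 1.0, 0.0, 0.2],  # unrelated region
    [0.1, 0.0, 0.0, 1.0],  # weakly related region
]
weights, context = star_graph_attention(semantic, regions)
print("self-loop weight:", round(weights[0], 3))
print("region weights:", [round(w, 3) for w in weights[1:]])
```

In this toy setup a large self-loop weight means the step relies mostly on the semantic feature, while a large region weight means the step is grounded in that object region. A real model would presumably use learned projections and possibly several propagation rounds.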

Keywords:

Closed captioning; Computer science; Focus (optics); Graph; Artificial intelligence; Encoder; Visualization; Attention network; Word (group theory); Feature (linguistics); Enhanced Data Rates for GSM Evolution; Object (grammar); Image (mathematics); Pattern recognition (psychology); Theoretical computer science; Natural language processing; Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 0.10

Citation Normalized Percentile: 0.44

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Domain Adaptation and Few-Shot Learning

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Dual-stream Self-attention Network for Image Captioning

A Dual Self-Attention based Network for Image Captioning

Multi-Branch Distance-Sensitive Self-Attention Network for Image Captioning

Normalized and Geometry-Aware Self-Attention Network for Image Captioning

Hybrid attention network for image captioning