Relational Graph Reasoning Transformer for Image Captioning

Xinyu Xiao; Zixun Sun; Tingtian Li; Yipeng Yu

doi:10.1109/icme52920.2022.9859926

JOURNAL ARTICLE

Year: 2022

Journal: 2022 IEEE International Conference on Multimedia and Expo (ICME)

DOI: 10.1109/icme52920.2022.9859926

Abstract

Current image captioning methods feed the features of objects detected in an image directly into the model and introduce a variety of attention mechanisms to capture the associations between objects and specific words. However, the visual and semantic relationships between objects receive insufficient attention. In this paper, we propose a relational graph reasoning Transformer that explicitly incorporates the visual and semantic relationships between objects to construct an object relational graph within the Transformer. Specifically, in addition to the detected object features, the model attends to the global spatial relationships and the semantic context between different objects. A graph-structured feature, which correlates object features with their spatial and semantic information, is then reasoned by a learned grafting mechanism. Finally, the contextual graph feature is integrated into the proposed Transformer decoder. Experimental results demonstrate the significance of our relational graph reasoning Transformer model.
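The abstract gives no equations, so the following is only an illustrative sketch of one common way to let attention over detected objects depend on their spatial relationships: a pairwise geometry term is added to the usual scaled dot-product logits. It is not the paper's grafting mechanism, and it leaves out the semantic context entirely. The function names (`pairwise_geometry`, `relation_aware_attention`), the four-dimensional geometry encoding, and the weights `w_q`, `w_k`, `w_v`, `w_geo` are all assumptions made for illustration.

```python
import numpy as np

def pairwise_geometry(boxes):
    """Relative geometry between every ordered pair of boxes.

    boxes: (n, 4) array of [x1, y1, x2, y2].
    Returns an (n, n, 4) array of [dx, dy, dw, dh] for each pair (i, j).
    This encoding is an assumption, not taken from the paper.
    """
    cx = (boxes[:, 0] + boxes[:, 2]) / 2.0
    cy = (boxes[:, 1] + boxes[:, 3]) / 2.0
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    dx = (cx[None, :] - cx[:, None]) / w[:, None]  # centre offset, scaled by box i width
    dy = (cy[None, :] - cy[:, None]) / h[:, None]  # centre offset, scaled by box i height
    dw = np.log(w[None, :] / w[:, None])           # log width ratio
    dh = np.log(h[None, :] / h[:, None])           # log height ratio
    return np.stack([dx, dy, dw, dh], axis=-1)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)        # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def relation_aware_attention(feats, boxes, w_q, w_k, w_v, w_geo):
    """Single-head attention over objects with an additive geometry bias."""
    d_k = w_q.shape[1]
    q, k, v = feats @ w_q, feats @ w_k, feats @ w_v
    content = (q @ k.T) / np.sqrt(d_k)             # (n, n) appearance-based logits
    geo_bias = pairwise_geometry(boxes) @ w_geo    # (n, n) spatial-relation logits
    attn = softmax(content + geo_bias, axis=-1)    # edge weights of the object graph
    return attn @ v, attn

# Toy inputs: three objects with random features and hand-written boxes.
rng = np.random.default_rng(0)
n, d, d_k = 3, 8, 4
feats = rng.normal(size=(n, d))
boxes = np.array([[0, 0, 10, 10], [5, 5, 20, 25], [30, 10, 50, 40]], dtype=float)
w_q, w_k, w_v = (rng.normal(size=(d, d_k)) for _ in range(3))
w_geo = rng.normal(size=4)

graph_feats, attn = relation_aware_attention(feats, boxes, w_q, w_k, w_v, w_geo)
print(graph_feats.shape)  # (3, 4)
```

Each row of `attn` can be read as the weighted edges from one object to all others, and `graph_feats` as the relation-aware object features that a caption decoder would attend to.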

Keywords: Computer science; Transformer; Closed captioning; Artificial intelligence; Natural language processing; Semantic feature; Graph; Scene graph; Theoretical computer science; Image (mathematics)

Metrics

FWCI (Field Weighted Citation Impact): 0.21

Citation Normalized Percentile: 0.51

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Human Pose and Action Recognition

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Relational-Convergent Transformer for image captioning

ReFormer: The Relational Transformer for Image Captioning

Image captioning with transformer and knowledge graph

Multi-Modal Graph Aggregation Transformer for image captioning

Graph Alignment Transformer for More Grounded Image Captioning