JOURNAL ARTICLE

Relational Graph Reasoning Transformer for Image Captioning

Xinyu Xiao, Zixun Sun, Tingtian Li, Yipeng Yu

Year: 2022 Journal:   2022 IEEE International Conference on Multimedia and Expo (ICME)

Abstract

Existing image captioning methods feed the features of detected objects directly into the model and rely on a variety of attention mechanisms to capture the associations between objects and specific words. However, the visual and semantic relationships between objects have not received sufficient attention. In this paper, we propose a relational graph reasoning Transformer that explicitly incorporates the visual and semantic relationships between objects to construct an object relational graph within the Transformer. Specifically, in addition to the detected object features, the model attends to the global spatial relationships and the semantic context between different objects. A graph-structured feature, which correlates the object features with their spatial and semantic information, is then inferred by a learned grafting mechanism. Finally, the contextual graph feature is integrated into the proposed Transformer decoder. Experimental results demonstrate the effectiveness of our relational reasoning Transformer model.
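The abstract gives no equations, so the following is only a minimal sketch of one common way to let attention over detected objects depend on their spatial relationships: a log-scaled relative-geometry term added to the appearance score as a bias. It is not the paper's formulation. The function names (`relative_geometry`, `relational_attention`), the `(x, y, w, h)` box layout, and the linear `geo_weights` bias are all assumptions made for illustration; a trained model would learn projections for both terms.

```python
import math


def relative_geometry(box_a, box_b):
    """Assumed 4-d relative geometry between two boxes given as
    (centre_x, centre_y, width, height)."""
    xa, ya, wa, ha = box_a
    xb, yb, wb, hb = box_b
    eps = 1e-3  # avoids log(0) when the two centres coincide
    return (
        math.log(max(abs(xa - xb), eps) / wa),
        math.log(max(abs(ya - yb), eps) / ha),
        math.log(wb / wa),
        math.log(hb / ha),
    )


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relational_attention(features, boxes, geo_weights):
    """For each object, average all object features with weights that
    combine appearance similarity and a geometry bias (hypothetical)."""
    dim = len(features[0])
    attended = []
    for feat_i, box_i in zip(features, boxes):
        scores = []
        for feat_j, box_j in zip(features, boxes):
            # Scaled dot-product appearance score.
            appearance = sum(a * b for a, b in zip(feat_i, feat_j)) / math.sqrt(dim)
            # Linear bias from the pairwise geometry (an assumption).
            geometry = sum(
                w * g for w, g in zip(geo_weights, relative_geometry(box_i, box_j))
            )
            scores.append(appearance + geometry)
        weights = softmax(scores)
        attended.append(
            [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]
        )
    return attended
```

Each output row is a convex combination of the input object features, so with all `geo_weights` set to zero the sketch reduces to plain self-attention without learned projections.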

Keywords:
Computer science, Transformer, Closed captioning, Artificial intelligence, Natural language processing, Semantic feature, Graph, Scene graph, Theoretical computer science, Image (mathematics)

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.21
Refs: 33
Citation Normalized Percentile: 0.51

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Relational-Convergent Transformer for image captioning

Lizhi Chen, You-Fu Yang, Juntao Hu, Longyue Pan, Hao Zhai

Journal:   Displays Year: 2023 Vol: 77 Pages: 102377-102377
JOURNAL ARTICLE

ReFormer: The Relational Transformer for Image Captioning

Xuewen Yang, Yingru Liu, Xin Wang

Journal:   Proceedings of the 30th ACM International Conference on Multimedia Year: 2022 Pages: 5398-5406
JOURNAL ARTICLE

Image captioning with transformer and knowledge graph

Yu Zhang, Xinyu Shi, Siya Mi, Xu Yang

Journal:   Pattern Recognition Letters Year: 2021 Vol: 143 Pages: 43-49
JOURNAL ARTICLE

Multi-Modal Graph Aggregation Transformer for image captioning

Lizhi Chen, Kesen Li

Journal:   Neural Networks Year: 2024 Vol: 181 Pages: 106813-106813