JOURNAL ARTICLE

RGTranCNet: EFFECTIVE IMAGE CAPTIONING MODEL USING CROSS-ATTENTION AND SEMANTIC KNOWLEDGE

Nguyễn Văn Thịnh, Lang Tran, Thanh The Van

Year: 2025
Journal: Vietnam Journal of Science and Technology/Science and Technology
Publisher: Vietnam Academy of Science and Technology

Abstract

Image captioning is an important task that bridges computer vision and natural language processing. However, methods based on long short-term memory (LSTM) networks and traditional attention mechanisms are limited both in their ability to model complex relationships and in their capacity for parallelization. Moreover, accurately describing objects that do not appear in the training set remains a significant challenge. To address these issues, this study proposes a novel image captioning model that uses a Transformer with cross-attention mechanisms combined with semantic knowledge from ConceptNet. The model adopts an encoder-decoder framework: the encoder extracts object region features and constructs a relational graph to represent the image, while the decoder integrates visual and semantic features through cross-attention to generate precise and diverse captions. Integrating ConceptNet knowledge improves accuracy, particularly for objects not present in the training set. Experimental results on the MS COCO benchmark dataset show that the model outperforms recent state-of-the-art approaches. Furthermore, the proposed semantic knowledge integration method can be readily applied to other image captioning models.
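The abstract states that the decoder integrates visual and semantic features through cross-attention, but it does not give the layer design. The following is only a minimal single-head sketch in Python with NumPy, assuming that object-region features and ConceptNet concept embeddings have already been projected to a shared dimension and are concatenated into one memory that the decoder states attend over. All dimensions, variable names, and the concatenation strategy are illustrative assumptions rather than the authors' implementation, and random arrays stand in for real features.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, memory, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (T, d_model) decoder states, one per caption token
    memory:  (N, d_model) encoder outputs the decoder attends over
    Returns the attended context (T, d_k) and attention weights (T, N).
    """
    q = queries @ w_q                      # project decoder states to queries
    k = memory @ w_k                       # project memory items to keys
    v = memory @ w_v                       # project memory items to values
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)     # each row sums to 1 over the memory
    return weights @ v, weights


rng = np.random.default_rng(0)
d_model, d_k = 16, 8

# Hypothetical inputs: random stand-ins for real features
visual = rng.normal(size=(5, d_model))          # 5 object-region features
semantic = rng.normal(size=(3, d_model))        # 3 ConceptNet concept embeddings
decoder_states = rng.normal(size=(4, d_model))  # 4 partial-caption tokens

# Assumption: visual and semantic tokens are concatenated into one memory
memory = np.concatenate([visual, semantic], axis=0)

# Hypothetical projection matrices (learned in a real model)
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

context, weights = cross_attention(decoder_states, memory, w_q, w_k, w_v)
print(context.shape, weights.shape)  # (4, 8) (4, 8)
```

Concatenation is the simplest way to let each generated word attend jointly to image regions and external concepts. A model could instead run separate cross-attention passes over the visual and semantic features and fuse the results, for example with a learned gate; which design RGTranCNet uses cannot be determined from the abstract alone.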

Keywords:
Closed captioning, Computer science, Image (mathematics), Natural language processing, Artificial intelligence

Metrics

Cited by: 2
FWCI (Field Weighted Citation Impact): 6.90
References: 0
Citation Normalized Percentile: 0.89


Topics

Brain Tumor Detection and Classification
Life Sciences →  Neuroscience →  Neurology
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
AI in cancer detection
Physical Sciences →  Computer Science →  Artificial Intelligence
