Nguyễn Văn Thịnh, Lang Tran, Thanh The Van
Image captioning is an important task that bridges computer vision and natural language processing. However, methods based on long short-term memory (LSTM) networks and traditional attention mechanisms are limited in modeling complex relationships and in their capacity for parallelization. Moreover, accurately describing objects absent from the training set poses a significant challenge. This study proposes a novel image captioning model that uses a Transformer with cross-attention mechanisms combined with semantic knowledge from ConceptNet to address these issues. The model adopts an encoder-decoder framework: the encoder extracts object-region features and constructs a relational graph to represent the image, while the decoder integrates visual and semantic features through cross-attention to generate precise and diverse captions. Integrating ConceptNet knowledge improves accuracy, particularly for objects not present in the training set. Experimental results on the MS COCO benchmark dataset demonstrate that the model outperforms recent state-of-the-art approaches. Furthermore, this study's method of integrating semantic knowledge can be readily applied to other image captioning models.
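The decoder's fusion step described above can be sketched minimally: caption-token states act as queries that attend separately over visual region features and over semantic concept embeddings, and the two context vectors are combined. This is a hedged illustration, not the paper's implementation; the dimensions, the ConceptNet embedding matrix, and the additive fusion are all assumptions made for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # scaled dot-product attention: queries attend over keys/values
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values

rng = np.random.default_rng(0)
d = 16
text_states  = rng.normal(size=(5, d))   # decoder states for 5 caption tokens
region_feats = rng.normal(size=(36, d))  # visual object-region features (assumed 36 regions)
concept_embs = rng.normal(size=(10, d))  # hypothetical ConceptNet concept embeddings

# two cross-attention passes (visual, then semantic), fused additively
visual_ctx   = cross_attention(text_states, region_feats, region_feats)
semantic_ctx = cross_attention(text_states, concept_embs, concept_embs)
fused = text_states + visual_ctx + semantic_ctx
print(fused.shape)  # (5, 16)
```

In a real model the queries, keys, and values would pass through learned projections and multiple heads; the sketch keeps only the attention arithmetic to show how visual and semantic sources feed the same decoder step.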