Anwar ul Haque, Sayeed Ghani, Muhammad Arif Saeed
The human ability to detect, understand, and contextualize objects in the real world has long been a goal for computer scientists seeking to replicate this capability in machines. Image captioning that accounts for both content and context is a significant research problem. In this work, we develop a storytelling system that captions images while accounting for content, context, syntax, and knowledge. Our methodology combines Capsule Networks for image encoding, Knowledge Graphs for content and context awareness, and Transformer Neural Networks for decoding. During feature extraction, spatial, geometrical, and orientational details are captured by the Capsule Networks. The corpus is passed through the Knowledge Graph to enrich it with content, context, and semantics. The decoding phase combines the Knowledge Graph and the Transformer Neural Network for knowledge-driven captioning. Our model is trained on MSCOCO, Flickr 8k, and Flickr 32k, and tested on MSCOCO, Flickr 8k, Flickr 32k, and Google Images. The results show good content and context understanding, with BLEU-4: 49.97, METEOR: 39.14, CIDEr: 136.53, and ROUGE: 74.61. The placement of adverbs and adjectives within the generated sentences, driven by the objects’ geometrical and semantic relationships, is a notable strength of the model. The primary outcome of our research is the autonomous generation of story-type captions for real-world images.
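The encoder-decoder pipeline described above (capsule-network image encoder, knowledge-graph enrichment of the text input, transformer decoder) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the layer sizes, the toy `KG` dictionary, the `kg_fact_tokens` helper, and the tiny vocabulary are illustrative assumptions, and capsule dynamic routing is omitted for brevity.

```python
import torch
import torch.nn as nn

def squash(s, dim=-1, eps=1e-8):
    """Capsule squashing: keeps each vector's orientation, maps its length into (0, 1)."""
    norm2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)

class CapsuleImageEncoder(nn.Module):
    """Encodes an image into a small set of capsule vectors (illustrative sizes)."""
    def __init__(self, caps_dim=16, n_caps=32):
        super().__init__()
        self.conv = nn.Conv2d(3, 64, kernel_size=9, stride=2)
        self.primary = nn.Conv2d(64, n_caps * caps_dim, kernel_size=9, stride=2)
        self.caps_dim, self.n_caps = caps_dim, n_caps

    def forward(self, images):                                  # (B, 3, H, W)
        x = torch.relu(self.conv(images))
        x = self.primary(x)                                     # (B, n_caps*caps_dim, h, w)
        b = x.shape[0]
        x = x.view(b, self.n_caps, self.caps_dim, -1).mean(-1)  # pool over space
        return squash(x)                                        # (B, n_caps, caps_dim)

# Toy knowledge graph (assumed): detected object label -> (relation, object) facts.
KG = {"dog": [("chases", "ball")], "person": [("rides", "bicycle")]}

def kg_fact_tokens(label, word_to_id):
    """Flatten the facts for one detected object into token ids that can be
    prepended to the decoder input (a stand-in for KG-based enrichment)."""
    ids = []
    for rel, obj in KG.get(label, []):
        ids += [word_to_id[w] for w in (label, rel, obj)]
    return ids

class KGTransformerCaptioner(nn.Module):
    """Capsule encoder + transformer decoder that cross-attends to the capsule features."""
    def __init__(self, vocab_size, d_model=128, caps_dim=16):
        super().__init__()
        self.encoder = CapsuleImageEncoder(caps_dim=caps_dim)
        self.caps_proj = nn.Linear(caps_dim, d_model)           # capsules -> decoder memory
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, images, token_ids):                       # token_ids: (B, T)
        memory = self.caps_proj(self.encoder(images))           # (B, n_caps, d_model)
        tgt = self.embed(token_ids)
        T = token_ids.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.lm_head(out)                                # next-token logits

# Smoke test with random data and a tiny assumed vocabulary.
word_to_id = {w: i for i, w in enumerate(["<bos>", "dog", "chases", "ball"])}
prefix = [word_to_id["<bos>"]] + kg_fact_tokens("dog", word_to_id)
model = KGTransformerCaptioner(vocab_size=1000)
logits = model(torch.randn(1, 3, 128, 128), torch.tensor([prefix]))
print(logits.shape)                                             # torch.Size([1, 4, 1000])
```

In this sketch the knowledge-graph facts are injected simply as extra prefix tokens for the decoder; the paper's actual mechanism for combining the Knowledge Graph with the Transformer may differ.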