Abstract

Image captioning stands as a pivotal technique for providing contextual descriptions of visual content, promising substantial enhancement in the capabilities of conversational AI systems. This work delves into the integration of image captioning methodologies into ChatGPT, aiming to fortify its capacity in understanding and responding to visual information. The study extensively explores the application of deep learning models, encompassing ResNet50, LSTM, DenseNet121, MobileNet, and MobileNetv2, in the domain of image captioning. Specifically, a comprehensive investigation is conducted into a Recurrent Neural Network employing LSTM as a decoder and a Convolutional Neural Network utilizing ResNet as an encoder. These fusion harnesses vocabulary and image features to craft precise and meaningful descriptions of visual content. Furthermore, this study pioneers an approach to identify and relate at least two salient features within any given image, forming a coherent caption that binds the relationship between these identified features. This novel capability not only refines image captioning techniques but also empowers ChatGPT to comprehend complex visual contexts within conversational settings. The outcomes of this work offer profound insights into augmenting AI capabilities, facilitating a deeper understanding and more effective interaction with visual information across various domains, thereby advancing the field of conversational AI integration with visual context.

Keywords:
Computer science Image (mathematics) Artificial intelligence Computer vision Computer graphics (images)

Metrics

3
Cited By
1.59
FWCI (Field Weighted Citation Impact)
10
Refs
0.74
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Artificial Intelligence in Healthcare and Education
Health Sciences →  Medicine →  Health Informatics
COVID-19 diagnosis using AI
Health Sciences →  Medicine →  Radiology, Nuclear Medicine and Imaging

Related Documents

JOURNAL ARTICLE

Image caption generation

Kusa AnjanaM. Swetha

Journal:   Zenodo (CERN European Organization for Nuclear Research) Year: 2025
JOURNAL ARTICLE

Image caption generation

Kusa AnjanaM. Swetha

Journal:   Zenodo (CERN European Organization for Nuclear Research) Year: 2025
JOURNAL ARTICLE

IMAGE GENERATION FROM CAPTION

Mahima PandyaProf. Sonal Rami

Journal:   Zenodo (CERN European Organization for Nuclear Research) Year: 2018
© 2026 ScienceGate Book Chapters — All rights reserved.