Enriching Conversations: Empowering ChatGPT with Image Caption Generation

Rahul Chowdary K Eagalapati; M Chaitanya Chowdary; Addepalli Sai Sasank; Busam Monish; Pallavi R Kumar

doi:10.1109/i2ct61223.2024.10543845

JOURNAL ARTICLE

Year: 2024 Pages: 1-5

Abstract

Image captioning stands as a pivotal technique for providing contextual descriptions of visual content, promising substantial enhancement in the capabilities of conversational AI systems. This work delves into the integration of image captioning methodologies into ChatGPT, aiming to fortify its capacity to understand and respond to visual information. The study extensively explores the application of deep learning models, encompassing ResNet50, LSTM, DenseNet121, MobileNet, and MobileNetV2, in the domain of image captioning. Specifically, a comprehensive investigation is conducted into a Recurrent Neural Network employing LSTM as a decoder and a Convolutional Neural Network utilizing ResNet as an encoder. This fusion harnesses vocabulary and image features to craft precise and meaningful descriptions of visual content. Furthermore, this study pioneers an approach to identify and relate at least two salient features within any given image, forming a coherent caption that expresses the relationship between these identified features. This novel capability not only refines image captioning techniques but also empowers ChatGPT to comprehend complex visual contexts within conversational settings. The outcomes of this work offer profound insights into augmenting AI capabilities, facilitating a deeper understanding of and more effective interaction with visual information across various domains, thereby advancing the integration of visual context into conversational AI.
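The encoder-decoder arrangement described in the abstract can be illustrated with a minimal sketch of the caption-generation loop. This is not the authors' implementation: the abstract does not state the decoding strategy, so greedy word-by-word decoding is an assumption, and the names `greedy_caption`, `toy_next_word`, `START`, and `END` are hypothetical. The `next_word` callable stands in for a trained LSTM decoder that would score the vocabulary given the CNN image features and the words generated so far.

```python
from typing import Callable, List, Sequence

# Hypothetical boundary tokens; the paper's actual vocabulary is not given.
START, END = "<start>", "<end>"


def greedy_caption(
    image_features: Sequence[float],
    next_word: Callable[[Sequence[float], List[str]], str],
    max_len: int = 20,
) -> List[str]:
    """Build a caption one word at a time until END or max_len words.

    image_features: vector an encoder (e.g. a ResNet) would produce.
    next_word: stand-in for the decoder (e.g. an LSTM plus softmax) that
        picks the next word from the image features and the partial caption.
    """
    words: List[str] = [START]
    for _ in range(max_len):
        word = next_word(image_features, words)
        if word == END:
            break
        words.append(word)
    return words[1:]  # drop the START marker


def toy_next_word(image_features: Sequence[float], words: List[str]) -> str:
    """Toy decoder that replays a fixed script and ignores the image.

    A trained model would condition on image_features here.
    """
    script = ["a", "dog", "chases", "a", "ball", END]
    return script[len(words) - 1]


if __name__ == "__main__":
    caption = greedy_caption([0.1, 0.2], toy_next_word)
    print(" ".join(caption))
```

In a real system the toy decoder would be replaced by a model call, and the resulting caption string could be passed to a conversational model as text context; that hand-off is likewise an assumption about how such an integration might be wired.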

Keywords: Computer science; Image (mathematics); Artificial intelligence; Computer vision; Computer graphics (images)

Metrics

FWCI (Field Weighted Citation Impact): 1.59

Citation Normalized Percentile: 0.74

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Artificial Intelligence in Healthcare and Education

Health Sciences → Medicine → Health Informatics

COVID-19 diagnosis using AI

Health Sciences → Medicine → Radiology, Nuclear Medicine and Imaging
