The automatic generation of image descriptions sits at the intersection of computer vision and natural language processing. Image captioning requires a semantic understanding of an image and the ability to produce grammatically well-formed descriptions. It is a complex problem because it often demands information that is not directly visible in the scene, together with the reasoning and background knowledge needed to interpret the objects an image contains. In this study, we developed a multilayer Convolutional Neural Network (CNN) to produce words describing an image, and a Long Short-Term Memory (LSTM) network to assemble those words into coherent sentences. To generate an accurate description, the CNN first compares the target image against a large set of training samples; here we used the Flickr8k dataset. We evaluated the generated captions with the Bilingual Evaluation Understudy (BLEU) metric, which was originally designed to score machine-translated text against human reference translations. We also used two pre-trained models (VGG16 and XceptionV3) for a comparative study.
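To make the evaluation metric concrete, the sketch below is a minimal, self-contained sentence-level BLEU computation: clipped n-gram precisions up to 4-grams combined by a geometric mean, times a brevity penalty. It is illustrative only; the abstract does not specify the exact BLEU variant used, and real evaluations typically use smoothed, corpus-level implementations such as NLTK's `corpus_bleu` or sacreBLEU. The example captions and the `bleu` helper are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU of a candidate caption
    against a single reference caption (both lists of tokens)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clipped counts: a candidate n-gram is credited at most as
        # often as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0  # unsmoothed BLEU is zero if any precision is zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize captions shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)

reference = "a brown dog runs across the green field".split()
candidate = "a brown dog runs across the field".split()
print(round(bleu(candidate, reference), 3))
```

A perfect match scores 1.0, while missing or reordered words lower the clipped precisions; averaging such scores over every image in the test split gives the figure usually reported for a captioning model.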
Junaid Ahmad Wani, Sahilpreet Singh