Abstract

Automatic image description generation is an active research area at the intersection of computer vision and natural language processing. Image captioning is a key task that requires a semantic understanding of images and the ability to produce descriptions with correct grammatical structure. It is a complex problem because it often demands information that is not directly visible in the scene, and it may require logical reasoning or in-depth knowledge about the objects in an image. In this study, we developed a multilayer Convolutional Neural Network (CNN) to produce words that describe the images, and we used a Long Short-Term Memory (LSTM) network to construct relevant sentences from those words. To generate an accurate description, the CNN model first compares the target image against a large dataset of training samples. We used the Flickr 8k dataset. We used the Bilingual Evaluation Understudy (BLEU) metric to measure how well our model generates captions for the images. BLEU was originally designed to evaluate machine translation systems by comparing translated text against reference translations; here it is applied to compare generated captions against reference captions. We also used two pre-trained models (VGG16 and XceptionV3) for a comparative study.
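As an illustration of the BLEU metric mentioned above, the following is a minimal sketch of an unsmoothed sentence-level BLEU score in plain Python. It is not the authors' evaluation code. The function names (`ngrams`, `modified_precision`, `bleu`) and the example captions are hypothetical and chosen only for illustration. The sketch assumes whitespace-tokenized captions, uniform weights over 1-grams to 4-grams, and no smoothing, so any caption with no matching 4-gram scores 0.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of the candidate against all references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the maximum count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    # Clip candidate counts so repeated words cannot inflate the score.
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Unsmoothed BLEU with uniform weights over 1..max_n grams."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Reference length closest to the candidate length (ties: shorter one).
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Hypothetical captions, invented for illustration only.
references = [
    "a dog runs across the grass".split(),
    "a brown dog is running on the grass".split(),
]
candidate = "a dog runs on the grass".split()
score = bleu(candidate, references, max_n=2)
```

In this setting each image typically has several reference captions, and the clipping step keeps a caption that repeats one common word from receiving a high precision.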

Keywords:
Closed captioning; Computer science; Artificial intelligence; Convolutional neural network; Natural language processing; Machine translation; Task (project management); Deep learning; Construct (python library); Natural language; Metric (unit); Task analysis; Key (lock); Image (mathematics); Field (mathematics); Language model
