JOURNAL ARTICLE

Image Captioning using CNN and LSTM

Anish Banda

Year: 2021 Journal: International Journal for Research in Applied Science and Engineering Technology Vol: 9 (8) Pages: 2666-2669 Publisher: International Journal for Research in Applied Science and Engineering Technology (IJRASET)

Abstract

In the proposed model, we examine a deep neural network-based technique for image caption generation. Given an image as input, the technique produces output in three forms: a sentence describing the image in three different languages, an mp3 audio file, and an image file. The model combines techniques from computer vision and natural language processing. We aim to build a caption generator using a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. The CNN compares the target image with a large dataset of training images, and the model generates a decent description using the trained data. The CNN serves as the encoder that extracts features from the image, and the LSTM serves as the decoder that generates the description. To evaluate the accuracy of the generated caption we use the BLEU metric, which grades the quality of the generated content. Performance is calculated using standard evaluation metrics.

Author keywords: CNN, RNN, LSTM, BLEU score, encoder, decoder, captions, image description.
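The abstract names BLEU as the evaluation metric but does not say how it was computed. As a rough illustration only, the sketch below implements the standard sentence-level BLEU definition (clipped n-gram precision, geometric mean, brevity penalty) in plain Python. The function names, the whitespace tokenization, the lack of smoothing, and the default of 4-grams are assumptions made for this example, not details taken from the article.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it appears in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights.
    Returns 0.0 if any n-gram precision is zero (no smoothing is applied)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Reference length closest to the candidate length (ties go to the shorter one).
    r = min((len(ref) for ref in references),
            key=lambda ref_len: (abs(ref_len - c), ref_len))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_avg)


# Hypothetical captions, tokenized by whitespace for the example.
reference = "a dog runs on the grass".split()
generated = "a dog runs on grass".split()
score = bleu(generated, [reference], max_n=2)
```

Because this version is unsmoothed, short captions that miss every 4-gram score exactly zero. Evaluations of short captions often report lower-order scores (BLEU-1, BLEU-2) or apply smoothing for that reason.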

Keywords:
Computer science, Closed captioning, Artificial intelligence, Encoder, Convolutional neural network, Image (mathematics), Sentence, Metric (unit), Speech recognition, Recurrent neural network, Artificial neural network, Pattern recognition (psychology), Computer vision

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.10
Refs: 13
Citation Normalized Percentile: 0.40

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

BOOK-CHAPTER

Image Captioning Using CNN-LSTM

Akshay Joshi, Kartik R Kalal, Dhiraj Bhandare, Vaishnavi Patil, Uday Kulkarni, S. M. Meena

Lecture Notes in Electrical Engineering Year: 2024 Pages: 421-433

JOURNAL ARTICLE

Fast image captioning using LSTM

Meng Han, Wenyu Chen, Alemu Dagmawi Moges

Journal: Cluster Computing Year: 2018 Vol: 22 (S3) Pages: 6143-6155

JOURNAL ARTICLE

Image Captioning using CNN and LSTM

G. Sairam, M. Mandha, P. Prashanth, P. Swetha

Journal: IET Conference Proceedings Year: 2022 Vol: 2021 (11) Pages: 274-277

JOURNAL ARTICLE

Image captioning using bidirectional LSTM neural network

Farnaz Hoseini, Anaram Yaghoobi Notash

Journal: Discover Artificial Intelligence Year: 2025 Vol: 5 (1)