JOURNAL ARTICLE

Image Captioning using Hybrid of VGG16 and Bidirectional LSTM Model

Yufis Azhar, M. Randy Anugerah, Muhammad Al Reza Fahlopy, Alfin Yusriansyah

Year: 2022
Journal: Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control
Publisher: Muhammadiyah University of Malang

Abstract

Image captioning is one of the major challenges at the intersection of computer vision and natural language processing. Many studies have addressed the topic, but their reported evaluation scores remain low. This study therefore aims to improve on the evaluation results of previous work. We use the Flickr8k dataset and the VGG16 Convolutional Neural Network (CNN) as an encoder to extract features from images. A Recurrent Neural Network (RNN) based on Bidirectional Long Short-Term Memory (BiLSTM) serves as the decoder. The feature vectors produced by the encoder are passed to the Bidirectional LSTM, which generates descriptions that match the visual content of the input image. The captions convey an object's name, location, color, size, and features, as well as its surroundings. Captions are generated with a Greedy Search algorithm using the argmax function and with a Beam Search algorithm, and are evaluated with Bilingual Evaluation Understudy (BLEU) scores. The best result in this study comes from the VGG16 model with Bidirectional LSTM using Beam Search with parameter K = 3, which achieves a BLEU-1 score of 0.60593 and outperforms the previous studies.
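The decoding and scoring steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the trained VGG16 + BiLSTM network is replaced by a hypothetical lookup table (TOY_MODEL) that returns next-word probabilities, and the function names (next_token_probs, greedy_search, beam_search, bleu1) are assumptions chosen for illustration. Greedy Search takes the argmax word at every step, Beam Search keeps the K highest-scoring partial captions, and BLEU-1 is the clipped unigram precision multiplied by a brevity penalty.

```python
import math
from collections import Counter

# Hypothetical stand-in for the trained decoder: maps a partial caption
# to a probability distribution over the next word.
TOY_MODEL = {
    ("<start>",): {"a": 0.6, "the": 0.4},
    ("<start>", "a"): {"dog": 0.55, "cat": 0.45},
    ("<start>", "the"): {"dog": 0.95, "cat": 0.05},
}

def next_token_probs(prefix):
    # Any prefix not in the table is assumed to end the caption.
    return TOY_MODEL.get(prefix, {"<end>": 1.0})

def strip_markers(seq):
    return [t for t in seq if t not in ("<start>", "<end>")]

def greedy_search(max_len=10):
    # Pick the single most probable word (argmax) at every step.
    seq = ("<start>",)
    while len(seq) < max_len and seq[-1] != "<end>":
        probs = next_token_probs(seq)
        seq += (max(probs, key=probs.get),)
    return strip_markers(seq)

def beam_search(k=3, max_len=10):
    # Keep the k partial captions with the highest total log-probability.
    beams = [(0.0, ("<start>",))]
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == "<end>":
                candidates.append((score, seq))  # finished captions carry over
                continue
            for tok, p in next_token_probs(seq).items():
                candidates.append((score + math.log(p), seq + (tok,)))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
        if all(seq[-1] == "<end>" for _, seq in beams):
            break
    return strip_markers(beams[0][1])

def bleu1(candidate, references):
    # Clipped unigram precision times the brevity penalty.
    if not candidate:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for tok, count in Counter(ref).items():
            max_ref[tok] = max(max_ref[tok], count)
    clipped = sum(min(c, max_ref[t]) for t, c in Counter(candidate).items())
    precision = clipped / len(candidate)
    # Reference length closest to the candidate length (ties: shorter one).
    ref_len = min((len(r) for r in references),
                  key=lambda n: (abs(n - len(candidate)), n))
    c_len = len(candidate)
    bp = 1.0 if c_len > ref_len else math.exp(1 - ref_len / c_len)
    return bp * precision

if __name__ == "__main__":
    references = [["the", "dog"]]
    for name, caption in (("greedy", greedy_search()), ("beam", beam_search(k=3))):
        print(name, caption, bleu1(caption, references))
```

In this toy table the greedy choice of "a" at the first step leads to a less probable caption overall, while Beam Search with K = 3 recovers the more probable "the dog". This shows how a wider search can raise BLEU scores, although the numbers here say nothing about the paper's reported results.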

Keywords:
Closed captioning; Computer science; Artificial intelligence; Convolutional neural network; Feature (linguistics); Feature extraction; Encoder; Pattern recognition (psychology); Object (grammar); Image (mathematics); Recurrent neural network; Speech recognition; Computer vision; Artificial neural network

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 0.62
Refs: 25
Citation Normalized Percentile: 0.64

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
