JOURNAL ARTICLE

Indonesian Language Image Captioning Using Encoder-Decoder With Attention Approach

Abstract

Image captioning is the process of converting an image into a readable natural-language description that captures the content observed within the image. The ability to generate captions for images is significant in helping humans understand visual materials. Applications include describing items sold in stores, assisting human-computer interaction, and improving assistive technology for visually impaired individuals. The goal of our research is to generate Indonesian captions for images and evaluate the effectiveness of the generated captions. A translated version of the Flickr8k dataset was used for this study. An attention-based encoder-decoder approach was used, with the pre-trained InceptionV3 model serving as the image encoder. Our results show that the proposed model outperformed previous research, obtaining BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores of 38.7, 21.1, 8.7, and 3.2, respectively.
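
For readers unfamiliar with the BLEU-1 to BLEU-4 scores reported above, the following is a minimal sketch of sentence-level BLEU-n: clipped n-gram precision combined by a geometric mean with uniform weights and multiplied by a brevity penalty. It is not the authors' evaluation code. The function names, the absence of smoothing, and the Indonesian example sentences are assumptions made for illustration, and published scores are usually computed at corpus level and scaled by 100.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against one or more references."""
    cand_counts = ngrams(candidate, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / total


def brevity_penalty(candidate, references):
    """Penalize candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties are broken toward the shorter reference.
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    return 1.0 if c > r else math.exp(1 - r / c)


def sentence_bleu(candidate, references, max_n):
    """Unsmoothed BLEU-max_n with uniform weights over n = 1..max_n."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)


# Hypothetical example captions (not taken from the dataset).
candidate = "seekor anjing berlari di rumput".split()
reference = "seekor anjing berlari di atas rumput".split()

for n in (1, 2):
    print(f"BLEU-{n}: {100 * sentence_bleu(candidate, [reference], n):.1f}")
```

Because higher-order n-grams match less often, scores typically fall from BLEU-1 to BLEU-4, which is the pattern seen in the reported results.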

Keywords:
Closed captioning, Computer science, Indonesian, Encoder, Speech recognition, Decoding methods, Image (mathematics), Artificial intelligence, Natural language processing, Linguistics, Telecommunications

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 1.06
Refs: 24
Citation Normalized Percentile: 0.65

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Video Analysis and Summarization
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition