Image captioning is the process of converting an image into a readable natural-language description that captures the content observed within the image. The ability to generate captions for images is significant in helping humans understand visual materials. Applications include describing items sold in stores, assisting human-computer interaction, and improving assistive technology for visually impaired individuals. The goal of our research is to generate Indonesian captions for images and to evaluate the effectiveness of the generated captions. A translated version of the Flickr8k dataset is used for this study. We use an attention-based encoder-decoder approach, with the pre-trained InceptionV3 model for image encoding. Our results show that the proposed model outperforms a previous study, obtaining BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores of 38.7, 21.1, 8.7, and 3.2, respectively.
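To make the reported metric concrete, the following is a minimal sketch of how a cumulative BLEU-n score can be computed for a single caption, using clipped n-gram precision, uniform weights, and a brevity penalty. It is an illustration only and not necessarily the evaluation script used in this study: the function names and the example Indonesian sentences are assumptions, the sketch scores one sentence rather than a whole corpus, and it returns values in the range 0 to 1 (the scores above appear to be on a 0 to 100 scale).

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of the candidate against all references."""
    cand_counts = ngrams(candidate, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, references, max_n):
    """Cumulative sentence-level BLEU-max_n with uniform weights, no smoothing."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, one zero precision zeroes the score
    c = len(candidate)
    # Reference length closest to the candidate length (ties go to the shorter).
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Hypothetical example captions (not taken from the dataset).
candidate = "seekor anjing berlari di pantai".split()
references = ["seekor anjing berlari di atas pasir".split()]
for n in range(1, 5):
    print(f"BLEU-{n}: {bleu(candidate, references, n):.3f}")
```

Because higher-order n-grams match less often, BLEU-n typically decreases as n grows, which is consistent with the drop from BLEU-1 to BLEU-4 in the reported scores.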
S. Thivaharan, Srivatsun Gopalakrishnan, Pranav Kiran S, Johan Benoni Raul J
Santosh Kumar Mishra, Gaurav Rai, Sriparna Saha, Pushpak Bhattacharyya
Kareemunissa, Prakash Ramachandran