Abstract

Image captioning has long been approached with a convolutional neural network (CNN) for feature extraction paired with a recurrent neural network (RNN) for text generation; in the era of widespread Transformer use, and especially for the Thai language, this approach needs further development. This paper proposes ThaiTC, an end-to-end image captioning model for Thai that combines a pretrained Vision Transformer (ViT) with a pretrained Thai text Transformer, leveraging the Transformer architecture on both the vision and the language side. We experiment to find the pretrained vision Transformer and Thai text Transformer best suited to Thai image captioning, and evaluate on three Thai image captioning datasets with different challenges: 1) Travel, 2) Food, and 3) Flickr30k (translated). We also examine freezing the Vision Transformer weights when training on captioning datasets with fewer images. In our experiments, ThaiTC performed much better on the Food and Flickr30k datasets than on the Travel dataset, allowing us to automatically generate captions for food and travel images.
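The abstract describes a ViT encoder feeding a Thai text Transformer decoder, with the option of freezing the vision weights in low-data settings. The sketch below illustrates that encoder-decoder pattern in plain PyTorch; the class name `ThaiCaptioner`, the dimensions, and the layer counts are illustrative assumptions, not the paper's actual configuration, and randomly initialized layers stand in for the pretrained models.

```python
import torch
import torch.nn as nn

class ThaiCaptioner(nn.Module):
    """Illustrative sketch: ViT-style encoder + Thai text decoder.

    All sizes here (d_model=256, 196 patches, vocab of 8000) are
    assumptions for the example, not values from the paper.
    """
    def __init__(self, vocab_size=8000, d_model=256, patch_dim=768):
        super().__init__()
        # stand-in for a pretrained ViT: patch projection + encoder stack
        self.patch_embed = nn.Linear(patch_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # text decoder with cross-attention over the image features
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def freeze_encoder(self):
        # the low-data option from the abstract: keep vision weights fixed
        for p in self.patch_embed.parameters():
            p.requires_grad = False
        for p in self.encoder.parameters():
            p.requires_grad = False

    def forward(self, patches, tokens):
        memory = self.encoder(self.patch_embed(patches))
        tgt = self.tok_embed(tokens)
        # causal mask so each caption token attends only to earlier tokens
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.lm_head(out)

model = ThaiCaptioner()
model.freeze_encoder()
patches = torch.randn(2, 196, 768)          # dummy image patch features
tokens = torch.randint(0, 8000, (2, 12))    # dummy Thai token ids
logits = model(patches, tokens)             # shape: (2, 12, 8000)
```

In practice one would load actual pretrained ViT and Thai language-model weights into the two halves; the structural point is simply that only the decoder (and the LM head) receives gradients when the encoder is frozen.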

Keywords:
Closed captioning, Transformer, Computer science, Convolutional neural network, Feature extraction, Artificial intelligence, Natural language processing, Speech recognition, Image (mathematics), Engineering, Electrical engineering
