JOURNAL ARTICLE

Analysis of Subword Tokenization for Transformer Model in Neural Machine Translation between Myanmar and English Languages

Abstract

Machine translation between Myanmar and English presents significant challenges, yet it is an important area of research for fostering connectivity and facilitating information access for Myanmar language speakers. Sustained research and continual innovation are needed to improve the quality and accessibility of machine translation for this language pair. Neural Machine Translation (NMT) models, especially those based on attention mechanisms and the Transformer architecture, show strong promise in this field. Integrating subword approaches into machine translation is crucial for managing linguistic complexity and diversity: it improves adaptability and overall translation performance, which is particularly important for morphologically rich and low-resource languages. In this study, we evaluate and compare the translation performance of Transformer and Recurrent Neural Network (RNN) models optimized with subword tokenization on the Myanmar-English WAT2019 corpus. Importantly, we find that the correct selection of the subword model is the single most significant factor influencing translation performance. A Transformer model optimized with 32k Byte Pair Encoding (BPE) subwording showed a significant improvement in BLEU scores: 16.92 points for the English-Myanmar direction and 17.01 points for the Myanmar-English direction, compared to a baseline RNN model likewise optimized with 32k BPE subwording. We also assessed SentencePiece models using both the unigram and BPE algorithms. The best results were BLEU scores of 50.76 for the English-Myanmar direction and 48.91 for the Myanmar-English direction, achieved with Transformer models optimized with 32k BPE subword models.
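To make the BPE subword approach discussed above concrete, here is a minimal pure-Python sketch of the merge-learning step at the heart of BPE. The paper itself uses SentencePiece with a 32k vocabulary on the WAT2019 corpus; the tiny word-frequency table below is purely illustrative, and the helper name `learn_bpe` is our own.

```python
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """Learn BPE merge rules from a {word: frequency} dict.

    Each word starts as a sequence of characters plus an end-of-word
    marker; at every step the most frequent adjacent symbol pair is
    merged into a single new symbol.
    """
    # Represent each word as a tuple of symbols, ending in "</w>".
    vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count all adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the chosen merge throughout the vocabulary.
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

# Toy corpus (illustrative only, not from WAT2019):
merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 10)
print(merges[:3])  # → [('e', 's'), ('es', 't'), ('est', '</w>')]
```

In practice, the study's pipeline would instead train a SentencePiece model (BPE or unigram, 32k vocabulary) over the raw training corpus and apply it to both Myanmar and English sides before feeding the Transformer or RNN model; the merge loop above is only the conceptual core of the BPE variant.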

Keywords:
Machine translation; Computer science; Transformer; Artificial intelligence; Natural language processing; Language model; Evaluation of machine translation; Recurrent neural network; Example-based machine translation; Machine learning; Artificial neural network; Machine translation software usability; Voltage Engineering

Metrics

Cited by: 2
FWCI (Field-Weighted Citation Impact): 1.28
References: 23
Citation Normalized Percentile: 0.76

Citation History

Topics

Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Multimodal Machine Learning Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
© 2026 ScienceGate Book Chapters — All rights reserved.