JOURNAL ARTICLE

Low-Resource Neural Machine Translation Improvement Using Data Augmentation Strategies

Thai Nguyen Quoc, Lê Thanh Hương, Hanh Pham Van

Year: 2023   Journal: Informatica   Vol: 47 (3)   Publisher: Slovenian Society Informatika

Abstract

The development of neural models has greatly improved the performance of machine translation, but these methods require large-scale parallel data, which can be difficult to obtain for low-resource language pairs. To address this issue, this research takes a pre-trained multilingual model and fine-tunes it on a small bilingual dataset. Additionally, two data-augmentation strategies are proposed to generate new training data: (i) back-translation with the dataset from the source language; (ii) data augmentation via the English pivot language. The proposed approach is applied to Khmer-Vietnamese machine translation. Experimental results show that the proposed approach outperforms the Google Translator model by 5.3% in terms of BLEU score on a test set of 2,000 Khmer-Vietnamese sentence pairs.
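The two augmentation strategies named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no implementation details, so everything here is an assumption: the `Translator` type stands in for any sentence-level translation model, the function names are hypothetical, and the choice of which side of a synthetic pair is machine-generated is a guess rather than the authors' documented procedure.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical type: any function mapping one sentence to its translation.
Translator = Callable[[str], str]
Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(mono: Iterable[str], translate: Translator,
                   mono_is_source: bool = True) -> List[Pair]:
    """Pair each monolingual sentence with a machine translation of it.

    Assumption: mono_is_source says which side of the pair the real
    (human-written) sentence occupies; the other side is synthetic.
    """
    pairs = []
    for sent in mono:
        synthetic = translate(sent)
        pairs.append((sent, synthetic) if mono_is_source else (synthetic, sent))
    return pairs


def pivot_augment(src_sents: Iterable[str], src_to_pivot: Translator,
                  pivot_to_tgt: Translator) -> List[Pair]:
    """Build source-target pairs by routing each sentence through a pivot language."""
    return [(s, pivot_to_tgt(src_to_pivot(s))) for s in src_sents]


def build_training_set(real: List[Pair], *synthetic: List[Pair]) -> List[Pair]:
    """Concatenate real and synthetic pairs, dropping duplicates and empty sides."""
    seen, merged = set(), []
    for group in (real, *synthetic):
        for pair in group:
            if pair not in seen and all(side.strip() for side in pair):
                seen.add(pair)
                merged.append(pair)
    return merged
```

In this sketch the merged list would be the data used to fine-tune the pre-trained multilingual model; in practice, synthetic pairs are often filtered or down-weighted relative to real ones, which the sketch does not attempt.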

Keywords:
Machine translation, Computer science, Vietnamese, Artificial intelligence, Sentence, Natural language processing, Translation (biology), Set (abstract data type), Training set, Data set, Test set, Test data, Resource (disambiguation), Machine learning, Linguistics, Programming language

Metrics

Cited By: 6
FWCI (Field Weighted Citation Impact): 1.53
Refs: 40
Citation Normalized Percentile: 0.82

Topics

Natural Language Processing Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Topic Modeling
Physical Sciences → Computer Science → Artificial Intelligence
Multimodal Machine Learning Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition