Thai Nguyen Quoc, Lê Thanh Hương, Hanh Pham Van
Neural models have greatly improved machine translation performance, but they require large-scale parallel data, which can be difficult to obtain for low-resource language pairs. To address this issue, this research fine-tunes a pre-trained multilingual model on a small bilingual dataset. In addition, two data-augmentation strategies are proposed to generate new training data: (i) back-translation with the dataset from the source language; (ii) data augmentation via English as a pivot language. The proposed approach is applied to Khmer-Vietnamese machine translation. Experimental results show that the approach outperforms the Google Translator model by 5.3% in terms of BLEU score on a test set of 2,000 Khmer-Vietnamese sentence pairs.
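The two augmentation strategies named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: the function names (`back_translate`, `pivot_augment`, `merge_unique`) are hypothetical, the translation systems are passed in as plain callables, and the pivot step assumes that source-English pairs are available and that an English-to-target translator produces the synthetic target side.

```python
from typing import Callable, Iterable, List, Tuple

# A translator is modeled as any function from sentence to sentence
# (assumption: in practice this would wrap a trained MT model).
Translator = Callable[[str], str]
Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    monolingual: Iterable[str],
    translate: Translator,
    mono_is_source: bool = True,
) -> List[Pair]:
    """Pair each monolingual sentence with its machine translation.

    If mono_is_source is True, the monolingual text is kept on the source
    side and the synthetic translation goes on the target side; otherwise
    the sides are swapped.
    """
    pairs: List[Pair] = []
    for sentence in monolingual:
        synthetic = translate(sentence)
        pairs.append((sentence, synthetic) if mono_is_source else (synthetic, sentence))
    return pairs


def pivot_augment(
    source_pivot_pairs: Iterable[Pair],
    pivot_to_target: Translator,
) -> List[Pair]:
    """Turn (source, pivot) pairs into synthetic (source, target) pairs
    by translating the pivot side (e.g. English) into the target language."""
    return [(src, pivot_to_target(pivot)) for src, pivot in source_pivot_pairs]


def merge_unique(*datasets: Iterable[Pair]) -> List[Pair]:
    """Concatenate datasets, dropping exact duplicate pairs, keeping order."""
    seen = set()
    merged: List[Pair] = []
    for dataset in datasets:
        for pair in dataset:
            if pair not in seen:
                seen.add(pair)
                merged.append(pair)
    return merged
```

In use, the small bilingual dataset and both synthetic sets would be merged with `merge_unique` before fine-tuning; how the synthetic and genuine pairs should be weighted or filtered is not specified in the abstract and is left open here.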
Marzieh Fadaee, Arianna Bisazza, Christof Monz
Atnafu Lambebo Tonja, Olga Kolesnikova, Alexander Gelbukh, Grigori Sidorov
Yu Li, Xiao Li, Yating Yang, Rui Dong
Syed Matla Ul Qumar, M. Nayyar Azim, S. M. K. Quadri
Fuxue Li, Beibei Liu, Hong Yan, Mingzhi Shao, Peijun Xie, Jiarui Li, Chuncheng Chi