JOURNAL ARTICLE

A Diverse Data Augmentation Strategy for Low-Resource Neural Machine Translation

Yu Li, Xiao Li, Yating Yang, Rui Dong

Year: 2020; Journal: Information; Vol: 11 (5); Pages: 255; Publisher: Multidisciplinary Digital Publishing Institute

Abstract

One important issue that affects the performance of neural machine translation is the scale of available parallel data. For low-resource languages, the amount of parallel data is insufficient, which results in poor translation quality. In this paper, we propose a diverse data augmentation method that does not use extra monolingual data. We expand the training data by generating diverse pseudo-parallel data on both the source and target sides. To generate diverse data, a restricted sampling strategy is employed at the decoding steps. Finally, we filter and merge the original data and the synthetic parallel corpus to train the final model. In the experiments, the proposed approach achieved an improvement of 1.96 BLEU points on the IWSLT2014 German–English translation task, which was used to simulate a low-resource language. Our approach also consistently and substantially obtained improvements of 1.0 to 2.0 BLEU points on three other low-resource translation tasks: English–Turkish, Nepali–English, and Sinhala–English.
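The abstract names two mechanical steps: sampling diverse translations under a restricted sampling strategy, then filtering and merging the synthetic pairs with the original corpus. It gives no further detail here, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. We assume "restricted sampling" means top-k sampling at each decoding step, use a hypothetical `toy_model` as a stand-in for a trained NMT decoder, and use a simple hypothetical filter (drop empty outputs and exact duplicates) in place of whatever filtering the paper actually applies. Only the source-to-target direction is shown; source-side pseudo data would come from running the same loop with a target-to-source model.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model interface: given a source sentence and the target prefix
# decoded so far, return a next-token probability distribution.
NextTokenDist = Callable[[str, Sequence[str]], Dict[str, float]]
Pair = Tuple[str, str]

EOS = "</s>"


def restricted_sample_step(probs: Dict[str, float], k: int, rng: random.Random) -> str:
    """Sample one token from only the k most probable candidates (top-k; an assumption)."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens = [tok for tok, _ in top]
    weights = [p for _, p in top]
    # random.choices normalises the weights, so the truncated mass need not sum to 1.
    return rng.choices(tokens, weights=weights, k=1)[0]


def sample_translation(model: NextTokenDist, src: str, k: int,
                       rng: random.Random, max_len: int = 50) -> str:
    """Decode one hypothesis token by token with restricted sampling."""
    prefix: List[str] = []
    while len(prefix) < max_len:
        tok = restricted_sample_step(model(src, prefix), k, rng)
        if tok == EOS:
            break
        prefix.append(tok)
    return " ".join(prefix)


def augment(original: List[Pair], sources: List[str], model: NextTokenDist,
            k: int, n_samples: int, seed: int = 0) -> List[Pair]:
    """Sample n_samples hypotheses per source, filter them, and merge with the original data."""
    rng = random.Random(seed)
    merged = list(original)
    seen = set(original)
    for src in sources:
        for _ in range(n_samples):
            hyp = sample_translation(model, src, k, rng)
            pair = (src, hyp)
            # Hypothetical filter: drop empty outputs and pairs already in the corpus.
            if hyp and pair not in seen:
                seen.add(pair)
                merged.append(pair)
    return merged


def toy_model(src: str, prefix: Sequence[str]) -> Dict[str, float]:
    """Hypothetical toy distribution standing in for a trained NMT decoder (ignores src)."""
    if len(prefix) == 0:
        return {"the": 0.5, "a": 0.3, "this": 0.15, "xyz": 0.05}
    if len(prefix) == 1:
        return {"house": 0.6, "home": 0.3, "banana": 0.1}
    return {EOS: 1.0}


if __name__ == "__main__":
    corpus = [("das Haus", "the house")]
    for pair in augment(corpus, ["das Haus"], toy_model, k=2, n_samples=8):
        print(pair)
```

With `k=2`, each token is drawn from the two most probable candidates, so the sketch produces varied but still plausible outputs (for example "a home" instead of the greedy "the house") while never emitting low-probability tokens such as "banana". Raising `k` would trade fluency for diversity; setting `k=1` reduces the loop to greedy decoding.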

