JOURNAL ARTICLE

A Clause-Based Data Augmentation Method for Low-Resource Neural Machine Translation

Fuxue Li, Mingzhi Shao, Hong Yan, Chuncheng Chi

Year: 2025; Journal: IEEE Access; Vol: 13; Pages: 62567-62576; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Transformer-based neural machine translation (NMT) systems have achieved remarkable success when trained on high-resource bilingual corpora. However, their performance deteriorates significantly in low-resource settings because training data are scarce. To mitigate this issue, this paper proposes a novel clause-based data augmentation (DA) approach for NMT that expands the training set by exploiting information already present in the original data. The method first applies a clause extraction algorithm to extract clauses from the target-side sentences. A target-to-source NMT model then translates these clauses into the source language. Finally, two DA strategies are applied to further enrich the training set. Experiments on four publicly available low-resource translation tasks validate the approach: it consistently outperforms the baseline model and several other DA approaches, indicating its potential to improve translation quality in low-resource scenarios.
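The abstract describes the pipeline only at a high level. As a rough illustration, the sketch below shows how clause-level synthetic pairs could be produced and appended to a parallel corpus. It assumes a simple punctuation-and-conjunction heuristic in place of the authors' clause extraction algorithm, and a caller-supplied function in place of the target-to-source NMT model. The names `extract_clauses`, `augment`, `MIN_TOKENS`, and the boundary word list are hypothetical. The sketch does not reproduce either of the paper's two DA strategies, which the abstract does not specify.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical clause boundaries: punctuation marks, or whitespace before a few
# English conjunctions. This is NOT the paper's extraction algorithm.
_BOUNDARY = re.compile(
    r"\s*[,;:]\s*|\s+(?=(?:because|although|while|which|when|but)\b)"
)
MIN_TOKENS = 3  # assumed threshold: drop fragments shorter than this


def extract_clauses(sentence: str, min_tokens: int = MIN_TOKENS) -> List[str]:
    """Split a target sentence into clause-like fragments (heuristic sketch)."""
    parts = _BOUNDARY.split(sentence.strip().rstrip("."))
    # Keep only fragments long enough to be plausible clauses.
    return [p.strip() for p in parts if len(p.split()) >= min_tokens]


def augment(
    corpus: List[Tuple[str, str]],
    reverse_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return the original (source, target) pairs plus synthetic clause pairs.

    `reverse_translate` is a placeholder for a target-to-source NMT model.
    """
    augmented = list(corpus)
    seen = set(corpus)
    for _src, tgt in corpus:
        for clause in extract_clauses(tgt):
            if clause == tgt.strip().rstrip("."):
                continue  # no internal boundary: the "clause" is the whole sentence
            # Back-translate the target clause to obtain a synthetic source side.
            pair = (reverse_translate(clause), clause)
            if pair not in seen:  # avoid duplicate synthetic pairs
                seen.add(pair)
                augmented.append(pair)
    return augmented


if __name__ == "__main__":
    toy_corpus = [("<src sentence>", "Training was slow, so we reduced the batch size.")]
    # A toy stand-in for a real target-to-source model.
    for src, tgt in augment(toy_corpus, lambda s: "SRC:" + s):
        print(src, "|||", tgt)
```

In practice the lambda would be replaced by the decode function of a trained reverse model. The minimum-length filter and the duplicate check are assumptions intended to keep very short or repeated fragments out of the synthetic data.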

Keywords:
Computer science; Machine translation; Artificial intelligence; Natural language processing; Translation (biology); Speech recognition

Metrics

Cited by: 3
FWCI (Field Weighted Citation Impact): 14.46
References: 38
Citation normalized percentile: 0.98 (top 1%)

Topics

Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)