The strong performance of Neural Machine Translation (NMT) normally relies on a large amount of parallel data, yet bilingual data between many language pairs is scarce. mBART improves low-resource translation by pre-training on multilingual monolingual data and then fine-tuning on bilingual data, but it does not leverage parallel data, which contains crucial alignment information between languages. In this paper, we propose to use English-centric parallel data in a Multilingual NMT (MNMT) manner, with English as the pivot, to provide translation and alignment information for translation between Chinese and other languages. We conduct experiments on the CCMT 2023 low-resource machine translation task between Chinese and languages along "the Belt and Road". Our method improves the zh$\rightarrow$vi, vi$\rightarrow$zh, zh$\rightarrow$mn, mn$\rightarrow$zh, zh$\rightarrow$cs, and cs$\rightarrow$zh tasks by $+1.65$, $+0.24$, $+0.91$, $+3.47$, $+2.88$, and $+6.35$ BLEU respectively over the strong mBART baseline, demonstrating the effectiveness of our approach and the importance of English-centric parallel data.
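One way to read "English as the pivot" is a two-step pipeline in which the source language is first mapped into English and then into the target language, both directions being well supervised by English-centric parallel data. The sketch below is a minimal toy illustration of that pipeline, not the paper's actual MNMT model: the `translate()` helper, the word lexicon, and the language codes are all hypothetical stand-ins.

```python
# Toy sketch of English-pivot translation. The lexicon and translate()
# helper are hypothetical stand-ins for a real MNMT model such as mBART.

def translate(sentence, src, tgt, lexicon):
    """Toy word-by-word 'translation' used only to illustrate the pipeline."""
    return " ".join(lexicon.get((src, tgt, w), w) for w in sentence.split())

# Tiny illustrative lexicon: zh->en and en->vi entries for one sentence.
LEXICON = {
    ("zh", "en", "你好"): "hello",
    ("zh", "en", "世界"): "world",
    ("en", "vi", "hello"): "xin_chao",
    ("en", "vi", "world"): "the_gioi",
}

def pivot_translate(sentence, src, tgt, pivot="en"):
    # Step 1: source -> pivot (English), the direction best covered
    # by English-centric parallel data.
    pivot_text = translate(sentence, src, pivot, LEXICON)
    # Step 2: pivot (English) -> target.
    return translate(pivot_text, pivot, tgt, LEXICON)

print(pivot_translate("你好 世界", "zh", "vi"))  # -> xin_chao the_gioi
```

In the paper's setting the two directions are not decoded separately; the English-centric pairs are instead mixed into multilingual fine-tuning so that English serves as an implicit alignment anchor.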
Bin Li, Yixuan Weng, Fei Xia, Hanjun Deng