The strong performance of Neural Machine Translation (NMT) normally relies on a large amount of parallel data, yet bilingual data between many language pairs is scarce. mBART improves low-resource translation by pre-training on multilingual monolingual data and then fine-tuning on bilingual data, but it does not leverage parallel data, which contains crucial alignment information between languages. In this paper, we propose to use English-centric parallel data in a Multilingual NMT (MNMT) manner, with English as the pivot, to provide translation and alignment information for translation between Chinese and other languages. We conduct experiments on the CCMT 2023 low-resource machine translation task between Chinese and languages along "the Belt and Road". Our method improves the zh$\rightarrow$vi, vi$\rightarrow$zh, zh$\rightarrow$mn, mn$\rightarrow$zh, zh$\rightarrow$cs, and cs$\rightarrow$zh tasks by $+1.65$, $+0.24$, $+0.91$, $+3.47$, $+2.88$, and $+6.35$ BLEU respectively over the strong mBART baseline, demonstrating the effectiveness of our approach and the importance of English-centric parallel data.
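One way to read "English as the pivot" is a two-step pipeline in which the source language is first mapped into English and then into the target language, both directions being well supervised by English-centric parallel data. The sketch below is a minimal toy illustration of that pipeline, not the paper's actual MNMT model: the `translate()` helper, the word lexicon, and the language codes are all hypothetical stand-ins.

```python
# Toy sketch of English-pivot translation. The lexicon and translate()
# helper are hypothetical stand-ins for a real MNMT model such as mBART.

def translate(sentence, src, tgt, lexicon):
    """Toy word-by-word 'translation' used only to illustrate the pipeline."""
    return " ".join(lexicon.get((src, tgt, w), w) for w in sentence.split())

# Tiny illustrative lexicon: zh->en and en->vi entries for one sentence.
LEXICON = {
    ("zh", "en", "你好"): "hello",
    ("zh", "en", "世界"): "world",
    ("en", "vi", "hello"): "xin_chao",
    ("en", "vi", "world"): "the_gioi",
}

def pivot_translate(sentence, src, tgt, pivot="en"):
    # Step 1: source -> pivot (English), the direction best covered
    # by English-centric parallel data.
    pivot_text = translate(sentence, src, pivot, LEXICON)
    # Step 2: pivot (English) -> target.
    return translate(pivot_text, pivot, tgt, LEXICON)

print(pivot_translate("你好 世界", "zh", "vi"))  # -> xin_chao the_gioi
```

In the paper's setting the two directions are not decoded separately; the English-centric pairs are instead mixed into multilingual fine-tuning so that English serves as an implicit alignment anchor.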
Bin Li, Yixuan Weng, Fei Xia, Hanjun Deng