Zhiqiang Yu, Zhengtao Yu, Yantuan Xian, Yuxin Huang, Junjun Guo
We present a simple, efficient data augmentation approach for boosting Chinese-Vietnamese neural machine translation performance by leveraging the linguistic difference between the two languages. We first define the formalized representation of modifier symmetry, which is one of the most representative linguistic differences between Chinese and Vietnamese. We then propose and test two data augmentation strategies for leveraging the linguistic difference, which can be integrated naturally with different translation models. Results indicate that both strategies can introduce linguistic rules to boost translation accuracy. Tests on Chinese-Vietnamese benchmarks show significant accuracy improvements. To facilitate studies in this domain, we also release an open-source toolkit with flexible implementation for Chinese-Vietnamese linguistic difference tagging.
Zhiqiang Yu, Yantuan Xian, Zhengtao Yu, Yuxin Huang, Junjun Guo
Hongfei Zhang, Zhiqiang Yu, Ting Wang, Z.J Jiang, Yi Tang
Lin Wang, Zhaoxuan Li, Hongyan Zhang, Wuying Liu
Zhiqiang Yu, Ting Wang, Shihu Liu, Xuewen Tan