Wei Yang, Hanfei Shen, Yves Lepage
Increasing the size of parallel corpora for less-resourced language pairs is essential for machine translation (MT). To address the shortage of parallel corpora between Chinese and Japanese, we propose a method to construct a quasi-parallel corpus by inflating a small Chinese-Japanese parallel corpus, so as to improve statistical machine translation (SMT) quality. We generate new sentences using analogical associations based on large amounts of monolingual data and a small amount of parallel data. We filter over-generated sentences using two methods: one based on BLEU and the other based on N-sequences. We add the resulting aligned quasi-parallel corpus to a small parallel Chinese-Japanese corpus and perform SMT experiments. We obtain significant improvements over a baseline system.
Sainik Kumar Mahata, Jyoti Gupta, Khusboo Kumari, Monalisa Dey, Anupam Mondal, Darothi Sarkar
Hitoshi Ito, Naoto Shirai, Kazutaka Kinugawa, Hideya Mino, Yoshihiko Kawai
Atnafu Lambebo Tonja, Tadesse Destaw Belay, Olga Kolesnikova, Seid Muhie Yimam, Abinew Ali Ayele, Grigori Sidorov, Alexander Gelbukh
Atnafu Lambebo Tonja, Tadesse Destaw Belay, Olga Kolesnikova, Seid Muhie Yimam, Abinew Ali Ayele, Grigori Sidorov, Alexander Gelbukh