JOURNAL ARTICLE

Inflating a Small Parallel Corpus into a Large Quasi-parallel Corpus Using Monolingual Data for Chinese-Japanese Machine Translation

Wei YangHanfei ShenYves Lepage

Year: 2017 Journal:   Journal of Information Processing Vol: 25 (0)Pages: 88-99

Abstract

Increasing the size of parallel corpora for less-resourced language pairs is essential for machine translation (MT). To address the shortage of parallel corpora between Chinese and Japanese, we propose a method to construct a quasi-parallel corpus by inflating a small amount of Chinese-Japanese corpus, so as to improve statistical machine translation (SMT) quality. We generate new sentences using analogical associations based on large amounts of monolingual data and a small amount of parallel data. We filter over-generated sentences using two filtering methods: one based on BLEU and the second one based on N-sequences. We add the obtained aligned quasi-parallel corpus to a small parallel Chinese-Japanese corpus and perform SMT experiments. We obtain significant improvements over a baseline system.

Keywords:
Computer science Parallel corpora Machine translation Natural language processing BLEU Artificial intelligence Economic shortage Translation (biology) Construct (python library) Filter (signal processing) Speech recognition Linguistics Programming language

Metrics

2
Cited By
0.46
FWCI (Field Weighted Citation Impact)
42
Refs
0.68
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Algorithms and Data Compression
Physical Sciences →  Computer Science →  Artificial Intelligence
© 2026 ScienceGate Book Chapters — All rights reserved.