Bilingual news article alignment methods based on multilingual information retrieval have proven successful for the automatic production of so-called noisy-parallel corpora. In this paper, we compare machine translation (MT) with the commonly used dictionary term lookup (DTL) method for aligning English and Japanese Reuters news articles. The results reveal a trade-off between the improved lexical disambiguation provided by MT and the broader synonym choice provided by DTL: MT outperforms DTL only at medium and low recall levels, whereas at high recall levels DTL achieves higher precision.
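The difference between the two query-translation strategies can be illustrated with a minimal sketch. It assumes a tiny hypothetical bilingual dictionary, Japanese articles that are already segmented into terms, and bag-of-words cosine similarity for ranking English candidates; none of these details are taken from the paper. The "MT-like" function is only a crude stand-in that keeps one translation per term (here, arbitrarily, the first dictionary entry), not a real MT system.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy dictionary: Japanese term -> candidate English translations.
# The first entry is (arbitrarily) treated as the single choice an MT system makes.
DICTIONARY = {
    "株": ["stock", "share"],
    "市場": ["market"],
    "円": ["yen", "circle"],
    "下落": ["fall", "decline", "drop"],
}

def translate_dtl(terms):
    """Dictionary term lookup: keep every dictionary translation of each term."""
    return [e for t in terms for e in DICTIONARY.get(t, [])]

def translate_mt_like(terms):
    """Crude MT stand-in (assumption): keep only one translation per term."""
    return [DICTIONARY[t][0] for t in terms if t in DICTIONARY]

def cosine(a, b):
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def rank(query_terms, candidates):
    """Rank candidate English articles by similarity to the translated query."""
    scored = [(cosine(query_terms, doc.split()), doc_id)
              for doc_id, doc in candidates.items()]
    return sorted(scored, reverse=True)

# Hypothetical pre-segmented Japanese article and English candidate articles.
japanese_article = ["株", "市場", "円", "下落"]
english_articles = {
    "en1": "share prices decline as yen weakens in the market",
    "en2": "football match ends in a draw",
}

print(rank(translate_dtl(japanese_article), english_articles))
print(rank(translate_mt_like(japanese_article), english_articles))
```

In this toy case, DTL matches "share" and "decline", which the single-choice translation misses, so "en1" scores about 0.47 under DTL versus about 0.33 under the MT stand-in; the cost is that DTL also injects irrelevant senses such as "circle" into every query, which is the disambiguation side of the trade-off.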
Nigel Collier, Hideki Hirakawa, Akira Kumano