Although context-independent word-based approaches remain popular for cross-language information retrieval, many recent studies have shown that integrating insights from modern statistical machine translation systems can lead to substantial improvements in effectiveness. In this paper, we compare flat and hierarchical phrase-based translation models for query translation. Both approaches yield significantly better results than either a token-based or a one-best translation baseline on standard test collections. The choice of model manifests interesting tradeoffs in terms of effectiveness, efficiency, and model compactness.
Wessel Kraaij, Jian‐Yun Nie, Michel Simard
Thanh Nguyen, Hieu V. Nguyen, Tuoi Thi Phan
Jianfeng Gao, Jian‐Yun Nie, Ming Zhou