JOURNAL ARTICLE

Exploiting Representations from Statistical Machine Translation for Cross-Language Information Retrieval

Ferhan Türe, Jimmy Lin

Year: 2014 Journal: ACM Transactions on Information Systems Vol: 32 (4) Pages: 1-32

Abstract

This work explores how internal representations of modern statistical machine translation systems can be exploited for cross-language information retrieval. We tackle two core issues that are central to query translation: how to exploit context to generate more accurate translations and how to preserve ambiguity that may be present in the original query, thereby retaining a diverse set of translation alternatives. These two considerations are often in tension since ambiguity in natural language is typically resolved by exploiting context, but effective retrieval requires striking the right balance. We propose two novel query translation approaches: the grammar-based approach extracts translation probabilities from translation grammars, while the decoder-based approach takes advantage of n-best translation hypotheses. Both are context-sensitive, in contrast to a baseline context-insensitive approach that uses bilingual dictionaries for word-by-word translation. Experimental results show that by “opening up” modern statistical machine translation systems, we can access intermediate representations that yield high retrieval effectiveness. By combining evidence from multiple sources, we demonstrate significant improvements over competitive baselines on standard cross-language information retrieval test collections. In addition to effectiveness, the efficiency of our techniques is explored as well.
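
The idea of keeping several weighted translation alternatives per query term, and of combining evidence from more than one source, can be illustrated with a minimal sketch. This is not the authors' implementation: the input format (n-best hypotheses already reduced to word-level source-to-target mappings with a weight each), the function names `nbest_translation_probs` and `interpolate`, the use of linear interpolation, and the toy English-French entries are all assumptions made for illustration.

```python
from collections import defaultdict


def nbest_translation_probs(nbest):
    """Estimate P(target | source) from a weighted n-best list.

    `nbest` is a hypothetical input format: a list of
    (weight, {source_word: target_word}) pairs, one per hypothesis.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for weight, mapping in nbest:
        for src, tgt in mapping.items():
            totals[src][tgt] += weight
    probs = {}
    for src, counts in totals.items():
        z = sum(counts.values())  # normalize per source word
        probs[src] = {tgt: c / z for tgt, c in counts.items()}
    return probs


def interpolate(p_context, p_dict, lam=0.5):
    """Linearly mix a context-sensitive and a dictionary distribution.

    `lam` is an assumed mixing weight, not a value from the article.
    """
    targets = set(p_context) | set(p_dict)
    return {
        t: lam * p_context.get(t, 0.0) + (1 - lam) * p_dict.get(t, 0.0)
        for t in targets
    }


# Toy data (invented): three hypotheses for the one-word query "bank".
nbest = [
    (0.5, {"bank": "banque"}),
    (0.25, {"bank": "banque"}),
    (0.25, {"bank": "rive"}),
]
context_probs = nbest_translation_probs(nbest)

# Toy context-insensitive dictionary distribution (invented).
dictionary_probs = {"banque": 0.5, "rive": 0.25, "banc": 0.25}

combined = interpolate(context_probs["bank"], dictionary_probs, lam=0.5)
for target, p in sorted(combined.items(), key=lambda kv: -kv[1]):
    print(target, round(p, 3))
```

In this toy setting the n-best list concentrates probability on the translation favored by context, while the interpolation keeps the less likely alternatives alive with nonzero weight, which is the balance between disambiguation and diversity that the abstract describes.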

Keywords:
Computer science; Machine translation; Natural language processing; Artificial intelligence; Rule-based machine translation; Synchronous context-free grammar; Example-based machine translation; Transfer-based machine translation; Machine translation software usability; Exploit; Ambiguity; Context (archaeology); Cross-language information retrieval; Translation (biology); Information retrieval; Programming language

Metrics

Cited By: 8
FWCI (Field Weighted Citation Impact): 1.93
Refs: 81
Citation Normalized Percentile: 0.88

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Algorithms and Data Compression
Physical Sciences →  Computer Science →  Artificial Intelligence