JOURNAL ARTICLE

Improving query translation for cross-language information retrieval using statistical models

Abstract

Dictionaries have often been used for query translation in cross-language information retrieval (CLIR). However, this approach faces the problem of translation ambiguity, i.e., a dictionary usually stores multiple translations for a word. In addition, word-by-word query translation is not precise enough. In this paper, we explore several methods to improve dictionary-based query translation. First, as many noun phrases as possible are recognized and translated as a whole using statistical models and phrase translation patterns. Second, the best word translations are selected based on the cohesion of the translation words. Our experimental results on the TREC English-Chinese CLIR collection show that these techniques yield significant improvements over simple dictionary approaches and achieve even better performance than a high-quality machine translation system.
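The abstract does not specify how cohesion is measured, so the following is only a minimal sketch of the general idea of cohesion-based translation selection, not the authors' model. It assumes cohesion can be approximated by the log of target-language co-occurrence counts and searches all candidate combinations exhaustively. The dictionary entries, the counts, and the names `CANDIDATES`, `COOCCURRENCE`, `cohesion`, and `select_translations` are hypothetical and chosen for illustration.

```python
from itertools import combinations, product
from math import log

# Hypothetical bilingual dictionary: each English word maps to several
# candidate Chinese translations (toy data, not taken from the paper).
CANDIDATES = {
    "bank": ["银行", "河岸"],
    "interest": ["利息", "兴趣"],
    "rate": ["率", "速度"],
}

# Hypothetical co-occurrence counts of translation pairs in a
# target-language corpus (toy numbers for illustration only).
COOCCURRENCE = {
    frozenset(["银行", "利息"]): 40,
    frozenset(["利息", "率"]): 55,
    frozenset(["银行", "率"]): 12,
    frozenset(["河岸", "速度"]): 3,
    frozenset(["兴趣", "速度"]): 1,
}


def cohesion(a, b):
    """Assumed cohesion score: log-smoothed co-occurrence count of a pair."""
    return log(1 + COOCCURRENCE.get(frozenset([a, b]), 0))


def select_translations(query_words, candidates):
    """Pick one translation per word so that total pairwise cohesion is maximal.

    Words missing from the dictionary are kept untranslated. The search is
    exhaustive, which is only practical for short queries.
    """
    options = [candidates.get(word, [word]) for word in query_words]
    best, best_score = None, float("-inf")
    for combo in product(*options):
        score = sum(cohesion(a, b) for a, b in combinations(combo, 2))
        if score > best_score:
            best, best_score = list(combo), score
    return best


if __name__ == "__main__":
    print(select_translations(["bank", "interest", "rate"], CANDIDATES))
```

With these toy counts, the financial senses reinforce one another and are selected together, which is the effect that a word-by-word lookup taking the first dictionary entry cannot guarantee. A real system would need a greedy or approximate search for longer queries.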

Keywords:
Computer science; Natural language processing; Cross-language information retrieval; Machine translation; Artificial intelligence; Cohesion (chemistry); Word (group theory); Translation (biology); Query expansion; Phrase; Example-based machine translation; Rule-based machine translation; Principle of maximum entropy; Information retrieval; Linguistics

Metrics

Cited By: 127
FWCI (Field Weighted Citation Impact): 8.34
Refs: 22
Citation Normalized Percentile: 0.98

Topics

Natural Language Processing Techniques: Physical Sciences → Computer Science → Artificial Intelligence
Topic Modeling: Physical Sciences → Computer Science → Artificial Intelligence
Semantic Web and Ontologies: Physical Sciences → Computer Science → Artificial Intelligence
