JOURNAL ARTICLE

Statistical query translation models for cross-language information retrieval

Jianfeng Gao, Jian‐Yun Nie, Ming Zhou

Year: 2006   Journal: ACM Transactions on Asian Language Information Processing   Vol: 5 (4)   Pages: 323-359   Publisher: Association for Computing Machinery

Abstract

Query translation is an important task in cross-language information retrieval (CLIR), which aims to determine the best translation words and weights for a query. This article presents three statistical query translation models that focus on the resolution of query translation ambiguities. All the models assume that the selection of the translation of a query term depends on the translations of other terms in the query. They differ in the way linguistic structures are detected and exploited. The co-occurrence model treats a query as a bag of words and uses all the other terms in the query as the context for translation disambiguation. The other two models exploit linguistic dependencies among terms. The noun phrase (NP) translation model detects NPs in a query, and translates each NP as a unit by assuming that the translation of a term only depends on other terms within the same NP. Similarly, the dependency translation model detects and translates dependency triples, such as verb-object, as units. The evaluations show that linguistic structures always lead to more precise translations. The experiments of CLIR on TREC Chinese collections show that all three models have a positive impact on query translation and lead to significant improvements of CLIR performance over the simple dictionary-based translation method. The best results are obtained by combining the three models.
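As an illustration of the co-occurrence idea described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a toy bilingual dictionary and a toy table of co-occurrence scores between target-language words; the names CANDIDATES, COOCCURRENCE, cohesion, and translate_query, the placeholder translation tokens, and all numeric scores are hypothetical. The query is treated as a bag of words, and the combination of translations with the highest total pairwise score is selected.

```python
from itertools import product

# Hypothetical toy dictionary: source term -> candidate translations.
# The target-side tokens are placeholders, not real dictionary entries.
CANDIDATES = {
    "bank": ["bank_finance", "bank_river"],
    "interest": ["interest_money", "interest_hobby"],
    "rate": ["rate_ratio"],
}

# Hypothetical symmetric co-occurrence scores between target-language words.
# Pairs that are absent are treated as having score 0.0.
COOCCURRENCE = {
    frozenset(["bank_finance", "interest_money"]): 0.9,
    frozenset(["bank_finance", "rate_ratio"]): 0.6,
    frozenset(["interest_money", "rate_ratio"]): 0.8,
    frozenset(["bank_river", "interest_hobby"]): 0.1,
}


def cohesion(a, b):
    """Return the toy co-occurrence score of two target-language words."""
    return COOCCURRENCE.get(frozenset([a, b]), 0.0)


def translate_query(terms):
    """Pick one translation per term so that total pairwise cohesion is maximal.

    Every other term in the query serves as disambiguation context
    (bag-of-words assumption). The search is exhaustive, so it is only
    practical for short queries.
    """
    best_combo, best_score = None, float("-inf")
    for combo in product(*(CANDIDATES[t] for t in terms)):
        score = sum(
            cohesion(combo[i], combo[j])
            for i in range(len(combo))
            for j in range(i + 1, len(combo))
        )
        if score > best_score:
            best_combo, best_score = combo, score
    return dict(zip(terms, best_combo)), best_score


translations, score = translate_query(["bank", "interest", "rate"])
print(translations)
```

With this toy data, the financial sense of each ambiguous term is chosen because those translations co-occur most strongly with one another. Under the same assumptions, restricting the context to terms inside one noun phrase or one dependency triple, as the other two models do, would amount to calling translate_query on each such unit rather than on the whole query.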

Keywords:
Computer science; Cross-language information retrieval; Natural language processing; Query expansion; Artificial intelligence; Machine translation; Query language; RDF query language; Translation (biology); Dependency (UML); Query optimization; Context (archaeology); Information retrieval; Web query classification; Web search query; Search engine

Metrics

Cited By: 29
FWCI (Field Weighted Citation Impact): 3.54
Refs: 68
Citation Normalized Percentile: 0.93

Topics

Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Algorithms and Data Compression (Physical Sciences → Computer Science → Artificial Intelligence)