Cross-lingual Information Retrieval (CLIR) refers to information retrieval in which the query and the documents may be in different languages. Dictionary-based query translation is a common method in CLIR systems. Such methods face the problem of translation ambiguity, in which a single word in one language has more than one translation in the other language. In this paper we propose a hybrid approach to retrieving English documents relevant to Persian queries. The approach combines phrase reorganization, pattern-based phrase translation, and query expansion both before and after translation to improve dictionary-based query translation. We also propose an improved probabilistic algorithm for choosing the best translation of words and phrases. Finally, the documents are ranked according to a statistical language model with some translation steps. Our experimental results show that each of these methods brings a significant improvement over simple dictionary-based approaches.
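To make the translation ambiguity problem concrete, the following is a minimal sketch of dictionary-based query translation with a simple disambiguation step. It is not the probabilistic algorithm proposed in this paper: it swaps in a plain co-occurrence heuristic, and the dictionary entries, the co-occurrence counts, and the names `DICTIONARY`, `COOCCURRENCE`, `cohesion`, and `translate_query` are all hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical toy bilingual dictionary: transliterated Persian word -> English candidates.
# "shir" is ambiguous (milk / lion / tap); "gav" has a single translation.
DICTIONARY = {
    "shir": ["milk", "lion", "tap"],
    "gav": ["cow"],
}

# Hypothetical co-occurrence counts, as if gathered from an English corpus.
COOCCURRENCE = {
    frozenset(["milk", "cow"]): 120,
    frozenset(["lion", "cow"]): 8,
    frozenset(["tap", "cow"]): 1,
}


def cohesion(candidates):
    """Sum the pairwise co-occurrence counts of one candidate translation."""
    total = 0
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            pair = frozenset([candidates[i], candidates[j]])
            total += COOCCURRENCE.get(pair, 0)
    return total


def translate_query(words):
    """Pick the combination of dictionary translations with the highest cohesion.

    Words missing from the dictionary are passed through unchanged.
    """
    options = [DICTIONARY.get(word, [word]) for word in words]
    best = max(product(*options), key=cohesion)
    return list(best)


print(translate_query(["shir", "gav"]))
```

In this toy setting the query "shir gav" resolves to "milk cow" because that pair co-occurs most often in the assumed counts. Enumerating every combination grows exponentially with query length, so a real system would need a more economical selection strategy.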
Christian Fluhr, Dominique Schmit, Philippe Ortet, Faza Elkateb, Karine Gurtner, Khaled Radwan
T. Pattabhi R. K. Rao, Sobha Lalitha Devi