JOURNAL ARTICLE

Effective Arabic-English cross-language information retrieval via machine-readable dictionaries and machine translation

Abstract

In Cross-Language Information Retrieval (CLIR), queries in one language retrieve relevant documents in other languages. Machine-Readable Dictionaries (MRD) and Machine Translation (MT) are important resources for query translation in CLIR. We investigate the application of MT and MRD to Arabic-English CLIR. The key problem is the translation ambiguity associated with these resources. We present three methods of query translation using a bilingual dictionary for Arabic-English CLIR. First, we present the Every-Match (EM) method, which yields ambiguous translations because many extraneous terms are added to the original query. To disambiguate the query translation, we then present the First-Match (FM) method, which takes the first match in the dictionary as the candidate term. Finally, we present the Two-Phase (TP) method. We show that, using the Two-Phase method, good retrieval effectiveness can be achieved for Arabic-English CLIR without complex resources. We also empirically evaluate the effectiveness of the MT-based method using short, medium, and long queries from TREC, and investigate the effect of query length on the quality of MT-based CLIR.
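
As a rough illustration of the difference between the Every-Match and First-Match strategies described in the abstract, the following minimal Python sketch translates a query term by term against a toy bilingual dictionary. The dictionary entries, the transliterated keys, the function names, and the choice to pass out-of-vocabulary terms through untranslated are all assumptions made for this example; they are not taken from the article, and the Two-Phase method is not sketched because the abstract does not describe its steps.

```python
from typing import Dict, List

# Hypothetical toy dictionary (illustration only, not from the article):
# each transliterated source term maps to English translations in
# dictionary order.
TOY_MRD: Dict[str, List[str]] = {
    "kitab": ["book", "writing"],
    "ayn": ["eye", "spring", "spy"],
}


def every_match(query_terms: List[str], mrd: Dict[str, List[str]]) -> List[str]:
    """Every-Match style: keep every dictionary translation of each term."""
    translated: List[str] = []
    for term in query_terms:
        # Assumption: terms missing from the dictionary are kept as-is.
        translated.extend(mrd.get(term, [term]))
    return translated


def first_match(query_terms: List[str], mrd: Dict[str, List[str]]) -> List[str]:
    """First-Match style: keep only the first listed translation of each term."""
    translated: List[str] = []
    for term in query_terms:
        candidates = mrd.get(term)
        # Assumption: terms missing from the dictionary are kept as-is.
        translated.append(candidates[0] if candidates else term)
    return translated


if __name__ == "__main__":
    query = ["kitab", "ayn"]
    print(every_match(query, TOY_MRD))
    print(first_match(query, TOY_MRD))
```

In this sketch the Every-Match output grows with the number of senses listed for each term, which mirrors the abstract's point about extraneous terms, while the First-Match output keeps exactly one candidate per query term.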

Keywords:
Computer science; Cross-language information retrieval; Natural language processing; Artificial intelligence; Machine translation; Ambiguity; Arabic; Query expansion; Information retrieval; Bilingual dictionary; Key (lock); Translation (biology); Linguistics

Metrics

Cited By: 41
FWCI (Field Weighted Citation Impact): 1.32
Refs: 20
Citation Normalized Percentile: 0.84

Topics

Natural Language Processing Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Web Data Mining and Analysis
Physical Sciences → Computer Science → Information Systems
Topic Modeling
Physical Sciences → Computer Science → Artificial Intelligence