JOURNAL ARTICLE

Enhancing cross-language information retrieval by an automatic acquisition of bilingual terminology from comparable corpora

Abstract

This paper presents an approach to bilingual lexicon extraction from comparable corpora and evaluates it on Cross-Language Information Retrieval. We explore bi-directional extraction of bilingual terminology, primarily from comparable corpora, and propose a combined statistics-based and linguistics-based model for selecting the best translation candidates for phrasal translation. Evaluations on a large Japanese-English test collection showed that the proposed combination of bi-directional comparable corpora, bilingual dictionaries, and transliteration, augmented with linguistics-based pruning, is highly effective in Cross-Language Information Retrieval.

Keywords:
Computer science; Natural language processing; Artificial intelligence; Cross-language information retrieval; Terminology; Bilingual dictionary; Transliteration; Lexicon; CLEF; Information retrieval; Machine translation; Linguistics

Metrics

Cited By: 10
FWCI (Field Weighted Citation Impact): 1.53
Refs: 6
Citation Normalized Percentile: 0.86

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Lexicography and Language Studies
Social Sciences →  Arts and Humanities →  Language and Linguistics
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence