JOURNAL ARTICLE

Learning bilingual translations from comparable corpora to cross-language information retrieval

Abstract

Recent years have seen increased interest in the construction and use of large corpora. With this interest has come an expansion of their application to knowledge acquisition and bilingual terminology extraction. This paper presents an approach to bilingual lexicon extraction from non-aligned comparable corpora, combined with linguistics-based pruning, and evaluates it on Cross-Language Information Retrieval. We propose and explore a two-stage translation model for acquiring bilingual terminology from comparable corpora, followed by disambiguation and selection of the best translation alternatives on the basis of their morphological knowledge. Evaluations on a large-scale Japanese-English test collection, using different weighting schemes of the SMART retrieval system, confirmed the effectiveness of the proposed combination of the two-stage comparable corpora model and linguistics-based pruning for Cross-Language Information Retrieval.

Keywords:
Computer science; Natural language processing; Artificial intelligence; Terminology; Pruning; Cross-language information retrieval; Text corpus; Information extraction; Selection (genetic algorithm); Lexicon; Information retrieval; Machine translation; Linguistics

Metrics

Cited By: 27
FWCI (Field Weighted Citation Impact): 2.68
Refs: 24
Citation Normalized Percentile: 0.91

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Biomedical Text Mining and Ontologies
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Molecular Biology