JOURNAL ARTICLE

Statistical Extraction and Comparison of Pivot Words for Bilingual Lexicon Extension

Daniel Andrade, Takuya Matsuzaki, Jun’ichi Tsujii

Year: 2012   Journal: ACM Transactions on Asian Language Information Processing   Vol: 11 (2)   Pages: 1-31   Publisher: Association for Computing Machinery

Abstract

Bilingual dictionaries can be automatically extended with new translations using comparable corpora. The general idea is based on the assumption that similar words have similar contexts across languages. However, previous studies have mainly focused on Indo-European languages, or have used only a bag-of-words model to describe the context. Furthermore, we argue that it is helpful to extract only the statistically significant context, rather than using the entire context. The present approach addresses these issues in the following manner. First, based on the context of a word with an unknown translation (the query word), we extract salient pivot words. Pivot words are words for which a translation is already available in a bilingual dictionary. For the extraction of salient pivot words, we use a Bayesian estimation of the point-wise mutual information to measure statistical significance. In the second step, we match these pivot words across languages to identify translation candidates for the query word. To this end, we calculate a similarity score between the query word and a translation candidate using the probability that the same pivots will be extracted for both the query word and the translation candidate. The proposed method uses several context positions, namely, a bag-of-words of one sentence, and the successors, predecessors, and siblings with respect to the dependency parse tree of the sentence. In order to make these context positions comparable across Japanese and English, which are unrelated languages, we use several heuristics to adjust the dependency trees appropriately. We demonstrate that the proposed method significantly increases the accuracy of word translations, as compared to previous methods.
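The two steps described in the abstract (extracting salient pivot words for a query word, then matching pivots across languages) can be illustrated with a minimal sketch. It is not the authors' method: a plain maximum-likelihood point-wise mutual information with a fixed threshold stands in for the Bayesian estimate, a Jaccard overlap stands in for the probability-based similarity score, and only the sentence-level bag-of-words context is used (no dependency positions). All function names, the seed dictionary, and the toy corpora are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count in how many sentences each word and each unordered word pair occurs."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = set(sentence)
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))
    return word_counts, pair_counts, len(sentences)


def salient_pivots(word, pivots, sentences, threshold=0.0):
    """Return the pivots whose sentence-level PMI with `word` exceeds `threshold`.

    Assumption: plain maximum-likelihood PMI, not the Bayesian estimate
    described in the abstract.
    """
    word_counts, pair_counts, n = cooccurrence_counts(sentences)
    selected = set()
    for pivot in pivots:
        joint = pair_counts[frozenset((word, pivot))]
        if pivot == word or joint == 0:
            continue  # no co-occurrence evidence for this pivot
        pmi = math.log(joint * n / (word_counts[word] * word_counts[pivot]))
        if pmi > threshold:
            selected.add(pivot)
    return selected


def rank_candidates(query, candidates, src_sents, tgt_sents, seed_dict):
    """Rank target-language candidates by overlap of translated pivot sets.

    Assumption: Jaccard overlap replaces the paper's similarity score.
    `seed_dict` maps source-language pivots to target-language pivots.
    """
    query_pivots = {seed_dict[p] for p in salient_pivots(query, seed_dict, src_sents)}
    target_pivots = set(seed_dict.values())
    scores = {}
    for cand in candidates:
        cand_pivots = salient_pivots(cand, target_pivots, tgt_sents)
        union = query_pivots | cand_pivots
        scores[cand] = len(query_pivots & cand_pivots) / len(union) if union else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: romanized Japanese source, English target.
seed_dict = {"hoeru": "bark", "nemuru": "sleep", "niwa": "garden", "yoru": "night"}
src_sents = [
    ["inu", "hoeru", "niwa"],
    ["inu", "hoeru", "yoru"],
    ["neko", "nemuru", "niwa"],
    ["neko", "nemuru", "yoru"],
]
tgt_sents = [
    ["dog", "bark", "garden"],
    ["dog", "bark", "night"],
    ["cat", "sleep", "garden"],
    ["cat", "sleep", "night"],
]

ranked = rank_candidates("inu", ["dog", "cat"], src_sents, tgt_sents, seed_dict)
print(ranked)  # [('dog', 1.0), ('cat', 0.0)]
```

In this toy corpus, "niwa" and "yoru" co-occur with "inu" exactly as often as chance predicts (PMI of zero), so only "hoeru" survives as a salient pivot. This mirrors the abstract's argument for keeping only the statistically significant context instead of all of it.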

Keywords:
Computer science; Natural language processing; Artificial intelligence; Word (group theory); Context (archaeology); Machine translation; Sentence; Dependency (UML); Bilingual dictionary; Parsing; Linguistics

Metrics

Cited By: 6
FWCI (Field Weighted Citation Impact): 1.14
Refs: 33
Citation Normalized Percentile: 0.83

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Mathematics, Computing, and Information Processing
Physical Sciences →  Computer Science →  Computational Theory and Mathematics
