JOURNAL ARTICLE

Bilingual Word Spectral Clustering for Statistical Machine Translation

Abstract

This paper proposes a variant of spectral clustering for bilingual word clustering. The algorithm efficiently generates one set of word clusters for each of the two languages, with high semantic correlation within each monolingual cluster and high translation quality between clusters across the two languages. Each cluster-level translation is treated as a bilingual concept that generalizes over the words in the paired clusters. This scheme improves the robustness of statistical machine translation models. Two HMM-based translation models that use these bilingual clusters are tested, and the experiments show improved perplexity, word alignment accuracy, and translation quality.
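
To make the idea of jointly clustering the words of two languages concrete, the following is a minimal sketch of standard bipartite spectral partitioning applied to a word co-occurrence matrix. It is not the variant proposed in the paper, whose details the abstract does not give. The function name, the toy matrix, and the restriction to two clusters per language are all assumptions made for illustration, and every word is assumed to have a non-zero total weight.

```python
import numpy as np

def bilingual_spectral_bipartition(cooc):
    """Split source and target words into two aligned clusters each.

    cooc[i, j] is a hypothetical non-negative co-occurrence (or translation)
    weight between source word i and target word j. Assumes every row and
    column has a non-zero sum.
    """
    cooc = np.asarray(cooc, dtype=float)
    # Degrees of source words (rows) and target words (columns).
    d_src = cooc.sum(axis=1)
    d_tgt = cooc.sum(axis=0)
    # Degree-normalised matrix D_src^{-1/2} A D_tgt^{-1/2}.
    scaled = cooc / np.sqrt(np.outer(d_src, d_tgt))
    # The second left/right singular vectors carry the partition information.
    u, _, vt = np.linalg.svd(scaled, full_matrices=False)
    src_embed = u[:, 1] / np.sqrt(d_src)
    tgt_embed = vt[1, :] / np.sqrt(d_tgt)
    # Words whose embeddings share a sign fall in the same bilingual cluster.
    src_labels = (src_embed > 0).astype(int)
    tgt_labels = (tgt_embed > 0).astype(int)
    return src_labels, tgt_labels

# Toy data (invented): 4 source words, 3 target words, two loose blocks.
toy = [[5.0, 4.0, 0.1],
       [4.0, 5.0, 0.1],
       [0.1, 0.1, 6.0],
       [0.2, 0.1, 5.0]]
src_labels, tgt_labels = bilingual_spectral_bipartition(toy)
```

Because the left and right singular vectors are computed together, a source cluster and the target cluster with the same label form a pair, which is one simple way to read a cluster-level translation as a bilingual concept. More than two clusters would require several singular vectors followed by a step such as k-means, which this sketch omits.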

Keywords:
Machine translation; Cluster analysis; Translation (biology); Word (group theory); Robustness (evolution); Scheme (mathematics); Spectral clustering; Example-based machine translation

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.55

Topics

Natural Language Processing Techniques
Advanced Clustering Algorithms Research
Speech Recognition and Synthesis