In this paper, a variant of a spectral clustering algorithm is proposed for bilingual word clustering. The proposed algorithm efficiently generates two sets of clusters, one for each language. Words within each monolingual cluster are highly semantically correlated, and the paired clusters across the two languages translate each other well. Each cluster-level translation is treated as a bilingual concept that generalizes the words in the paired clusters. This scheme improves the robustness of statistical machine translation models. Two HMM-based translation models are tested with these bilingual clusters, and our experiments show improved perplexity, word alignment accuracy, and translation quality.
Bing Zhao, Eric P. Xing, Alex Waibel
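The abstract does not say exactly which spectral variant is used, so the code below is only a minimal sketch under stated assumptions. It assumes the bilingual data is summarized as a non-negative source-by-target matrix `A`, such as word-pair co-occurrence or translation counts. It then applies generic bipartite spectral co-clustering (in the style of Dhillon's bipartite graph partitioning), not the paper's own variant. The steps are:

1. Normalize `A` to D1^{-1/2} A D2^{-1/2}.
2. Find the second left and right singular vectors by power iteration.
3. Split source and target words by the sign of the rescaled entries.

Each resulting pair of source and target clusters plays the role of one bilingual cluster. The function name `spectral_cocluster_2way`, the two-way split, and the toy English-German counts are hypothetical and chosen only for illustration.

```python
import math


def spectral_cocluster_2way(A, iters=200):
    """Bipartition a source-by-target affinity matrix into two bilingual clusters.

    Sketch of generic bipartite spectral co-clustering, not necessarily the
    paper's variant. A[i][j] >= 0 links source word i to target word j; every
    row and column must have a positive sum. Returns (source_labels,
    target_labels) with labels in {0, 1}.
    """
    m, n = len(A), len(A[0])
    d1 = [sum(row) for row in A]                             # source-word degrees
    d2 = [sum(A[i][j] for i in range(m)) for j in range(n)]  # target-word degrees
    # Normalized affinity An = D1^{-1/2} A D2^{-1/2}.
    An = [[A[i][j] / math.sqrt(d1[i] * d2[j]) for j in range(n)] for i in range(m)]

    def an_v(v):  # An @ v, a vector over source words
        return [sum(An[i][j] * v[j] for j in range(n)) for i in range(m)]

    def an_t_u(u):  # An^T @ u, a vector over target words
        return [sum(An[i][j] * u[i] for i in range(m)) for j in range(n)]

    def unit(x):  # scale a vector to unit Euclidean length
        norm = math.sqrt(sum(t * t for t in x))
        return [t / norm for t in x]

    def remove(x, basis):  # subtract the component of x along a unit vector
        dot = sum(a * b for a, b in zip(x, basis))
        return [a - dot * b for a, b in zip(x, basis)]

    # The top left singular vector of An (singular value 1) is sqrt(d1), normalized.
    u1 = unit([math.sqrt(d) for d in d1])

    # Power iteration on An An^T, kept orthogonal to u1, converges to the
    # second left singular vector u2 (assumes sigma2 > sigma3).
    u = unit(remove([float(i + 1) for i in range(m)], u1))  # deterministic start
    for _ in range(iters):
        u = unit(remove(an_v(an_t_u(u)), u1))
    w = an_t_u(u)                           # equals sigma2 * v2
    sigma2 = math.sqrt(sum(t * t for t in w))
    v = [t / sigma2 for t in w]             # matching right singular vector v2

    # Spectral embedding z = [D1^{-1/2} u2 ; D2^{-1/2} v2], split by sign.
    z_src = [u[i] / math.sqrt(d1[i]) for i in range(m)]
    z_tgt = [v[j] / math.sqrt(d2[j]) for j in range(n)]
    if z_src[0] < 0:  # fix the arbitrary sign so source word 0 gets label 0
        z_src = [-t for t in z_src]
        z_tgt = [-t for t in z_tgt]
    return [0 if t >= 0 else 1 for t in z_src], [0 if t >= 0 else 1 for t in z_tgt]


if __name__ == "__main__":
    src = ["house", "home", "car", "vehicle"]  # hypothetical English words
    tgt = ["Haus", "Heim", "Auto", "Wagen"]    # hypothetical German words
    counts = [[5, 2, 0, 0],                    # toy association counts (made up)
              [2, 4, 0, 1],
              [0, 0, 6, 2],
              [0, 1, 2, 5]]
    s_lab, t_lab = spectral_cocluster_2way(counts)
    for k in (0, 1):
        print(k, [w for w, l in zip(src, s_lab) if l == k],
              [w for w, l in zip(tgt, t_lab) if l == k])
```

On this toy matrix the sketch pairs {house, home} with {Haus, Heim} and {car, vehicle} with {Auto, Wagen}, even though it contains weak cross-links. Pure-Python power iteration keeps the example dependency-free. For realistic vocabularies, a sparse truncated SVD (for example `scipy.sparse.linalg.svds`) followed by k-means on several singular vectors would be the more usual way to obtain more than two clusters.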