Hailong Cao, Liguo Li, Conghui Zhu, Muyun Yang, Tiejun Zhao
Word embedding models such as Word2vec and FastText simultaneously learn dual representations: input vectors and output vectors. In contrast, almost all existing unsupervised bilingual lexicon induction (UBLI) methods use only the input vectors and discard the output vectors. In this paper, we propose a novel approach that makes full use of both input and output vectors for more robust UBLI. We discover the Common Difference Property: a single orthogonal transformation can connect not only the input vectors of two languages but also their output vectors. Therefore, we can learn just one transformation and induce two different dictionaries, one from the input vectors and one from the output vectors. By taking the intersection of these two quite different dictionaries, a more accurate lexicon with less noise can be induced during the UBLI procedure. Extensive experiments show that our method achieves much more robust results than state-of-the-art methods on distant language pairs, while preserving comparable performance on similar language pairs.
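The intersection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes toy input/output embedding matrices whose target-language counterparts are exact rotations by one shared orthogonal matrix `Q` (standing in for the Common Difference Property; in practice the transformation would be learned, e.g. by adversarial training or Procrustes refinement). All names (`induce_dictionary`, `intersect_dictionaries`, the toy data) are hypothetical.

```python
import numpy as np

def induce_dictionary(src, tgt, W):
    """Map source embeddings with W, then pick cosine nearest neighbours in tgt."""
    mapped = src @ W
    mapped /= np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt_n = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    nn = np.argmax(mapped @ tgt_n.T, axis=1)
    return {i: int(j) for i, j in enumerate(nn)}

def intersect_dictionaries(d_in, d_out):
    """Keep only pairs on which the input-vector and output-vector dictionaries agree."""
    return {i: j for i, j in d_in.items() if d_out.get(i) == j}

# Hypothetical toy data: 100 words, 50-dimensional input and output vectors.
rng = np.random.default_rng(0)
X_in, X_out = rng.normal(size=(100, 50)), rng.normal(size=(100, 50))
Q = np.linalg.qr(rng.normal(size=(50, 50)))[0]  # one shared orthogonal transformation
Y_in, Y_out = X_in @ Q, X_out @ Q               # target spaces: exact rotations

d_in = induce_dictionary(X_in, Y_in, Q)    # dictionary from input vectors
d_out = induce_dictionary(X_out, Y_out, Q) # dictionary from output vectors
lexicon = intersect_dictionaries(d_in, d_out)
```

Because only one transformation is used for both vector spaces, the two induced dictionaries make largely independent errors, and their intersection filters out the noisy pairs on which they disagree.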