JOURNAL ARTICLE

A Novel Unsupervised Approach for Cross-Lingual Word Alignment in Low Isomorphic Embedding Spaces

Qian Tao, Zhihao Xiong, Bocheng Han, Xiaoyang Fan, Lusi Li

Year: 2023
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Vol: 31
Pages: 3027-3041
Publisher: Institute of Electrical and Electronics Engineers

Abstract

Cross-lingual word alignment is the task of translating words between the monolingual word embedding spaces of two different languages. Most recent work relies on supervised approaches, whose success depends on bilingual seed dictionaries derived from aligned data. Unsupervised adversarial approaches, which use generative adversarial networks (GANs) to map one global monolingual space onto another, eliminate the need for aligned data. However, most GAN-based unsupervised approaches ignore the problems of mode collapse and gradient disappearance in GANs, which causes training to fail to converge. In addition, these approaches often fail to account for the low isomorphism between language pairs, which prevents them from capturing the non-linear relationship between cross-lingual embedding spaces. To address these issues, we propose a novel unified unsupervised framework that combines a GAN with an adaptive training objective (ATOGAN) and a local mapping (LM) strategy for exploring the non-linear relationship. ATOGAN learns a bi-directional global mapping from unaligned word embeddings and integrates particle swarm optimization (PSO) to adaptively select the training objective, preventing mode collapse and gradient disappearance. We then design an LM strategy, guided by the dictionaries generated by the trained ATOGAN, to reduce the reliance of purely linear mapping on the isomorphism assumption. Experimental results demonstrate the effectiveness of the proposed method for cross-lingual word alignment in low-isomorphic embedding spaces (distant language pairs). Our code is available at https://github.com/goFurtherLong/ATOGAN.
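
To make the task concrete, the sketch below shows only the final retrieval step of word translation under a purely linear global mapping: a source embedding is multiplied by a matrix W and matched to the nearest target embedding by cosine similarity. It is not the ATOGAN or LM procedure from the paper. The toy vocabularies, the hand-picked rotation matrix W, and the function names (apply_map, cosine, translate) are all illustrative assumptions.

```python
import math

def apply_map(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def translate(word, src_emb, tgt_emb, W):
    """Map a source word into the target space and return the nearest target word."""
    mapped = apply_map(W, src_emb[word])
    return max(tgt_emb, key=lambda t: cosine(mapped, tgt_emb[t]))

# Toy data (assumption): the target space is the source space rotated by 90 degrees,
# so the two spaces are perfectly isomorphic and one linear map aligns them.
src_emb = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}
tgt_emb = {"perro": [0.0, 1.0], "gato": [-1.0, 0.0]}
W = [[0.0, -1.0], [1.0, 0.0]]  # hand-picked rotation, not a learned mapping

print(translate("dog", src_emb, tgt_emb, W))
print(translate("cat", src_emb, tgt_emb, W))
```

In this toy case a single W aligns every pair exactly. For distant language pairs no single linear map fits all words equally well, which is the limitation the abstract's local mapping strategy is aimed at.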

Keywords:
Computer science; Embedding; Word (group theory); Word embedding; Artificial intelligence; Natural language processing; Generative grammar; Translation (biology); Space (punctuation); Linear space; Code (set theory); Mathematics; Discrete mathematics

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.77
Refs: 54
Citation Normalized Percentile: 0.72

Topics

Topic Modeling: Physical Sciences → Computer Science → Artificial Intelligence
Natural Language Processing Techniques: Physical Sciences → Computer Science → Artificial Intelligence
Music and Audio Processing: Physical Sciences → Computer Science → Signal Processing