Abstract

Cross-lingual word embeddings transfer knowledge between languages: models trained on high-resource languages can make predictions in low-resource languages. We introduce CLIME, an interactive system that quickly refines cross-lingual word embeddings for a given classification problem. First, CLIME ranks words by their salience to the downstream task. Then, users mark the similarity between keywords and their nearest neighbors in the embedding space. Finally, CLIME updates the embeddings using the annotations. We evaluate CLIME on identifying health-related text in four low-resource languages: Ilocano, Sinhalese, Tigrinya, and Uyghur. Embeddings refined by CLIME capture more nuanced word semantics and yield higher test accuracy than the original embeddings. CLIME often improves accuracy faster than an active learning baseline and can be easily combined with active learning to improve results further.
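The three steps above (rank keywords, collect neighbor judgments, update the embeddings) can be sketched as a toy loop. This is a minimal illustration under stated assumptions, not CLIME's actual implementation. Salience scores are supplied directly as hypothetical numbers, since the abstract does not say how CLIME computes them. The human annotator is simulated by a callback. The update step is a swapped-in objective: accepted pairs are pulled together with a squared Euclidean term, rejected pairs are pushed apart with a hinge (`margin`), and a penalty (`reg`) keeps every vector close to its original position. All helper names (`rank_keywords`, `nearest_neighbors`, `collect_feedback`, `refine`), hyperparameters, words, and 2-D vectors are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Embeddings = Dict[str, List[float]]
Pair = Tuple[str, str]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_keywords(salience: Dict[str, float], k: int) -> List[str]:
    """Step 1: the k most salient words (ties broken alphabetically).
    Assumption: salience scores are given; CLIME computes its own measure."""
    return sorted(salience, key=lambda w: (-salience[w], w))[:k]


def nearest_neighbors(word: str, emb: Embeddings, n: int) -> List[str]:
    """The n other words with the highest cosine similarity to `word`."""
    others = [w for w in emb if w != word]
    return sorted(others, key=lambda w: (-cosine(emb[word], emb[w]), w))[:n]


def collect_feedback(
    keywords: List[str], emb: Embeddings, n: int, is_similar: Callable[[str, str], bool]
) -> Tuple[List[Pair], List[Pair]]:
    """Step 2: a user (simulated here by `is_similar`) accepts or rejects each neighbor."""
    positives: List[Pair] = []
    negatives: List[Pair] = []
    for kw in keywords:
        for nb in nearest_neighbors(kw, emb, n):
            (positives if is_similar(kw, nb) else negatives).append((kw, nb))
    return positives, negatives


def refine(emb: Embeddings, positives: List[Pair], negatives: List[Pair],
           lr: float = 0.1, margin: float = 1.0, reg: float = 0.1, steps: int = 50) -> Embeddings:
    """Step 3: gradient descent on an ASSUMED loss (not the paper's exact objective):
    sum_pos ||a-b||^2 + sum_neg max(0, margin - ||a-b||^2) + reg * sum_w ||w - w0||^2."""
    original = {w: list(v) for w, v in emb.items()}
    new = {w: list(v) for w, v in emb.items()}  # copy so the input is not mutated
    for _ in range(steps):
        grads = {w: [0.0] * len(v) for w, v in new.items()}
        for a, b in positives:  # pull accepted pairs together
            for i, (x, y) in enumerate(zip(new[a], new[b])):
                grads[a][i] += 2 * (x - y)
                grads[b][i] -= 2 * (x - y)
        for a, b in negatives:  # push rejected pairs apart while inside the margin
            if sum((x - y) ** 2 for x, y in zip(new[a], new[b])) < margin:
                for i, (x, y) in enumerate(zip(new[a], new[b])):
                    grads[a][i] -= 2 * (x - y)
                    grads[b][i] += 2 * (x - y)
        for w, vec in new.items():  # add the drift penalty, then take one step
            for i in range(len(vec)):
                grads[w][i] += 2 * reg * (vec[i] - original[w][i])
                vec[i] -= lr * grads[w][i]
    return new


# --- Toy usage (all words, vectors, and scores below are hypothetical) ---
emb = {
    "fever": [1.0, 0.2],
    "cough": [0.6, 0.8],
    "football": [0.9, 0.1],  # wrongly close to "fever" before refinement
    "election": [-0.5, 1.0],
}
salience = {"fever": 0.9, "cough": 0.4, "football": 0.1, "election": 0.05}
health_terms = {"fever", "cough"}  # stands in for a human annotator's judgment

keywords = rank_keywords(salience, k=1)
positives, negatives = collect_feedback(
    keywords, emb, n=2, is_similar=lambda a, b: a in health_terms and b in health_terms
)
refined = refine(emb, positives, negatives)
print(keywords, positives, negatives)
# → ['fever'] [('fever', 'cough')] [('fever', 'football')]
```

In this sketch, "football" starts as the nearest neighbor of the salient keyword "fever". After the simulated rejection it ends up farther away, while the accepted neighbor "cough" moves closer. Words that receive no annotation ("election") keep their original vectors because only the drift penalty acts on them, and that penalty is zero at the starting point.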
