JOURNAL ARTICLE

Learning Tibetan-Chinese Cross-Lingual Word Embeddings

Abstract

Word embeddings build on the distributional hypothesis of the linguist Harris (1954), which holds that words with similar meanings occur in similar contexts. Learning vector-space word embeddings is a technique of central importance in natural language processing. In recent years, cross-lingual word vectors have received growing attention: they enable knowledge transfer between languages, and, most importantly, this transfer can take place between resource-rich and low-resource languages. This paper trains monolingual word vectors on Tibetan and Chinese Wikipedia corpora, mainly using the fastText training method, and then aligns the two monolingual spaces with canonical correlation analysis (CCA) to obtain Tibetan-Chinese cross-lingual word vectors. In our experiments, we evaluated the resulting word representations on standard lexical semantic evaluation tasks, and the results show that this method yields a measurable improvement in the semantic quality of the word vectors.
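The abstract describes aligning two independently trained monolingual embedding spaces with CCA. As a rough illustration only (not the authors' code), the sketch below implements classical linear CCA with NumPy, in the style of projection-based alignment: the rows of `X` and `Y` are assumed to be embeddings of translation pairs from a seed dictionary, and the `reg` regularizer is an assumption added for numerical stability.

```python
import numpy as np

def cca_align(X, Y, k, reg=1e-8):
    """Project two embedding matrices into a shared k-dimensional space
    via canonical correlation analysis.
    X: (n, d1) source-language vectors; Y: (n, d2) target-language
    vectors; row i of X and row i of Y are a translation pair."""
    # Center both views.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularized covariance and cross-covariance estimates.
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        # Inverse matrix square root via eigendecomposition.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(np.maximum(w, reg))) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # SVD of the whitened cross-covariance gives the canonical directions;
    # the singular values are the canonical correlations.
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    A = Wx @ U[:, :k]     # projection for the source space
    B = Wy @ Vt.T[:, :k]  # projection for the target space
    return A, B, s[:k]

# Toy usage: Y is (almost) a rotation of X, so the canonical
# correlations should be close to 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
R = np.linalg.qr(rng.normal(size=(8, 8)))[0]   # random rotation
Y = X @ R + 0.01 * rng.normal(size=(500, 8))
A, B, s = cca_align(X, Y, k=4)
```

After projection, `X @ A` and `Y @ B` live in a shared space where cosine similarity can be compared across the two languages; in the cross-lingual setting this is what allows a Tibetan word vector to be matched against Chinese word vectors.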

Keywords:
Word embedding; Cross-lingual word embeddings; Natural language processing; Distributional semantics; Semantics; Vector space; Semantic similarity

Metrics

Cited by: 0
FWCI (Field-Weighted Citation Impact): 0.00
References: 14
Citation Normalized Percentile: 0.21

Topics

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)

Related Documents

JOURNAL ARTICLE

Tibetan-Chinese cross-lingual word embeddings based on MUSE

Wei Ma, Hongzhi Yu, Kun Zhao, Deshun Zhao, Jun Yang

Journal: Journal of Physics: Conference Series, Year: 2020, Vol: 1453 (1), Pages: 012043-012043
JOURNAL ARTICLE

Cross-Lingual Word Embeddings

Anders Søgaard, Ivan Vulić, Sebastian Ruder, Manaal Faruqui

Journal: Synthesis Lectures on Human Language Technologies, Year: 2019, Vol: 12 (2), Pages: 1-132
DISSERTATION

Adversarial Learning for Cross-Lingual Word Embeddings

Wang, Haozhou

University: Archive ouverte UNIGE (University of Geneva), Year: 2024