The idea of word embedding is rooted in the distributional hypothesis of the linguist Harris (1954), which holds that words with similar meanings occur in similar contexts. Learning vector-space word embeddings is a technique of central importance in natural language processing. In recent years, cross-lingual word vectors have received increasing attention: they enable knowledge transfer between languages, and most importantly this transfer can take place between resource-rich and low-resource languages. This paper trains monolingual word vectors on Tibetan and Chinese Wikipedia corpora, mainly using the fastText training method, and then aligns the two monolingual vector spaces by canonical correlation analysis (CCA), thus obtaining Tibetan-Chinese cross-lingual word vectors. In the experiments, we evaluate the resulting word representations on standard lexical semantic evaluation tasks, and the results show that this method improves the semantic representation of the word vectors to some extent.
Wei Ma, Hongzhi Yu, Kun Zhao, Deshun Zhao, Jun Yang