Word embeddings/vectors are becoming an extremely important component of natural language processing tasks. Word2vec and fastText are among the most common word embedding techniques. While a large amount of work has been done to obtain embeddings for resource-rich languages like English, much work still remains for low-resource languages. Our focus has been to develop word vectors for one such low-resource language, Gujarati, which is spoken in the western part of India. We also developed an analogy test dataset to evaluate the accuracy of the embeddings obtained, and compared the performance of our models with the pre-trained Gujarati models already available.
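As a rough illustration of the described pipeline, the sketch below trains word2vec and fastText embeddings on a tokenized Gujarati corpus with gensim and scores them on an analogy set in the standard "questions-words" format. The file names and hyperparameters are assumptions for illustration only, not the settings used in the paper.

```python
# Hypothetical sketch of the training/evaluation pipeline; corpus and analogy
# file names, plus all hyperparameters, are illustrative assumptions.
from gensim.models import Word2Vec, FastText
from gensim.models.word2vec import LineSentence

# One tokenized Gujarati sentence per line (hypothetical corpus file).
corpus = LineSentence("gujarati_corpus.txt")

# Train both embedding models compared in the paper (example hyperparameters).
w2v = Word2Vec(corpus, vector_size=300, window=5, min_count=5, epochs=10)
ft = FastText(corpus, vector_size=300, window=5, min_count=5, epochs=10)

# Evaluate on a Gujarati analogy file in the standard "questions-words" format:
# four words per line (a : b :: c : d), grouped under ": section" headers.
for name, model in [("word2vec", w2v), ("fastText", ft)]:
    accuracy, _sections = model.wv.evaluate_word_analogies("gujarati_analogies.txt")
    print(f"{name} analogy accuracy: {accuracy:.3f}")
```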