JOURNAL ARTICLE

Word Embeddings in Low Resource Gujarati Language

Abstract

Word embeddings (word vectors) are becoming an extremely important component of natural language processing tasks. Word2vec and fastText are among the most common word embedding techniques. While a large amount of work has been done to obtain embeddings for resource-rich languages such as English, much work remains for low-resource languages. Our focus has been to develop word vectors for one such low-resource language, Gujarati, which is spoken in the western part of India. We also developed an analogy test data set to evaluate the accuracy of the embeddings obtained, and we compared the performance of our models with the pre-trained Gujarati models already available.
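The analogy evaluation the abstract mentions is typically based on vector arithmetic: for a pair relation a : b :: c : ?, the answer is the word whose vector is closest (by cosine similarity) to b − a + c. A minimal sketch of that scoring rule, using tiny made-up vectors (real Gujarati vectors would come from a trained word2vec or fastText model; the words and values here are purely illustrative):

```python
import numpy as np

# Illustrative toy embeddings; not from any real model.
vectors = {
    "king":  np.array([0.9, 0.1, 0.2]),
    "queen": np.array([0.9, 0.9, 0.2]),
    "man":   np.array([0.1, 0.1, 0.2]),
    "woman": np.array([0.1, 0.9, 0.2]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, vectors):
    """Solve a : b :: c : ? by finding the word nearest to b - a + c,
    excluding the three query words themselves (standard practice)."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

print(analogy("man", "king", "woman", vectors))  # -> queen
```

An analogy test set is then scored as the fraction of such queries for which the top-ranked candidate matches the expected word.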

Keywords:
Gujarati; Word2vec; fastText; Word embeddings; Natural language processing; Low-resource languages; Analogy; Artificial intelligence; Linguistics

Metrics

Cited by: 9
FWCI (Field-Weighted Citation Impact): 0.92
References: 17
Citation Normalized Percentile: 0.81

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence