Word embeddings/vectors are becoming an extremely important component of natural language processing tasks. Word2vec and fastText are among the most common word embedding techniques. While a large amount of work has been done to obtain embeddings for resource-rich languages like English, much work still remains for low-resource languages. Our focus has been to develop word vectors for one such low-resource language, Gujarati, which is spoken in the western part of India. We also developed an analogy test dataset to evaluate the accuracy of the embeddings obtained, and compared the performance of our models with the pre-trained Gujarati models already available.
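As a rough illustration of the described pipeline, the sketch below trains word2vec and fastText embeddings on a tokenized Gujarati corpus with gensim and scores them on an analogy set in the standard "questions-words" format. The file names and hyperparameters are assumptions for illustration only, not the settings used in the paper.

```python
# Hypothetical sketch of the training/evaluation pipeline; corpus and analogy
# file names, plus all hyperparameters, are illustrative assumptions.
from gensim.models import Word2Vec, FastText
from gensim.models.word2vec import LineSentence

# One tokenized Gujarati sentence per line (hypothetical corpus file).
corpus = LineSentence("gujarati_corpus.txt")

# Train both embedding models compared in the paper (example hyperparameters).
w2v = Word2Vec(corpus, vector_size=300, window=5, min_count=5, epochs=10)
ft = FastText(corpus, vector_size=300, window=5, min_count=5, epochs=10)

# Evaluate on a Gujarati analogy file in the standard "questions-words" format:
# four words per line (a : b :: c : d), grouped under ": section" headers.
for name, model in [("word2vec", w2v), ("fastText", ft)]:
    accuracy, _sections = model.wv.evaluate_word_analogies("gujarati_analogies.txt")
    print(f"{name} analogy accuracy: {accuracy:.3f}")
```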