Stanković, Ranka; Rađenović, Jovana; Škorić, Mihailo; Putnikovic, Marko
Learning word embeddings on large unlabeled corpora has proven effective for many natural language processing tasks. However, these representations can be further improved by incorporating external lexical resources. Previous research has demonstrated that lexical vector representations (embeddings, e.g. dict2vec) trained on both text and lexical data (e.g., WordNet and/or monolingual dictionaries) yield improved results for English. The Serbian WordNet and the Serbian electronic dictionaries available on the Web make it possible to test this approach for Serbian within this project. In this paper, we adapt the original dict2vec project to Serbian language resources. We present the textual, lexical, and vector resources prepared and used for training and evaluation, describe the training pipeline, and discuss preliminary evaluation results. We conclude by outlining ongoing work and future steps.
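As a rough illustration of the kind of dictionary-based supervision dict2vec adds on top of plain corpus training, the sketch below (not the authors' code; the miniature Serbian-style entries are invented placeholders) mines "strong" pairs (two words that appear in each other's definitions) and "weak" pairs (the relation holds in one direction only) from a word-to-definition mapping.

```python
def build_pairs(definitions):
    """Return (strong_pairs, weak_pairs) from a {word: definition} mapping.

    A pair (a, b) is *strong* when a occurs in b's definition and b occurs
    in a's definition; it is *weak* when only one direction holds.
    """
    tokenized = {w: set(d.lower().split()) for w, d in definitions.items()}
    words = list(tokenized)
    strong, weak = set(), set()
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            a_in_b = a in tokenized[b]
            b_in_a = b in tokenized[a]
            if a_in_b and b_in_a:
                strong.add((a, b))
            elif a_in_b or b_in_a:
                weak.add((a, b))
    return strong, weak


if __name__ == "__main__":
    # Hypothetical toy dictionary, for illustration only.
    defs = {
        "reka": "veliki prirodni vodotok koji se uliva u more ili jezero",
        "jezero": "veca stajaca voda okruzena kopnom u koju se cesto uliva reka",
        "more": "velika povrsina slane vode",
    }
    strong, weak = build_pairs(defs)
    print("strong:", strong)   # e.g. {('reka', 'jezero')}
    print("weak:", weak)       # e.g. {('reka', 'more')}
```

In dict2vec-style training, such pairs would then be used as additional positive examples alongside the usual corpus-based context windows.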