JOURNAL ARTICLE

Learning Word Embeddings using Lexical Resources and Corpora

Stanković, Ranka; Rađenović, Jovana; Škorić, Mihailo; Putnikovic, Marko

Year: 2025
Journal: Zenodo (CERN European Organization for Nuclear Research)
Publisher: European Organization for Nuclear Research

Abstract

Learning word embeddings on large unlabeled corpora has proven effective for many natural language tasks. However, these representations can be further improved by incorporating external lexical resources. Previous research has demonstrated that lexical vector representations (embeddings, e.g., dict2vec) trained on both text and lexical data (e.g., WordNet and/or monolingual dictionaries) give improved results for English. The Serbian WordNet and the many Serbian electronic dictionaries available on the Web make it possible to test this approach for Serbian within this project. In this paper, we adapt the original dict2vec project to Serbian language resources. We present the textual, lexical, and vector resources prepared and used for training and evaluation, describe the training pipeline, and discuss preliminary evaluation results. We conclude by outlining ongoing work and future steps.

Keywords:
Serbian WordNet; Lexical database; Word (group theory); Pipeline (software); Representation (politics); Natural language; Lexical item

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.23

Topics

Natural Language Processing Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Text and Document Classification Technologies
Physical Sciences → Computer Science → Artificial Intelligence
Lexicography and Language Studies
Social Sciences → Arts and Humanities → Language and Linguistics