JOURNAL ARTICLE

Learning Word Embeddings using Lexical Resources and Corpora

Stanković, Ranka; Rađenović, Jovana; Škorić, Mihailo; Putnikovic, Marko

Year: 2025
Journal: Zenodo (CERN European Organization for Nuclear Research)
Publisher: European Organization for Nuclear Research

Abstract

Learning word embeddings on large unlabeled corpora has proven effective for many natural language tasks. However, these representations can be further improved by incorporating external lexical resources. Previous research has demonstrated that lexical vector representations (embeddings, e.g., dict2vec) trained on both text and lexical data (e.g., WordNet and/or monolingual dictionaries) give improved results for English. The Serbian WordNet and many Serbian electronic dictionaries available on the Web make it possible to test this approach for Serbian within this project. In this paper, we adapt the original dict2vec project to Serbian language resources. We present the textual, lexical, and vector resources prepared and used for training and evaluation, describe the training pipeline, and discuss preliminary evaluation results. We conclude by outlining ongoing work and future steps.

Keywords:
Serbian; WordNet; Lexical database; Word (group theory); Pipeline (software); Representation (politics); Natural language; Lexical item

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.49
