JOURNAL ARTICLE

Word sense disambiguation corpus for Kashmiri

Tawseef Ahmad Mir, Aadil Ahmad Lawaye

Year: 2024   Journal: Natural language processing   Vol: 31 (2)   Pages: 631-654   Publisher: Cambridge University Press

Abstract

Ambiguity is considered an indispensable attribute of all natural languages. The process of assigning the correct interpretation to an ambiguous word, taking into account the context in which it occurs, is known as word sense disambiguation (WSD). Supervised approaches to WSD show better performance than their counterparts. These approaches, however, require a sense-annotated corpus to carry out the disambiguation process. This paper presents the first-ever standard WSD dataset for the Kashmiri language. The raw corpus used to develop the sense-annotated dataset is collected from different resources and contains about 1 million tokens. The sense-annotated corpus is then created from this raw corpus for 124 commonly used ambiguous Kashmiri words. Kashmiri WordNet, an important lexical resource for the Kashmiri language, is used to obtain the senses used in the annotation process. The developed sense-tagged corpus is multifarious in nature and has 19,854 sentences. Based on this annotated corpus, the Lexical Sample WSD task for Kashmiri is carried out using different machine-learning algorithms (J48, IBk, Naive Bayes, Dl4jMlpClassifier, SVM). To train these models for the WSD task, bag-of-words (BoW) features and word embeddings obtained using the Word2Vec model are used. We used different standard measures, viz. accuracy, precision, recall, and F1-measure, to calculate the performance of these algorithms. The algorithms reported different values for these measures depending on the features used. With the BoW model, SVM reported better results than the other algorithms, whereas Dl4jMlpClassifier performed better with word embeddings.
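The lexical-sample setup described in the abstract (BoW context features, a supervised classifier per ambiguous word, and evaluation by accuracy, precision, recall, and F1) can be illustrated with a minimal sketch. This is not the authors' implementation: the English target word "bank", the two sense labels, the toy sentences, and all function and class names are hypothetical stand-ins, and a small hand-written Naive Bayes with add-one smoothing is assumed in place of the toolkits used in the paper.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(sentence, target):
    """Count lowercased context tokens, excluding the ambiguous target word."""
    return Counter(tok for tok in sentence.lower().split() if tok != target)


class NaiveBayesWSD:
    """Multinomial Naive Bayes with add-one smoothing over BoW features."""

    def fit(self, bows, senses):
        self.sense_counts = Counter(senses)
        self.word_counts = defaultdict(Counter)
        for bow, sense in zip(bows, senses):
            self.word_counts[sense].update(bow)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, bow):
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            denom = sum(self.word_counts[sense].values()) + len(self.vocab)
            score = math.log(n / total)  # log prior of the sense
            for word, freq in bow.items():
                if word in self.vocab:  # words unseen in training are skipped
                    count = self.word_counts[sense][word]
                    score += freq * math.log((count + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


def evaluate(gold, pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    labels = sorted(set(gold) | set(pred))
    pairs = list(zip(gold, pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(g == p for g, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy data: one ambiguous target word with two senses.
TARGET = "bank"
train = [
    ("she deposited money at the bank", "finance"),
    ("the bank approved the loan", "finance"),
    ("we sat on the bank of the river", "river"),
    ("the river bank was muddy", "river"),
]
test = [
    ("the bank gave him a loan", "finance"),
    ("fish swim near the river bank", "river"),
]

model = NaiveBayesWSD().fit(
    [bag_of_words(s, TARGET) for s, _ in train], [y for _, y in train]
)
predictions = [model.predict(bag_of_words(s, TARGET)) for s, _ in test]
scores = evaluate([y for _, y in test], predictions)
print(predictions, scores)
```

In a lexical-sample task, one such classifier would be trained separately for each ambiguous word, using only the sentences annotated for that word; the macro average above weights every sense equally regardless of how often it occurs.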

Keywords:
Kashmiri, Word-sense disambiguation, SemEval, Computer science, Word (group theory), Natural language processing, Artificial intelligence, Linguistics, WordNet, Medicine, Engineering

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 11.94
Refs: 0
Citation Normalized Percentile: 0.97
Is in top 1%
Is in top 10%

Topics

South Asian Studies and Conflicts
Social Sciences →  Social Sciences →  Political Science and International Relations
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Multilingual Education and Policy
Social Sciences →  Social Sciences →  Linguistics and Language

Related Documents

JOURNAL ARTICLE

Building Kashmiri Sense Annotated Corpus and its Usage in Supervised Word Sense Disambiguation

Tawseef Ahmad Mir, Aadil Ahmad Lawaye, Parveen Rana, Ghayas Ahmed

Journal: Indian Journal of Science and Technology   Year: 2023   Vol: 16 (13)   Pages: 1021-1029
BOOK-CHAPTER

Machine Learning Approach for Kashmiri Word Sense Disambiguation

Aadil Ahmad Lawaye, Tawseef Ahmad Mir, Mahmood Hussain Mir, Ghayas Ahmed

Book series: Advances in computational intelligence and robotics   Year: 2024   Pages: 113-136
JOURNAL ARTICLE

Towards Developing Word Sense Disambiguation System for Kashmiri Language

Tawseef Ahmad Mir, Aadil Ahmad Lawaye

Journal: SAMRIDDHI A Journal of Physical Sciences Engineering and Technology   Year: 2023   Vol: 15 (02)   Pages: 234-240