JOURNAL ARTICLE

Sense Unveiled: Enhancing Urdu Corpus for Nuanced Word Sense Disambiguation

Sarfraz Bibi, Sohail Asghar, Muhammad Zubair

Year: 2024 Journal: IEEE Access Vol: 12 Pages: 126329-126343 Publisher: Institute of Electrical and Electronics Engineers

Abstract

Ambiguity in word meanings presents a significant challenge in natural language processing, necessitating robust techniques for Word Sense Disambiguation (WSD). While research in WSD has predominantly focused on widely spoken languages like English and Spanish, less attention has been given to languages such as Urdu. This paper addresses this gap by examining existing corpora for WSD in Urdu and presenting the creation of an Enhanced Urdu (EU) corpus specifically tailored for WSD tasks. The analysis critically evaluates the limitations of the ULS-WSD-18 corpus and justifies the need for a more comprehensive resource. The EU corpus comprises 960 words, categorized by their frequency in the corpus as most frequent, moderately frequent, or infrequent. These words serve as the foundation for constructing the sentences used in model training and testing. Various similarity coefficients are employed to assess the similarity between the EU corpus and the ULS-WSD-18 corpus, revealing notable patterns in word occurrences, sense structures, and sentence compositions. The findings underscore the potential of the EU corpus to advance WSD research in Urdu language processing. By providing a comprehensive resource for model development and evaluation, this work contributes to the broader goal of improving language processing tools for Urdu and other underrepresented languages.

Keywords:
Word-sense disambiguation, Urdu, Computer science, Sense (electronics), Natural language processing, Word (group theory), Artificial intelligence, SemEval, Linguistics, WordNet, Engineering

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 1.28
Refs: 40
Citation Normalized Percentile: 0.77

Topics

Natural Language Processing Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Topic Modeling
Physical Sciences → Computer Science → Artificial Intelligence
Data Quality and Management
Social Sciences → Decision Sciences → Management Science and Operations Research

Related Documents

JOURNAL ARTICLE

A word sense disambiguation corpus for Urdu

Ali Saeed, Rao Muhammad Adeel Nawab, Mark Stevenson, Paul Rayson

Journal: Language Resources and Evaluation Year: 2018 Vol: 53 (3) Pages: 397-418

JOURNAL ARTICLE

A Sense Annotated Corpus for All-Words Urdu Word Sense Disambiguation

Ali Saeed, Rao Muhammad Adeel Nawab, Mark Stevenson, Paul Rayson

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing Year: 2019 Vol: 18 (4) Pages: 1-14

JOURNAL ARTICLE

Word sense disambiguation corpus for Kashmiri

Tawseef Ahmad Mir, Aadil Ahmad Lawaye

Journal: Natural Language Processing Year: 2024 Vol: 31 (2) Pages: 631-654

JOURNAL ARTICLE

Urdu word sense disambiguation using machine learning approach

Muhammad Abid, Asad Habib, Jawad Ashraf, Abdul Shahid

Journal: Cluster Computing Year: 2017 Vol: 21 (1) Pages: 515-522