JOURNAL ARTICLE

Cross-Lingual English–Urdu Semantic Word Similarity Using Sentence Transformers

Iqra Muneer, Ali Saeed, Rao Muhammad Adeel Nawab

Year: 2025 Journal: The European Journal on Artificial Intelligence Vol: 38 (1) Pages: 21-34

Abstract

Semantic word similarity is a quantitative measure of how close two terms are in meaning, and estimating it remains a considerable challenge for computational linguistics. The research community has examined a range of approaches to address this issue. However, most of these approaches target a comparatively limited set of languages, especially English. Research on semantic word similarity for South Asian languages, particularly Urdu, remains immature. In recent years, transformer-based approaches have proved extremely successful for a range of language processing tasks. The primary aim of this study is to develop and compare a variety of transformer-based approaches to the cross-lingual English–Urdu semantic word similarity task. The approaches were evaluated on the publicly available USWS-19 benchmark corpus, which comprises 518 word pairs. This study mainly explored four types of transformer-based approaches: (a) cross-lingual sentence transformer-based approaches using the original dataset, (b) cross-lingual sentence transformer-based approaches using the translated dataset (translation plus monolingual analysis [T+MA] approach), (c) the feature fusion approach (mixture of features), and (d) large language models. In addition, this study also explored a word embedding-based approach using the translated dataset (T+MA approach). In total, this study developed 29 transformer-based models, with the highest result (Pearson correlation = 0.788) achieved using a feature fusion approach, that is, Best-Two-SBERT (where SBERT stands for sentence-bidirectional encoder representations from transformers; using T+MA) + BEST Baseline (with Bing translator) + Best cross-lingual SBERT. This approach improved by 7% over previously reported results on the same corpus.
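
The evaluation described in the abstract can be illustrated with a minimal sketch: each word pair receives a similarity score from one or more models, the scores are fused, and the result is compared with human judgments using Pearson correlation. Everything in the sketch is an assumption made for illustration and is not taken from the study: the two-dimensional vectors stand in for real sentence-transformer embeddings, the gold scores and the second model's scores are invented toy values, and averaging is only one possible fusion rule (the abstract does not say how the features were combined).

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fuse(score_lists):
    """Average the per-pair scores of several models (assumed fusion rule)."""
    return [sum(scores) / len(scores) for scores in zip(*score_lists)]


# Hypothetical embeddings for three English-Urdu word pairs (toy values).
embedded_pairs = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 1.0], [1.0, 0.0]),
]
model_a_scores = [cosine(en, ur) for en, ur in embedded_pairs]

# Scores from a second hypothetical model and invented human (gold) ratings.
model_b_scores = [0.9, 0.2, 0.6]
gold_scores = [4.0, 0.5, 2.5]

fused_scores = fuse([model_a_scores, model_b_scores])
correlation = pearson(fused_scores, gold_scores)
print(f"Pearson correlation of fused scores with gold: {correlation:.3f}")
```

In practice the toy vectors would be replaced by embeddings from a cross-lingual encoder, or from a monolingual encoder applied after translation in the T+MA setting, and the correlation would be computed over all 518 pairs of the corpus.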

Keywords:
Urdu, Natural language processing, Computer science, Semantic similarity, Sentence, Artificial intelligence, Transformer, Linguistics, Physics, Philosophy

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 4.82
Refs: 61
Citation Normalized Percentile: 0.91

Topics

Topic Modeling: Physical Sciences → Computer Science → Artificial Intelligence
Natural Language Processing Techniques: Physical Sciences → Computer Science → Artificial Intelligence
Advanced Text Analysis Techniques: Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Developing a Cross-lingual Semantic Word Similarity Corpus for English–Urdu Language Pair

Ghazeefa Fatima, Rao Muhammad Adeel Nawab, Muhammad Salman Khan, Ali Saeed

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing Year: 2021 Vol: 21 (2) Pages: 1-16

BOOK-CHAPTER

Advanced Semantic Text Similarity Analysis Using Sentence Transformers

Dhawaleswar Rao, Prajna Pani

Lecture Notes in Networks and Systems Year: 2025 Pages: 371-379

JOURNAL ARTICLE

Detecting Cross-lingual Semantic Similarity Using Parallel PropBanks.

Shumin Wu, Jinho D. Choi, Martha Palmer

Journal: Conference of the Association for Machine Translation in the Americas Year: 2010 Vol: 349 Pages: g4456-g4456