Iqra Muneer, Ali Saeed, Rao Muhammad Adeel Nawab
Semantic word similarity is a quantitative measure of how similar two terms are in meaning, and computing it remains a considerable challenge for computational linguistics. The research community has examined a range of approaches to address this problem; however, most of them target a comparatively small set of languages, especially English. Research on semantic word similarity for South Asian languages, particularly Urdu, is still immature. In recent years, transformer-based approaches have proved highly successful across a range of language processing tasks. The primary aim of this study is to develop and compare a variety of transformer-based approaches to the cross-lingual English–Urdu semantic word similarity task. The study used the publicly available benchmark USWS-19 corpus, which comprises 518 word pairs. Four types of transformer-based approaches were explored: (a) cross-lingual sentence transformer-based approaches using the original dataset, (b) cross-lingual sentence transformer-based approaches using the translated dataset (the translation plus monolingual analysis [T+MA] approach), (c) a feature fusion approach (mixture of features), and (d) large language models. In addition, the study explores a word embedding-based approach using the translated dataset (T+MA approach). In total, 29 transformer-based models were developed, with the best result (Pearson correlation = 0.788) achieved by a feature fusion approach, namely Best-Two-SBERT (where SBERT stands for sentence-bidirectional encoder representations from transformers; using T+MA) + Best Baseline (with Bing translator) + Best cross-lingual SBERT. This approach improved on previously reported results for the same corpus by 7%.
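The evaluation pipeline the abstract describes can be sketched in miniature: embed each English–Urdu word pair (here with hypothetical toy vectors standing in for the output of a cross-lingual SBERT model, not the paper's actual models), score each pair by cosine similarity, and report the Pearson correlation against human-annotated gold scores. The vectors and gold scores below are illustrative assumptions, not data from the USWS-19 corpus.

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def pearson_of_pairs(emb_en, emb_ur, gold_scores):
    # Predict a similarity score for each cross-lingual word pair,
    # then correlate the predictions with the gold annotations.
    preds = [cosine_similarity(e, u) for e, u in zip(emb_en, emb_ur)]
    return float(np.corrcoef(preds, gold_scores)[0, 1])

# Toy 2-D embeddings standing in for cross-lingual sentence-transformer output.
emb_en = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
emb_ur = np.array([[1.0, 0.0], [0.6, 0.8], [-0.8, 0.6]])
# Gold scores chosen here to vary linearly with the cosine predictions,
# so this toy example yields a perfect correlation of 1.0.
gold = np.array([1.0, 0.8, 0.1])

print(round(pearson_of_pairs(emb_en, emb_ur, gold), 3))
```

In the study itself, predictions would come from real model embeddings (and, in the T+MA setting, from monolingual embeddings of machine-translated pairs), so the correlation reflects how well the model's similarity scores track human judgments.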