Sentential Cross-lingual Paraphrase Detection for English-Urdu Language Pair

Iqra Muneer; Nida Waheed; Adnan Ashraf; Rao Muhammad Adeel Nawab

doi:10.1177/30504554251319446

JOURNAL ARTICLE

Year: 2025. Journal: The European Journal on Artificial Intelligence. Vol: 38 (3). Pages: 309-329.

Abstract

Due to vast digital data collections and paraphrasing tools, researchers have shown growing interest in Cross-lingual Paraphrase Detection (CLPD). Open-access data and tools make paraphrasing easier and detection more challenging. Translation tools further exacerbate the issue by enabling effortless text translation across languages, leading to increased cross-lingual paraphrasing. Most existing CLPD studies focus on European languages, particularly English, while the English-Urdu language pair remains underexplored due to limited standard approaches and benchmark corpora. This study addresses this gap by developing the CLPD Corpus for English-Urdu (CLPD-EU), a gold-standard benchmark corpus at the sentence level. The corpus includes 5,801 sentence pairs, comprising 3,900 paraphrased and 1,901 non-paraphrased instances. Additionally, the study implements classical machine learning methods based on bilingual dictionaries, cross-lingual word embeddings, and transfer learning using sentence transformers. The research further incorporates state-of-the-art Large Language Models (LLMs) such as Mistral and LLaMA, significantly improving detection accuracy. Our proposed Feature Fusion Approach, ‘Comb-ST+BD,’ demonstrates strong performance with an F1 score of 0.739 for the CLPD task. The CLPD-EU corpus will be publicly available to encourage further research in CLPD, especially for under-resourced languages like Urdu.
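To make the bilingual-dictionary idea mentioned in the abstract concrete, the following is a minimal sketch of one possible dictionary-based similarity feature. It is not the authors' implementation: the toy lexicon `TOY_UR_EN`, the helper names `tokenize` and `dictionary_overlap`, and the choice of Jaccard overlap are all assumptions made for illustration only.

```python
# Hypothetical toy Urdu-to-English lexicon (illustration only); a real system
# would load a full bilingual dictionary instead.
TOY_UR_EN = {
    "وہ": {"he", "she"},
    "کتاب": {"book"},
    "پڑھتا": {"reads", "read"},
    "ہے": {"is"},
}

# Punctuation stripped from token edges, including Urdu full stop, comma, question mark.
PUNCT = ".,!?;:\"'()۔،؟"


def tokenize(text):
    """Split on whitespace, strip edge punctuation, lowercase, drop empty tokens."""
    tokens = (tok.strip(PUNCT).lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def dictionary_overlap(english, urdu, lexicon):
    """Jaccard overlap between English tokens and dictionary translations of Urdu tokens."""
    en_tokens = set(tokenize(english))
    translated = set()
    for tok in tokenize(urdu):
        # Urdu words missing from the lexicon contribute nothing.
        translated |= lexicon.get(tok, set())
    union = en_tokens | translated
    if not union:
        return 0.0
    return len(en_tokens & translated) / len(union)


paraphrase_score = dictionary_overlap("He reads a book.", "وہ کتاب پڑھتا ہے۔", TOY_UR_EN)
unrelated_score = dictionary_overlap("The weather is cold.", "وہ کتاب پڑھتا ہے۔", TOY_UR_EN)
```

In this toy setup the paraphrased pair scores higher than the unrelated pair. A score like this could serve as one input feature to a classifier, possibly alongside embedding-based similarities, but how the paper's 'Comb-ST+BD' approach actually combines its features is not described in the abstract.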

Keywords:

Paraphrase; Urdu; Linguistics; Natural language processing; Artificial intelligence; Psychology; Computer science; Philosophy

Topics

Topic Modeling: Physical Sciences → Computer Science → Artificial Intelligence

Sentiment Analysis and Opinion Mining: Physical Sciences → Computer Science → Artificial Intelligence

Advanced Text Analysis Techniques: Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Cross-Lingual Text Reuse Detection at sentence level for English–Urdu language pair

Cross-lingual Text Reuse Detection at Document Level for English-Urdu Language Pair

Cross-lingual Text Reuse Detection Using Translation Plus Monolingual Analysis for English-Urdu Language Pair

Developing a Cross-lingual Semantic Word Similarity Corpus for English–Urdu Language Pair

Urdu Sentential Paraphrased Plagiarism Detection Using Large Language Models