JOURNAL ARTICLE

Sentential Cross-lingual Paraphrase Detection for English-Urdu Language Pair

Iqra Muneer, Nida Waheed, Adnan Ashraf, Rao Muhammad Adeel Nawab

Year: 2025 Journal: The European Journal on Artificial Intelligence Vol: 38 (3) Pages: 309-329

Abstract

Due to vast digital data collections and paraphrasing tools, researchers have shown growing interest in Cross-lingual Paraphrase Detection (CLPD). Open-access data and tools make paraphrasing easier and detection more challenging. Translation tools further exacerbate the issue by enabling effortless text translation across languages, leading to increased cross-lingual paraphrasing. Most existing CLPD studies focus on European languages, particularly English, while the English-Urdu language pair remains underexplored due to limited standard approaches and benchmark corpora. This study addresses this gap by developing the CLPD Corpus for English-Urdu (CLPD-EU), a gold-standard benchmark corpus at the sentence level. The corpus includes 5,801 sentence pairs, comprising 3,900 paraphrased and 1,901 non-paraphrased instances. Additionally, the study implements classical machine learning methods based on bilingual dictionaries, cross-lingual word embeddings, and transfer learning using sentence transformers. The research further incorporates state-of-the-art Large Language Models (LLMs) such as Mistral and LLaMA, significantly improving detection accuracy. Our proposed Feature Fusion Approach, ‘Comb-ST+BD,’ demonstrates strong performance with an F1 score of 0.739 for the CLPD task. The CLPD-EU corpus will be made publicly available to encourage further research in CLPD, especially for under-resourced languages like Urdu.
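The abstract does not spell out how the sentence-transformer (ST) and bilingual-dictionary (BD) signals are combined in ‘Comb-ST+BD’. As a rough illustration of the general idea of feature fusion only, the sketch below builds a two-element feature vector for one English-Urdu sentence pair: a cosine similarity between two sentence vectors and a dictionary-overlap ratio. Everything here is an assumption made for illustration: the toy dictionary `TOY_DICT`, the function names, whitespace tokenization, and the hand-written vectors that stand in for real sentence-transformer embeddings. It should not be read as the authors' implementation.

```python
import math

# Hypothetical toy English -> Urdu dictionary (illustrative entries only).
TOY_DICT = {
    "this": {"یہ"},
    "book": {"کتاب"},
    "good": {"اچھی", "اچھا"},
    "is": {"ہے"},
}


def dictionary_overlap(en_sentence, ur_sentence, bilingual_dict):
    """Fraction of English tokens with a dictionary translation present in the Urdu sentence."""
    en_tokens = en_sentence.lower().split()  # naive whitespace tokenization (assumption)
    ur_tokens = set(ur_sentence.split())
    if not en_tokens:
        return 0.0
    hits = sum(1 for tok in en_tokens if bilingual_dict.get(tok, set()) & ur_tokens)
    return hits / len(en_tokens)


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fused_features(en_sentence, ur_sentence, en_vec, ur_vec, bilingual_dict):
    """Concatenate the embedding-based and dictionary-based scores into one feature vector."""
    return [
        cosine_similarity(en_vec, ur_vec),
        dictionary_overlap(en_sentence, ur_sentence, bilingual_dict),
    ]


# Placeholder vectors; in practice these would come from a multilingual sentence encoder.
features = fused_features(
    "this book is good", "یہ کتاب اچھی ہے", [1.0, 0.0], [1.0, 0.0], TOY_DICT
)
```

In a setup like this, the fused vectors for all sentence pairs would be passed to a classical classifier trained on the paraphrased and non-paraphrased labels; the choice of classifier is left open here because the abstract does not name one.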

Keywords:
Paraphrase, Urdu, Linguistics, Natural language processing, Artificial intelligence, Psychology, Computer science, Philosophy

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 71
Citation Normalized Percentile: 0.02

Topics

Topic Modeling
Physical Sciences → Computer Science → Artificial Intelligence
Sentiment Analysis and Opinion Mining
Physical Sciences → Computer Science → Artificial Intelligence
Advanced Text Analysis Techniques
Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Cross-Lingual Text Reuse Detection at sentence level for English–Urdu language pair

Iqra Muneer, Rao Muhammad Adeel Nawab

Journal: Computer Speech & Language Year: 2022 Vol: 75 Pages: 101381-101381

JOURNAL ARTICLE

Cross-lingual Text Reuse Detection at Document Level for English-Urdu Language Pair

Muhammad Sharjeel, Iqra Muneer, Sumaira Nosheen, Rao Muhammad Adeel Nawab, Paul Rayson

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing Year: 2023 Vol: 22 (6) Pages: 1-22

JOURNAL ARTICLE

Cross-lingual Text Reuse Detection Using Translation Plus Monolingual Analysis for English-Urdu Language Pair

Iqra Muneer, Rao Muhammad Adeel Nawab

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing Year: 2021 Vol: 21 (2) Pages: 1-18

JOURNAL ARTICLE

Developing a Cross-lingual Semantic Word Similarity Corpus for English–Urdu Language Pair

Ghazeefa Fatima, Rao Muhammad Adeel Nawab, Muhammad Salman Khan, Ali Saeed

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing Year: 2021 Vol: 21 (2) Pages: 1-16

JOURNAL ARTICLE

Urdu Sentential Paraphrased Plagiarism Detection Using Large Language Models

Hafiz Rizwan Iqbal, Muhammad Sharjeel, Jawad Shafi, Usama Mehmood, Agha Ali Raza

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing Year: 2025 Vol: 24 (9) Pages: 1-20