JOURNAL ARTICLE

Corpus-Based Paraphrase Detection Experiments and Review

Tedo Vrbanec, Ana Meštrović

Year: 2020   Journal: Information   Vol: 11 (5)   Pages: 241-241   Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Paraphrase detection is important for a number of applications, including plagiarism detection, authorship attribution, question answering, text summarization, and text mining in general. In this paper, we give a performance overview of various types of corpus-based models, especially deep learning (DL) models, on the task of paraphrase detection. We report the results of eight models (LSI, TF-IDF, Word2Vec, Doc2Vec, GloVe, FastText, ELMO, and USE) evaluated on three publicly available corpora: the Microsoft Research Paraphrase Corpus, the Clough and Stevenson corpus, and the Webis Crowd Paraphrase Corpus 2011. Through a large number of experiments, we determined the most appropriate choices for text pre-processing, hyper-parameters, sub-model selection where sub-models exist (e.g., Skipgram vs. CBOW), distance measures, and the semantic similarity/paraphrase detection threshold. Our findings, and those of other researchers who have used deep learning models, show that DL models are very competitive with traditional state-of-the-art approaches and have potential that should be further developed.
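As an illustration of the general pipeline the abstract describes (vectorize two texts, apply a distance measure, compare against a threshold), the following is a minimal sketch using TF-IDF and cosine similarity. It is not the authors' implementation: the function names, the smoothed IDF formula, the choice to fit IDF on the two texts alone, and the threshold value of 0.5 are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant) so shared terms keep a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_paraphrase(a, b, threshold=0.5):
    """Flag a pair as a paraphrase when similarity reaches the threshold.

    The default threshold is a hypothetical placeholder, not a value
    reported in the paper; in practice it would be tuned per corpus.
    """
    u, v = tfidf_vectors([a, b])
    return cosine_similarity(u, v) >= threshold
```

In a realistic setting the IDF weights would be estimated on a larger corpus rather than on the pair being compared, and the threshold would be selected on held-out labeled pairs.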

Metrics

Cited By: 30
FWCI (Field Weighted Citation Impact): 2.20
Refs: 55
Citation Normalized Percentile: 0.89

Topics

Topic Modeling: Physical Sciences → Computer Science → Artificial Intelligence
Natural Language Processing Techniques: Physical Sciences → Computer Science → Artificial Intelligence
Authorship Attribution and Profiling: Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Turkish Paraphrase Corpus

Şeniz Demir, İlknur Durgar El-Kahlout, Erdem Ünal, Hamza Kaya

Journal: Language Resources and Evaluation   Year: 2012   Pages: 4087-4091

BOOK-CHAPTER

Cross-lingual Metaphor Paraphrase Detection – Experimental Corpus and Baselines

Martin Víta

Communications in Computer and Information Science   Year: 2020   Pages: 345-356

BOOK-CHAPTER

Construction of a Russian Paraphrase Corpus: Unsupervised Paraphrase Extraction

Ekaterina Pronoza, Elena Yagunova, Anton Pronoza

Communications in Computer and Information Science   Year: 2016   Pages: 146-157