JOURNAL ARTICLE

An Unsupervised Approach of Paraphrase Discovery from Large Crime Corpus

Abstract

Large collections of crime reports often contain valuable structured information about crime patterns, but processing such massive datasets manually is strenuous and error-prone. These datasets are best exploited by identifying relational clusters of named entities in the crime reports. However, the clusters often contain phrases that do not express the relation that characterises the cluster as a whole, so paraphrase discovery is performed to filter out such phrases. Paraphrases are phrases that convey the same meaning in different wordings. Discovering paraphrases in a large corpus is a demanding task with applications across natural language processing, and researchers have worked on it for a long time; however, no prior work has attempted paraphrase discovery on crime data. To resolve the ambiguity among such phrases, the present work proposes an unsupervised approach for recognising synonymous phrases, or paraphrases, in an untagged crime corpus. The work focuses on sentences that contain two entities, and each entity pair, grouped by the domain of the pair, is represented as a shallow parse tree. The head word of each parse tree captures the core meaning of the phrase, and all phrases that share a head word are grouped together for each domain of entity pairs. However, many phrases convey the same meaning without sharing a head word, so the objective is to cluster such phrases using an agglomerative hierarchical clustering technique. The method is fully unsupervised and requires no training samples.
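The two stages described above (grouping phrases by head word within each entity-pair domain, then merging head-word groups by agglomerative hierarchical clustering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Head words are assumed to be already extracted by an upstream shallow parser. Each head-word group is represented by the entity pairs it connects, a common distributional heuristic assumed here and not stated in the abstract. Similarity is cosine, linkage is average, and the 0.5 stopping threshold is arbitrary. All function names, the "PER-LOC" domain label, and the example records are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def group_by_head(records):
    """Group phrases by (domain, head word).

    Each record is (domain, entity_pair, phrase, head_word). ASSUMPTION: head
    words come from an upstream shallow parser and are simply given here.
    """
    groups = defaultdict(lambda: {"phrases": set(), "pairs": Counter()})
    for domain, pair, phrase, head in records:
        group = groups[(domain, head)]
        group["phrases"].add(phrase)
        group["pairs"][pair] += 1  # entity pairs this head word links
    return groups


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def agglomerative(vectors, threshold):
    """Average-linkage agglomerative clustering.

    Starts with one cluster per vector and repeatedly merges the most similar
    pair of clusters until the best average similarity drops below threshold.
    """
    clusters = [[i] for i in range(len(vectors))]

    def avg_sim(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (a, b), best = max(
            (((x, y), avg_sim(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1],
        )
        if best < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is unaffected
    return clusters


def cluster_paraphrases(records, threshold=0.5):
    """Return, per domain, clusters of head words judged to be paraphrases."""
    by_domain = defaultdict(list)
    for (domain, head), group in group_by_head(records).items():
        by_domain[domain].append((head, group))
    result = {}
    for domain, items in by_domain.items():
        items.sort(key=lambda t: t[0])  # deterministic ordering
        vectors = [group["pairs"] for _, group in items]
        result[domain] = [sorted(items[i][0] for i in cluster)
                          for cluster in agglomerative(vectors, threshold)]
    return result


# Hypothetical records: (domain, entity pair, phrase, head word).
example = [
    ("PER-LOC", ("Ravi", "Kolkata"), "was arrested in", "arrested"),
    ("PER-LOC", ("Ravi", "Kolkata"), "was detained in", "detained"),
    ("PER-LOC", ("Amit", "Delhi"), "was arrested in", "arrested"),
    ("PER-LOC", ("Amit", "Delhi"), "was detained in", "detained"),
    ("PER-LOC", ("Sunil", "Mumbai"), "escaped from", "escaped"),
]
print(cluster_paraphrases(example))
```

On the hypothetical records, "arrested" and "detained" link exactly the same entity pairs and so merge, while "escaped" stays separate. Recomputing average similarity from scratch at each step is quadratic per merge and is chosen for clarity rather than speed.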

Keywords: Computer science, Natural language processing, Artificial intelligence, Perplexity, Phrase, Paraphrase, Parsing, Relation (database), Noun phrase, Meaning (existential), Context (archaeology), Parse tree, Task (project management), Cluster analysis, Domain (mathematical analysis), Word (group theory), Linguistics, Language model, Noun, Psychology, Data mining

Metrics

Cited by: 0
FWCI (Field Weighted Citation Impact): 0.00
References: 21
Citation Normalized Percentile: 0.09

Topics (Physical Sciences → Computer Science → Artificial Intelligence)

Topic Modeling
Natural Language Processing Techniques
Advanced Text Analysis Techniques
