An Unsupervised Approach of Paraphrase Discovery from Large Crime Corpus

Priyanka Das; Asit Kumar Das

doi:10.1109/iccci.2018.8441265

JOURNAL ARTICLE

Year: 2018 Pages: 1-6

Abstract

Large collections of crime reports often contain valuable structured information about crime patterns, but processing such massive datasets manually is strenuous and error-prone. These datasets can best be exploited by identifying relational clusters of named entities in the crime reports. Often, however, a cluster contains phrases that do not express the relation that characterises the cluster as a whole. Paraphrasing is therefore performed to filter out those phrases. Paraphrases are, broadly, phrases that express the same context in different words. Discovering paraphrases in a large corpus is a demanding task for many natural language processing applications, and researchers have worked on it for a long time, but none has attempted the paraphrasing task on crime data. To deal with the perplexity of the phrases, the present work proposes an unsupervised approach for recognising synonymous phrases, or paraphrases, in an untagged crime corpus. The work focuses on sentences that contain two entities, and each entity pair, across the different domains, is represented as a shallow parse tree. The head word of each parse tree captures the core meaning of the phrase, and all phrases with the same head word are grouped together for each domain of entity pairs. However, many phrases express the same meaning without sharing a head word, so the objective is to cluster these phrases using an agglomerative hierarchical clustering technique. The method is unsupervised and requires no training samples.
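The two stages described in the abstract (grouping phrases by head word, then merging head-word groups by agglomerative hierarchical clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy data, the hand-assigned head words (no parser is run), the use of shared entity pairs as the feature for each head word, Jaccard similarity, average linkage, and the stopping threshold of 0.4. The toy data stands for a single domain of entity pairs.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (head word, phrase, entity pair it links).
# Head words are assigned by hand; the paper derives them from shallow parse trees.
observations = [
    ("arrested", "was arrested by", ("Ravi", "Delhi Police")),
    ("arrested", "got arrested by", ("Amit", "Mumbai Police")),
    ("nabbed", "was nabbed by", ("Ravi", "Delhi Police")),
    ("nabbed", "nabbed by", ("Amit", "Mumbai Police")),
    ("killed", "was killed by", ("Sita", "Mohan")),
    ("murdered", "was murdered by", ("Sita", "Mohan")),
    ("murdered", "brutally murdered by", ("Gita", "Rohan")),
]

# Stage 1: accumulate phrases (and the entity pairs they link) per head word.
phrases_by_head = defaultdict(list)
pairs_by_head = defaultdict(set)
for head, phrase, pair in observations:
    phrases_by_head[head].append(phrase)
    pairs_by_head[head].add(pair)


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerative(items, features, threshold):
    """Average-linkage agglomerative clustering.

    Starts with one cluster per item and repeatedly merges the most
    similar pair of clusters until no pair reaches `threshold`.
    """
    clusters = [[item] for item in items]

    def similarity(c1, c2):
        scores = [jaccard(features[x], features[y]) for x in c1 for y in c2]
        return sum(scores) / len(scores)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), similarity(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Stage 2: merge head-word groups that link similar sets of entity pairs.
clusters = agglomerative(list(phrases_by_head), pairs_by_head, threshold=0.4)

for cluster in clusters:
    members = [p for head in cluster for p in phrases_by_head[head]]
    print(cluster, "->", members)
```

In this sketch, head words that connect the same entity pairs end up in one cluster even though they differ lexically, which mirrors the stated goal of finding phrases that share a meaning without sharing a head word. A real system would replace the hand-labelled head words with the output of a shallow parser and would need a similarity measure and threshold chosen for the corpus.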

Keywords:

Computer science; Natural language processing; Artificial intelligence; Perplexity; Phrase; Paraphrase; Parsing; Relation (database); Noun phrase; Meaning (existential); Context (archaeology); Parse tree; Task (project management); Cluster analysis; Domain (mathematical analysis); Word (group theory); Linguistics; Language model; Noun; Psychology; Data mining

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.09

Topics

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Advanced Text Analysis Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Construction of a Russian Paraphrase Corpus: Unsupervised Paraphrase Extraction

Unsupervised construction of large paraphrase corpora

Synonym Discovery from Large Corpus

Pivot Discrimination Approach for Paraphrase Extraction from Bilingual Corpus

ExaPPC: a Large-Scale Persian Paraphrase Detection Corpus