Domain-specific keyphrase extraction and near-duplicate article detection based on ontology

Nhon Do; Luong Ho

doi:10.1109/rivf.2015.7049886

JOURNAL ARTICLE

Year: 2015 Vol: 3 Pages: 123-126

Abstract

The rapid growth in the number of online newspapers has given web users an enormous source of information. However, users find it difficult to manage this content and to check the correctness of articles. In this paper, we introduce algorithms for keyphrase extraction and signature matching for near-duplicate article detection. Keyphrases are extracted from articles automatically using an ontology, and the similarity of two articles is calculated from the extracted keyphrases. The algorithms are applied to Vietnamese online newspapers in the Labor & Employment domain. Experimental results show that the proposed methods are effective.
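The abstract does not spell out the algorithms, so the following is only an illustrative sketch of the general idea: look up phrases from a domain ontology in each article, then compare the resulting keyphrase sets. The tiny `ONTOLOGY` dictionary, the longest-match lookup, the Jaccard measure, and the `threshold` value are all assumptions made for this example and are not taken from the paper.

```python
import re

# Hypothetical mini-ontology for a Labor & Employment domain:
# each known phrase maps to a concept label (illustrative only).
ONTOLOGY = {
    "minimum wage": "Wage",
    "overtime pay": "Wage",
    "labor contract": "Contract",
    "social insurance": "Insurance",
    "unemployment insurance": "Insurance",
}


def extract_keyphrases(text, ontology=ONTOLOGY, max_len=3):
    """Return the set of ontology phrases found in text, preferring longer matches."""
    tokens = re.findall(r"\w+", text.lower())
    found = set()
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in ontology:
                found.add(phrase)
                i += n
                break
        else:
            i += 1  # no ontology phrase starts here
    return found


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a, text_b, threshold=0.6):
    """Flag two articles as near-duplicates when their keyphrase sets overlap enough."""
    return jaccard(extract_keyphrases(text_a), extract_keyphrases(text_b)) >= threshold
```

Under these assumptions, two articles that mention the same ontology phrases in a different order receive a high similarity score, while an article on an unrelated topic within the domain scores low. A real system would need a much richer ontology and, for Vietnamese, a proper word segmenter in place of the simple regular-expression tokenizer.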

Keywords:

Computer science; Correctness; Information retrieval; Domain (mathematical analysis); Newspaper; Similarity (geometry); Ontology; Matching (statistics); Natural language processing; Artificial intelligence; Image (mathematics); Algorithm

Metrics

FWCI (Field Weighted Citation Impact): 0.31

Citation Normalized Percentile: 0.79

Topics

Advanced Text Analysis Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Web Data Mining and Analysis

Physical Sciences → Computer Science → Information Systems

Semantic Web and Ontologies

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Domain-specific keyphrase extraction

Software Keyphrase Extraction with Domain-Specific Features

Keyphrase extraction for Islamic Knowledge ontology

Automatic Domain-specific Term Extraction in Administrative-domain Ontology

DIKEA: Domain-Independent Keyphrase Extraction Algorithm