A Graph-based Text Similarity Measure That Employs Named Entity Information

Leonidas Tsekouras; Iraklis Varlamis; George Giannakopoulos

doi:10.26615/978-954-452-049-6_098

Year: 2017 Pages: 765-771

Abstract

Text comparison is an interesting though hard task, with many applications in Natural Language Processing. This work introduces a new text-similarity measure, which employs named-entities' information extracted from the texts and the n-gram graphs for representing documents. Using OpenCalais as a named-entity recognition service and the JINSECT toolkit for constructing and managing n-gram graphs, the text similarity measure is embedded in a text clustering algorithm (k-Means). The evaluation of the produced clusters with clustering validity metrics shows that the extraction of named entities at a first step can be profitable for the time-performance of similarity measures that are based on the n-gram graph representation without affecting the overall performance of the NLP task.
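
The abstract names the n-gram graph representation without defining it, so a small illustration may help. The following is a minimal sketch, not the authors' implementation and not the JINSECT API. It assumes the commonly described construction in which nodes are character n-grams, an edge links two n-grams that occur within a fixed window of each other, and the edge weight counts those co-occurrences. The similarity shown is a value-similarity style ratio over shared edges. The function names, the defaults `n=3` and `window=3`, the choice of undirected edges, and the `entity_graph` helper are all assumptions made for illustration. No call to OpenCalais is made; the entity lists are supplied by hand.

```python
from collections import Counter


def ngram_graph(text, n=3, window=3):
    """Build a character n-gram graph.

    Returns a Counter mapping an undirected edge (a sorted pair of
    n-grams) to the number of times the two n-grams occur within
    `window` positions of each other. Defaults are illustrative.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = Counter()
    for i, gram in enumerate(grams):
        # Link this n-gram to the next `window` n-grams in the text.
        for neighbour in grams[i + 1:i + 1 + window]:
            edges[tuple(sorted((gram, neighbour)))] += 1
    return edges


def value_similarity(g1, g2):
    """Value-similarity style score in [0, 1] between two graphs.

    Each shared edge contributes min(w1, w2) / max(w1, w2); the sum is
    divided by the edge count of the larger graph.
    """
    if not g1 or not g2:
        return 0.0
    shared = g1.keys() & g2.keys()
    total = sum(min(g1[e], g2[e]) / max(g1[e], g2[e]) for e in shared)
    return total / max(len(g1), len(g2))


def entity_graph(entities, n=3, window=3):
    """Hypothetical helper: build a graph from named entities only.

    `entities` is a list of strings assumed to come from some earlier
    named-entity recognition step. Joining them with spaces is an
    assumption of this sketch, not a detail taken from the paper.
    """
    return ngram_graph(" ".join(entities), n=n, window=window)


if __name__ == "__main__":
    # Hand-written entity lists standing in for recognizer output.
    doc_a = entity_graph(["Athens", "Greece", "European Union"])
    doc_b = entity_graph(["Greece", "European Union", "Brussels"])
    print(value_similarity(doc_a, doc_b))
```

Building the graph from the entity strings alone yields far fewer n-grams than building it from the full text, which is one way to picture the time saving the abstract reports. A clustering step such as k-Means would then use scores like these in place of a vector-space distance.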

Keywords:

Computer science; Cluster analysis; Similarity measure; Natural language processing; Artificial intelligence; Task (project management); Similarity (geometry); Measure (data warehouse); Graph; n-gram; Information extraction; Information retrieval; Data mining; Language model; Theoretical computer science

Metrics

FWCI (Field Weighted Citation Impact): 0.92

Citation Normalized Percentile: 0.80

Topics

Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)

Advanced Graph Neural Networks (Physical Sciences → Computer Science → Artificial Intelligence)
