JOURNAL ARTICLE

A Graph-based Text Similarity Measure That Employs Named Entity Information

Abstract

Text comparison is an interesting though hard task, with many applications in Natural Language Processing. This work introduces a new text-similarity measure, which employs named-entity information extracted from the texts and the n-gram graph model for representing documents. Using OpenCalais as a named-entity recognition service and the JINSECT toolkit for constructing and managing n-gram graphs, the text similarity measure is embedded in a text clustering algorithm (k-Means). The evaluation of the produced clusters with various clustering validity metrics shows that extracting named entities as a first step can improve the time performance of similarity measures based on the n-gram graph representation without affecting the overall performance of the NLP task.
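
As a rough illustration of the idea summarized above, the following minimal Python sketch builds a simplified character n-gram graph from a document's named entities and compares two such graphs with a containment-style similarity. It is an assumption-laden sketch, not the paper's implementation: it does not call OpenCalais or JINSECT, the entity lists are assumed to be already extracted, and the names `ngram_graph`, `containment_similarity`, `entity_similarity`, and the parameters `n` and `window` are hypothetical.

```python
from collections import Counter


def ngram_graph(text, n=3, window=3):
    """Build a simplified character n-gram graph.

    Nodes are character n-grams; an undirected edge links two n-grams
    that start within `window` positions of each other. The graph is
    stored as a Counter mapping each edge to its co-occurrence count.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = Counter()
    for i, gram in enumerate(grams):
        for neighbour in grams[i + 1:i + 1 + window]:
            edges[tuple(sorted((gram, neighbour)))] += 1
    return edges


def containment_similarity(g1, g2):
    """Fraction of edges shared by both graphs, relative to the smaller graph."""
    if not g1 or not g2:
        return 0.0
    shared = sum(1 for edge in g1 if edge in g2)
    return shared / min(len(g1), len(g2))


def entity_similarity(entities_a, entities_b, n=3, window=3):
    """Compare two documents using only their (pre-extracted) named entities.

    Assumption: each argument is a list of entity strings produced by some
    named-entity recognizer; sorting makes the result order-independent.
    """
    graph_a = ngram_graph(" ".join(sorted(entities_a)).lower(), n, window)
    graph_b = ngram_graph(" ".join(sorted(entities_b)).lower(), n, window)
    return containment_similarity(graph_a, graph_b)
```

Because the graphs are built from the entity strings only, they are much smaller than graphs built from the full text, which is in line with the time-performance benefit the abstract reports. A pairwise similarity of this kind could then feed a clustering step; how the paper combines it with k-Means is not specified here.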

Keywords:
Computer science; Cluster analysis; Similarity measure; Natural language processing; Artificial intelligence; Task (project management); Similarity (geometry); Measure (data warehouse); Graph; n-gram; Information extraction; Information retrieval; Data mining; Language model; Theoretical computer science

Metrics

Cited By: 9
FWCI (Field Weighted Citation Impact): 0.92
Refs: 20
Citation Normalized Percentile: 0.80

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Graph Neural Networks
Physical Sciences →  Computer Science →  Artificial Intelligence