JOURNAL ARTICLE

Efficient text document clustering with new similarity measures

R. Lakshmi, S. Baskar

Year: 2020 Journal: International Journal of Business Intelligence and Data Mining Vol: 18 (1) Pages: 49-49 Publisher: Inderscience Publishers

Abstract

This paper proposes two new similarity measures, the distance of term frequency-based similarity measure (DTFSM) and the presence of common terms-based similarity measure (PCTSM), to compute the similarity between two documents and thereby improve the effectiveness of text document clustering. The proposed measures are evaluated on the Reuters-21578 and WebKB datasets, with documents clustered using the K-means and K-means++ algorithms. The results obtained with DTFSM and PCTSM are significantly better than those of other measures in terms of accuracy, entropy, recall and F-measure. The proposed measures not only improve the effectiveness of text document clustering but also reduce the complexity of the similarity computation, measured by the number of operations required during clustering.
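
The abstract names entropy and F-measure as evaluation criteria but does not define them. The sketch below assumes the definitions commonly used in the document clustering literature (size-weighted per-cluster class entropy, and the class-weighted best F-score over clusters); the paper's exact formulas may differ. The function names and the toy labels are hypothetical, and the sketch does not implement DTFSM or PCTSM, whose formulas are not given here.

```python
import math
from collections import Counter


def clustering_entropy(classes, clusters):
    """Size-weighted average of per-cluster class entropy (lower is better)."""
    n = len(classes)
    members = {}
    for cls, clu in zip(classes, clusters):
        members.setdefault(clu, []).append(cls)
    total = 0.0
    for labels in members.values():
        size = len(labels)
        # Entropy of the class distribution inside this cluster, in bits.
        h = -sum((c / size) * math.log2(c / size)
                 for c in Counter(labels).values())
        total += (size / n) * h
    return total


def clustering_f_measure(classes, clusters):
    """Class-weighted best F-score over all clusters (higher is better)."""
    n = len(classes)
    class_sizes = Counter(classes)
    cluster_sizes = Counter(clusters)
    overlap = Counter(zip(classes, clusters))
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap[(cls, clu)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total


# Toy example: four documents, two true classes, two candidate clusterings.
true_classes = ["a", "a", "b", "b"]
perfect = [0, 0, 1, 1]   # clusters match the classes exactly
mixed = [0, 1, 0, 1]     # every cluster holds one document of each class
```

With these definitions, a clustering that reproduces the true classes has entropy 0 and F-measure 1, while the evenly mixed clustering has entropy 1 bit and F-measure 0.5.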

Keywords:
Cluster analysis; Computer science; Document clustering; Similarity (geometry); Similarity measure; Data mining; Fuzzy clustering; Precision and recall; Measure (data warehouse); Correlation clustering; Entropy (arrow of time); Artificial intelligence; Information retrieval; Pattern recognition (psychology)

Metrics

Cited By: 17
FWCI (Field Weighted Citation Impact): 2.06
Refs: 0
Citation Normalized Percentile: 0.89

Topics

Advanced Clustering Algorithms Research
Physical Sciences →  Computer Science →  Artificial Intelligence
Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Algorithms and Applications
Physical Sciences →  Engineering →  Control and Systems Engineering

Related Documents

JOURNAL ARTICLE

EFFICIENT TEXT DOCUMENT CLUSTERING WITH NEW SIMILARITY MEASURES

S. Baskar, R. Lakshmi

Journal: International Journal of Business Intelligence and Data Mining Year: 2018 Vol: 1 (1) Pages: 1-1

BOOK-CHAPTER

Analysis of Similarity Measures with WordNet Based Text Document Clustering

N. Sandhya, A. Govardhan

Advances in Intelligent and Soft Computing Year: 2011 Pages: 703-714

JOURNAL ARTICLE

An Efficient Technique to Implement Similarity Measures in Text Document Clustering using Artificial Neural Networks Algorithm

K. Selvi, R. Suresh

Journal: Research Journal of Applied Sciences Engineering and Technology Year: 2014 Vol: 8 (23) Pages: 2320-2328