JOURNAL ARTICLE

A frequent term based text clustering approach using novel similarity measure

Abstract

Text clustering is an unsupervised process that relies solely on the similarity relationships between documents and outputs a set of clusters [14]. In this research, a commonality measure between two text files is defined and used as the similarity measure. The main idea is to apply an existing frequent-itemset mining algorithm, such as Apriori or FP-tree, to the initial set of text files in order to reduce the dimensionality of the input. A document feature vector is formed for each document. Then a vector is formed for all the static text input files. The algorithm outputs a set of clusters from the initial set of input text files.
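
The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not define the commonality measure or the clustering rule, so everything specific here is an assumption made for illustration: `frequent_terms` keeps only single terms that meet a document-frequency threshold (roughly the first pass of Apriori, not full itemset mining), `commonality` is a Jaccard-style ratio of shared frequent terms, and `cluster` is a simple greedy threshold assignment. All function names and parameters are hypothetical and not taken from the paper.

```python
from collections import Counter


def frequent_terms(docs, min_support):
    """Return terms that occur in at least `min_support` documents.

    Assumption: this mirrors only the first pass of Apriori
    (frequent 1-itemsets), used here to shrink the vocabulary.
    """
    counts = Counter(term for doc in docs for term in set(doc.lower().split()))
    return sorted(t for t, c in counts.items() if c >= min_support)


def feature_vector(doc, vocab):
    """Binary vector marking which frequent terms appear in the document."""
    words = set(doc.lower().split())
    return [1 if t in words else 0 for t in vocab]


def commonality(u, v):
    """Assumed measure: shared frequent terms divided by terms in either vector."""
    shared = sum(a & b for a, b in zip(u, v))
    either = sum(a | b for a, b in zip(u, v))
    return shared / either if either else 0.0


def cluster(docs, min_support=2, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    vocab = frequent_terms(docs, min_support)
    vectors = [feature_vector(d, vocab) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            # Compare against the cluster's first document (its seed).
            if commonality(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


docs = [
    "apple banana fruit",
    "banana fruit salad",
    "engine car wheel",
    "car wheel road",
]
result = cluster(docs)
print(result)
```

With these four toy documents, only terms shared by at least two documents survive the filtering step, so each document is reduced to a short binary vector before any comparison is made. A different choice of measure or clustering rule would slot into `commonality` and `cluster` without changing the rest of the sketch.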

Keywords:
Computer science; Cluster analysis; Similarity (geometry); Set (abstract data type); Similarity measure; Measure (data warehouse); Term (time); Document clustering; Data mining; Tree (set theory); Dimension (graph theory); A priori and a posteriori; Pattern recognition (psychology); Artificial intelligence; Information retrieval; Mathematics; Combinatorics

Metrics

Cited By: 17
FWCI (Field Weighted Citation Impact): 5.65
Refs: 18
Citation Normalized Percentile: 0.96

Topics

Data Mining Algorithms and Applications
Physical Sciences →  Computer Science →  Information Systems
Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Algorithms and Data Compression
Physical Sciences →  Computer Science →  Artificial Intelligence