JOURNAL ARTICLE

Comparison of Different Distance Measure Methods in Text Document Clustering

Yin Min Tun

Year: 2018 Journal: INTERNATIONAL JOURNAL OF RESEARCH AND ENGINEERING Vol: 5 (7)

Abstract

Text document clustering is an unsupervised learning method for finding groups of similar documents. It is a particular concern in text mining when the training documents are unlabeled. Many features and methods have been proposed to address this problem. A text document clustering framework consists of four stages: text document input, preprocessing, feature extraction, and clustering. Common clustering methods include the self-organizing map, k-means, and mixture of Gaussians. How closely the documents within the resulting clusters correlate depends on the distance measure selected. The main focus of this paper is to present different existing distance measure methods used with k-means clustering of text documents. In the experiment, k-means clustering was performed on the Newsgroups dataset, and clustering entropy was measured to evaluate the different distance measure methods.
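
The evaluation described in the abstract (k-means run with different distance measures, then scored by clustering entropy) can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the paper's implementation: the abstract does not name the distance measures compared, so Euclidean, Manhattan, and cosine distance are stand-ins; the toy term-count vectors replace the Newsgroups data; and the entropy formula is the common size-weighted average of per-cluster label entropy, which may differ from the paper's definition. Centroids are updated as arithmetic means for every measure, a simplification that is strictly justified only for Euclidean distance.

```python
import math
import random
from collections import Counter

# Three candidate distance measures (illustrative choices, not from the paper).
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0

def kmeans(vectors, k, distance, iterations=20, seed=0):
    """Plain k-means with a pluggable distance; returns one cluster id per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid under the chosen distance.
        assignment = [min(range(k), key=lambda c: distance(v, centroids[c]))
                      for v in vectors]
        # Recompute each non-empty cluster's centroid as the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

def clustering_entropy(assignment, true_labels):
    """Size-weighted average of per-cluster label entropy (lower is better)."""
    n = len(assignment)
    total = 0.0
    for c in set(assignment):
        labels = [t for a, t in zip(assignment, true_labels) if a == c]
        counts = Counter(labels)
        h = -sum((m / len(labels)) * math.log2(m / len(labels))
                 for m in counts.values())
        total += len(labels) / n * h
    return total

# Hypothetical toy term-count vectors with known topic labels.
docs = [[3, 0, 1], [4, 0, 0], [2, 1, 0], [0, 3, 1], [0, 4, 0], [1, 2, 0]]
topics = ["sci", "sci", "sci", "rec", "rec", "rec"]

for name, dist in [("euclidean", euclidean), ("manhattan", manhattan),
                   ("cosine", cosine_distance)]:
    clusters = kmeans(docs, k=2, distance=dist)
    print(name, round(clustering_entropy(clusters, topics), 3))
```

An entropy of 0 means every cluster contains documents from a single topic, so under this scoring the distance measure that yields the lowest entropy would be preferred.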

Keywords:
Cluster analysis; Document clustering; Computer science; Preprocessor; Artificial intelligence; Measure (data warehouse); Correlation clustering; Consensus clustering; Pattern recognition (psychology); Fuzzy clustering; Data mining; Clustering high-dimensional data; Entropy (arrow of time); Brown clustering; CURE data clustering algorithm; Physics

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.20
Refs: 0
Citation Normalized Percentile: 0.58

Topics

Advanced Computational Techniques and Applications
Physical Sciences → Computer Science → Artificial Intelligence
Data Mining Algorithms and Applications
Physical Sciences → Computer Science → Information Systems
Advanced Clustering Algorithms Research
Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Design And Analysis Of Text Document Clustering Using Meta-Heuristic Distance Based Methods

R. Kumaresan

Journal: Journal of Advanced Research in Dynamic and Control Systems Year: 2020 Vol: 12 (6) Pages: 2262-2269

JOURNAL ARTICLE

Short Text Document Clustering using Distributed Word Representation and Document Distance

Supavit Kongwudhikunakorn, Kitsana Waiyamai

Journal: Walailak Journal of Science and Technology (WJST) Year: 2018 Vol: 16 (2) Pages: 107-119

JOURNAL ARTICLE

Combining Distributed Word Representation and Document Distance for Short Text Document Clustering

Supavit Kongwudhikunakorn, Kitsana Waiyamai

Journal: Journal of Information Processing Systems Year: 2020 Vol: 16 (2) Pages: 277-300