JOURNAL ARTICLE

Research and Application of Improved K-means Algorithm in Text Clustering

Shen-yi QIAN, Huihui Liu, Dai-yi LI

Year: 2018 Journal: DEStech Transactions on Computer Science and Engineering Publisher: Destech Publications

Abstract

K-means is a commonly used text clustering algorithm whose main advantages are simplicity and speed. However, because the initial cluster centers are selected at random, the algorithm easily falls into a local optimum, and both the clustering results and the number of iterations are unstable. To address this problem, this paper selects the initial cluster centers with a hierarchical agglomerative clustering algorithm to ensure high-quality center points, uses cosine similarity to measure the distance between texts, and reconstructs both the formula for computing cluster centers and the objective function for clustering quality. Experimental results on the Sogou Chinese text corpus show that the improved K-means algorithm achieves relatively high accuracy and stability.
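The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the reconstructed center formula or objective function, so the sketch assumes a length-normalized mean vector as the cluster center and merges clusters by centroid similarity during seeding. All function names (`normalize`, `cosine`, `centroid`, `agglomerative_seeds`, `kmeans_cosine`) are hypothetical, and documents are assumed to be already vectorized (for example as TF-IDF weights).

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))

def centroid(vectors):
    """Normalized mean vector; an assumed stand-in for the paper's center formula."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return normalize(mean)

def agglomerative_seeds(docs, k):
    """Merge the two most similar clusters until k remain; return their centroids."""
    clusters = [[d] for d in docs]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]

def kmeans_cosine(docs, k, max_iter=100):
    """K-means with cosine similarity, seeded by agglomerative clustering."""
    docs = [normalize(d) for d in docs]
    centers = agglomerative_seeds(docs, k)  # deterministic seeds, no random choice
    labels = None
    for _ in range(max_iter):
        # Assign each document to the center with the highest cosine similarity.
        new_labels = [max(range(k), key=lambda c: cosine(d, centers[c])) for d in docs]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Recompute each center from its members; empty clusters keep their center.
        for c in range(k):
            members = [d for d, l in zip(docs, labels) if l == c]
            if members:
                centers[c] = centroid(members)
    return labels, centers
```

Because the seeds come from a deterministic merge procedure rather than a random draw, repeated runs on the same input give the same clustering, which is the stability property the abstract emphasizes. The pairwise seeding loop is far too slow for a large corpus; in practice it would be run on a sample or replaced by a library routine.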

Keywords:
Cluster analysis; Computer science; Hierarchical clustering; Algorithm; Single-linkage clustering; Cluster (spacecraft); Stability (learning theory); Similarity (geometry); Canopy clustering algorithm; Set (abstract data type); k-medoids; Point (geometry); CURE data clustering algorithm; Center (category theory); k-medians clustering; Correlation clustering; Complete-linkage clustering; Data mining; Mathematics; Artificial intelligence; Machine learning

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.20
Refs: 0
Citation Normalized Percentile: 0.58

Topics

Advanced Computational Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Research on text clustering algorithm based on improved K-means

Xinwu Li

Year: 2010 Vol: 22 Pages: V4-573
JOURNAL ARTICLE

Improved K-Means Algorithm in Text Semantic Clustering

Ma Junhong

Journal: The Open Cybernetics & Systemics Journal Year: 2014 Vol: 8 (1) Pages: 530-534
JOURNAL ARTICLE

Research on Improved K-Means Clustering Algorithm

Yin Sheng Zhang, Hui Lin Shan, Jia Qiang Li, Jie Zhou

Journal: Advanced Materials Research Year: 2011 Vol: 403-408 Pages: 1977-1980