Text clustering is a challenging and active topic in Internet search engine research. By exploiting the strengths of K-means clustering while addressing its weaknesses, this paper presents a new text clustering algorithm. First, the texts are preprocessed to prepare them for subsequent processing. The paper then analyzes the standard K-means clustering algorithm and refines it, in particular its cluster seed (initial center) selection method, to overcome the low stability of K-means, which is highly sensitive to the choice of initial cluster centers and to isolated (outlier) texts. Experimental results indicate that, compared with the original algorithm, the improved algorithm achieves higher accuracy and better stability.
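The abstract does not spell out the improved seed-selection rule. The Python sketch below illustrates one common way to make K-means seeding robust to isolated texts: a density-based max-min distance initialisation followed by ordinary Lloyd iterations. It is an assumption for illustration, not necessarily the paper's method. The stop-word list, the smoothed TF-IDF weighting, the neighbourhood radius (mean pairwise distance), the "below-average density means isolated" rule, and the helper names `preprocess`, `select_seeds`, and `kmeans` are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list used only for this sketch.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "for"}


def preprocess(texts):
    """Tokenise, drop stop words, and build L2-normalised TF-IDF vectors."""
    docs = [[w for w in re.findall(r"[a-z]+", t.lower()) if w not in STOP_WORDS]
            for t in texts]
    vocab = sorted({w for doc in docs for w in doc})
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF so that terms occurring in every document keep a non-zero weight.
        vec = [tf[w] * (math.log((1 + n) / (1 + df[w])) + 1) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def select_seeds(vectors, k):
    """Pick k initial centres (k <= number of texts), skipping low-density texts."""
    n = len(vectors)
    d = [[dist(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    # Neighbourhood radius = mean pairwise distance (an assumption of this sketch).
    pairs = max(1, n * (n - 1) // 2)
    radius = sum(d[i][j] for i in range(n) for j in range(i + 1, n)) / pairs
    density = [sum(1 for j in range(n) if j != i and d[i][j] <= radius)
               for i in range(n)]
    # Texts with below-average density are treated as isolated and not used as seeds.
    mean_density = sum(density) / n
    candidates = [i for i in range(n) if density[i] >= mean_density]
    if len(candidates) < k:
        candidates = sorted(range(n), key=lambda i: -density[i])[:k]
    # First seed: the densest text; then repeatedly take the candidate farthest
    # from all seeds chosen so far (max-min distance rule).
    seeds = [max(candidates, key=lambda i: density[i])]
    while len(seeds) < k:
        seeds.append(max((i for i in candidates if i not in seeds),
                         key=lambda i: min(d[i][s] for s in seeds)))
    return [list(vectors[i]) for i in seeds]


def kmeans(vectors, k, max_iter=100):
    """Standard K-means (Lloyd iterations) started from the density-based seeds."""
    centers = select_seeds(vectors, k)
    labels = [None] * len(vectors)
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist(v, centers[c]))
                      for v in vectors]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centre if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    texts = [
        "search engine ranking and indexing",
        "indexing pages for a search engine",
        "search engine query ranking",
        "k means clustering of text documents",
        "text clustering with k means",
        "clustering documents using k means",
    ]
    print(kmeans(preprocess(texts), k=2))  # → [0, 0, 0, 1, 1, 1]
```

Because the seeds in this sketch are chosen deterministically from dense regions, repeated runs yield the same partition, and a text with few close neighbours is never promoted to a centre. Randomly seeded K-means, by contrast, can give different partitions between runs and may start from an outlier.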
Shen-yi QIAN, Huihui Liu, Dai-yi LI
Yin Sheng Zhang, Hui Lin Shan, Jia Qiang Li, Jie Zhou
Yufang Liu, Shibin Xiao, Xueqiang Lv, Shuicai Shi