The k-means clustering algorithm is one of the most widely used clustering algorithms and has been applied in many fields of science and technology. A major problem of the original k-means algorithm is that the clustering results depend on the initial centroids, which are chosen at random. In addition, its distance-based similarity measure is not well suited to large, high-dimensional datasets. Both problems can severely degrade performance. In this paper, an improved k-means clustering algorithm based on dissimilarity is proposed. It selects the initial centroids using a Huffman tree constructed from the dissimilarity matrix. Many experiments confirm that the proposed algorithm is efficient, achieving better clustering accuracy at the same time complexity.
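The initialization idea can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the dissimilarity measure or the tree-cutting rule, so the sketch assumes Euclidean distance as the dissimilarity, represents each merged node by the unweighted mean of its two children, and stops merging once k nodes remain. The function names `huffman_style_init` and `kmeans` are hypothetical.

```python
import numpy as np

def huffman_style_init(X, k):
    """Pick k initial centroids by repeatedly merging the two
    least-dissimilar nodes, Huffman-tree style, until k nodes remain."""
    # Each node is (representative vector, indices of member points).
    nodes = [(X[i].astype(float), [i]) for i in range(len(X))]
    while len(nodes) > k:
        reps = np.array([rep for rep, _ in nodes])
        # Dissimilarity matrix: pairwise Euclidean distance (an assumption).
        diss = np.linalg.norm(reps[:, None, :] - reps[None, :, :], axis=2)
        np.fill_diagonal(diss, np.inf)  # ignore self-dissimilarity
        i, j = np.unravel_index(np.argmin(diss), diss.shape)
        # New parent node: mean of the two children (an assumption).
        merged = ((nodes[i][0] + nodes[j][0]) / 2.0,
                  nodes[i][1] + nodes[j][1])
        nodes = [n for t, n in enumerate(nodes) if t not in (i, j)]
        nodes.append(merged)
    # Initial centroid of each remaining subtree: mean of its points.
    return np.array([X[members].mean(axis=0) for _, members in nodes])

def kmeans(X, centroids, n_iter=100):
    """Standard Lloyd iterations starting from the given centroids."""
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(len(centroids))
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0],
                  [10, 10], [10, 11], [11, 10]], dtype=float)
    init = huffman_style_init(X, k=2)
    labels, centroids = kmeans(X, init)
    print(labels, centroids)
```

Because the initialization is deterministic, repeated runs on the same data give the same clusters, which is the property the abstract contrasts with random seeding. This naive version recomputes the full dissimilarity matrix at every merge, so it is meant only to show the idea on small inputs.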
Zhe Zhang, Junxi Zhang, Huifeng Xue
Guanli Yue, Yanpeng Qu, Ansheng Deng