Xiaodong Wang, Rung-Ching Chen, Fei Yan, Hendry Hendry
K-means clustering is popular for its efficiency and is often chosen for analyzing large-scale data. However, it struggles with high-dimensional data, which often contain many redundant features. In addition, real-world applications usually involve massive data streams, such as those from transport systems and social media, which are often generated periodically in high-dimensional space. Although existing K-means extensions have achieved great success on high-dimensional data by integrating dimension-reduction methods, they are limited to offline data. To solve these problems, we propose a streaming K-means clustering algorithm with feature selection. The proposed algorithm divides the traditional clustering procedure into several related clustering tasks and selects representative features through group sparsity regularization. Moreover, within this framework, the information shared among neighboring streams can be properly exploited. Experimental results on several benchmark datasets demonstrate the effectiveness of the proposed model.
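The abstract does not state the model's objective or update rules, so the following is only a hypothetical sketch of the two ideas it names, not the authors' algorithm: data arrive batch by batch, and features are selected through a group-sparsity penalty. Assumptions, all invented for illustration: each feature's column of the K×d centroid matrix is one group, shrunk by group soft-thresholding (the proximal operator of an L2,1 penalty, applied directly to the cluster means as a simplification that ignores unequal cluster sizes); information from the previous batch is reused by warm-starting and by blending new centroids with the old ones. The class name `StreamingSparseKMeans` and the parameters `lam`, `alpha`, and `n_iter` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _column_means(rows: Sequence[Sequence[float]]) -> Vector:
    """Per-feature mean of a non-empty list of rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def group_soft_threshold(centroids: List[Vector], lam: float) -> List[bool]:
    """Group soft-thresholding (proximal step of an L2,1 penalty).

    Each feature j is one group: the column (c_1j, ..., c_Kj) of the K x d
    centroid matrix. Columns with norm <= lam are zeroed, the rest are shrunk
    toward zero. Returns a mask that is True for surviving features.
    """
    keep = []
    for j in range(len(centroids[0])):
        norm = math.sqrt(sum(c[j] ** 2 for c in centroids))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        for c in centroids:
            c[j] *= scale
        keep.append(scale > 0.0)
    return keep


class StreamingSparseKMeans:
    """Hypothetical streaming K-means with group-sparse feature selection."""

    def __init__(self, k: int, lam: float = 0.1, alpha: float = 0.5, n_iter: int = 10):
        self.k = k            # number of clusters
        self.lam = lam        # group-sparsity strength (hypothetical hyperparameter)
        self.alpha = alpha    # weight on the previous batch's centroids (hypothetical)
        self.n_iter = n_iter  # Lloyd iterations per batch
        self.centroids: List[Vector] = []  # kept in batch-centered coordinates
        self.keep: List[bool] = []         # True for selected features

    def _init_farthest(self, X: List[Vector]) -> None:
        """Deterministic farthest-point initialization for the first batch."""
        self.centroids = [X[0][:]]
        while len(self.centroids) < self.k:
            far = max(X, key=lambda x: min(_sq_dist(x, c) for c in self.centroids))
            self.centroids.append(far[:])

    def _assign(self, X: List[Vector]) -> List[int]:
        """Index of the nearest centroid for every point."""
        return [min(range(self.k), key=lambda i: _sq_dist(x, self.centroids[i])) for x in X]

    def partial_fit(self, batch: Sequence[Sequence[float]]) -> List[int]:
        """Cluster one batch of the stream and return its labels."""
        mean = _column_means(batch)
        X = [[v - m for v, m in zip(row, mean)] for row in batch]  # center the batch
        prev = [c[:] for c in self.centroids]  # centroids carried over from the last batch
        if not self.centroids:
            self._init_farthest(X)
        for _ in range(self.n_iter):
            labels = self._assign(X)
            for i in range(self.k):
                members = [x for x, lab in zip(X, labels) if lab == i]
                if not members:
                    continue  # leave an empty cluster's centroid unchanged
                new = _column_means(members)
                if prev:  # smooth toward the previous batch's centroid
                    new = [(1 - self.alpha) * a + self.alpha * b for a, b in zip(new, prev[i])]
                self.centroids[i] = new
            self.keep = group_soft_threshold(self.centroids, self.lam)
        return self._assign(X)
```

Because the batch is centered, a feature that does not separate the clusters has cluster means near zero, so its whole column is zeroed and it then contributes equally to every distance, effectively dropping out of the assignment step. Larger values of `lam` remove features more aggressively; `alpha` trades responsiveness to drift against stability across batches.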
Christos Boutsidis, Malik Magdon-Ismail
Dewi Pramudi Ismi, Shireen Panchoo, Murinto Murinto
D. S. Guru, N. Vinay Kumar, Mahamad Suhil