Xiaobo Shen, Weiwei Liu, Ivor W. Tsang, Fumin Shen, Quansen Sun
Large-scale clustering is widely used in many applications and has received much attention. Most existing clustering methods incur high computation and memory costs when applied to large-scale datasets. In this paper, we propose a novel clustering method, dubbed compressed k-means (CKM), for fast large-scale clustering. Specifically, high-dimensional data are compressed into short binary codes, which are well suited for fast clustering. CKM enjoys two key benefits: 1) storage is significantly reduced by representing data points as binary codes; 2) distance computation between binary codes is very efficient under the Hamming metric. We propose to jointly learn binary codes and clusters within one framework. Extensive experimental results on four large-scale datasets, including two million-scale datasets, demonstrate that CKM outperforms state-of-the-art large-scale clustering methods in terms of both computation and memory cost, while achieving comparable clustering accuracy.
Yuewei Ming, En Zhu, Mao Wang, Qiang Liu, Xinwang Liu, Jianping Yin
Mi Li, Eibe Frank, Bernhard Pfahringer
Qinghao Hu, Jiaxiang Wu, Lu Bai, Yifan Zhang, Jian Cheng
Marco Jacopo Ferrarotti, Sergio Decherchi, Walter Rocchia
Yawei Zhao, Yuewei Ming, Xinwang Liu, En Zhu, Kaikai Zhao, Jianping Yin