Compressed K-Means for Large-Scale Clustering

Xiaobo Shen; Weiwei Liu; Ivor W. Tsang; Fumin Shen; Quansen Sun

doi:10.1609/aaai.v31i1.10852


JOURNAL ARTICLE


Year: 2017
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Vol: 31 (1)
Publisher: Association for the Advancement of Artificial Intelligence


Abstract

Large-scale clustering is widely used in many applications and has received much attention. Most existing clustering methods incur high computation and memory costs when applied to large-scale datasets. In this paper, we propose a novel clustering method, dubbed compressed k-means (CKM), for fast large-scale clustering. Specifically, high-dimensional data are compressed into short binary codes, which are well suited for fast clustering. CKM enjoys two key benefits: 1) storage is significantly reduced by representing data points as binary codes; 2) distance computation is very efficient using the Hamming metric between binary codes. We propose to jointly learn binary codes and clusters within one framework. Extensive experimental results on four large-scale datasets, including two million-scale datasets, demonstrate that CKM outperforms state-of-the-art large-scale clustering methods in terms of both computation and memory cost, while achieving comparable clustering accuracy.
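To illustrate the two benefits named in the abstract, the following is a minimal sketch of storing points as packed binary codes and assigning them to the nearest binary center by Hamming distance. It is not the authors' CKM algorithm: the abstract states that codes and clusters are learned jointly, whereas this sketch assumes the codes and centers are already given. The function names (`pack_codes`, `hamming_distances`, `assign`) are hypothetical and chosen only for illustration.

```python
import numpy as np

# Lookup table: number of set bits in each possible byte value (0..255).
_POPCOUNT = np.array([bin(v).count("1") for v in range(256)], dtype=np.uint8)


def pack_codes(bits):
    """Pack an (n, b) array of 0/1 values into (n, ceil(b / 8)) bytes."""
    return np.packbits(np.asarray(bits, dtype=np.uint8), axis=1)


def hamming_distances(codes, centers):
    """Return the (n, k) Hamming distances between packed codes and centers."""
    # XOR marks the differing bits; the lookup table counts them per byte.
    xor = codes[:, None, :] ^ centers[None, :, :]
    return _POPCOUNT[xor].sum(axis=2, dtype=np.int64)


def assign(codes, centers):
    """Assign each packed code to the index of its nearest center."""
    return hamming_distances(codes, centers).argmin(axis=1)


# Toy data (assumed for illustration): four 8-bit codes and two centers.
bits = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 0],
])
center_bits = np.array([[0] * 8, [1] * 8])

codes = pack_codes(bits)            # one byte per point
centers = pack_codes(center_bits)
labels = assign(codes, centers)     # array([0, 1, 0, 1])
```

Each 8-bit code here occupies a single byte, and the distance computation reduces to an XOR followed by a bit count, which is the kind of saving the abstract attributes to binary codes.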

Keywords:

Cluster analysis; Computer science; Computation; Hamming distance; Scale (ratio); Binary number; CURE data clustering algorithm; Data mining; Clustering high-dimensional data; Correlation clustering; Metric (unit); Algorithm; Artificial intelligence; Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 4.57

Citation Normalized Percentile: 0.95

Topics

Advanced Clustering Algorithms Research

Physical Sciences → Computer Science → Artificial Intelligence

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Face and Expression Recognition

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Scalable k-means for large-scale clustering

Large scale K-means clustering using GPUs

Fast K-means for Large Scale Clustering

Distributed Kernel K-Means for Large Scale Clustering

Large-scale k-means clustering via variance reduction