Dong Huang, Chang-Dong Wang, Jian-Sheng Wu, Jian-Huang Lai, Chee-Keong Kwoh
This paper focuses on the scalability and robustness of spectral clustering for extremely large-scale datasets with limited resources. Two novel algorithms are proposed, namely, ultra-scalable spectral clustering (U-SPEC) and ultra-scalable ensemble clustering (U-SENC). In U-SPEC, a hybrid representative selection strategy and a fast approximation method for K-nearest representatives are proposed for the construction of a sparse affinity sub-matrix. By interpreting the sparse sub-matrix as a bipartite graph, the transfer cut is then utilized to efficiently partition the graph and obtain the clustering result. In U-SENC, multiple U-SPEC clusterers are further integrated into an ensemble clustering framework to enhance the robustness of U-SPEC while maintaining high efficiency. Based on the ensemble generated by multiple U-SPEC clusterers, a new bipartite graph is constructed between objects and base clusters and then efficiently partitioned to achieve the consensus clustering result. It is noteworthy that both U-SPEC and U-SENC have nearly linear time and space complexity, and are capable of robustly and efficiently partitioning ten-million-level nonlinearly-separable datasets on a PC with 64 GB of memory. Experiments on various large-scale datasets have demonstrated the scalability and robustness of our algorithms. The MATLAB code and experimental data are available at https://www.researchgate.net/publication/330760669.
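To make the first stage of U-SPEC more concrete, the following is a minimal Python sketch of how a sparse affinity sub-matrix between objects and representatives might be built. It is not the authors' MATLAB implementation. Several details are assumptions that the abstract does not state: the hybrid selection is taken to mean "random candidates followed by k-means on those candidates", the K nearest representatives are found by exact search rather than by the fast approximation, and the Gaussian kernel width is set to the mean retained distance. The function names `select_representatives` and `knn_affinity` and all parameter names are hypothetical. The transfer-cut partitioning step is not shown.

```python
import math
import random


def select_representatives(points, p, n_candidates, n_iter=10, seed=0):
    """Assumed hybrid selection: sample candidates at random, then run
    a few k-means iterations on the candidates to obtain p representatives."""
    rng = random.Random(seed)
    candidates = rng.sample(points, n_candidates)
    centers = rng.sample(candidates, p)
    for _ in range(n_iter):
        groups = [[] for _ in range(p)]
        for x in candidates:
            nearest = min(range(p), key=lambda c: math.dist(x, centers[c]))
            groups[nearest].append(x)
        # Recompute each center as the mean of its group; keep the old
        # center when a group is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def knn_affinity(points, reps, k):
    """Sparse N x p affinity stored as one dict per object, keeping only
    the k nearest representatives (exact search, not the fast approximation)."""
    rows = []
    for x in points:
        nearest = sorted((math.dist(x, r), j) for j, r in enumerate(reps))[:k]
        rows.append(nearest)
    # Kernel width: mean of the retained distances (an assumption).
    sigma = sum(d for row in rows for d, _ in row) / (len(rows) * k)
    sigma = sigma or 1.0  # guard against an all-zero distance set
    return [
        {j: math.exp(-d * d / (2 * sigma * sigma)) for d, j in row}
        for row in rows
    ]


# Two synthetic Gaussian blobs, purely for illustration.
rng = random.Random(1)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
points += [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(200)]

reps = select_representatives(points, p=10, n_candidates=50)
B = knn_affinity(points, reps, k=3)
```

Each row of `B` has exactly `k` nonzero entries, so storage grows with N times K rather than N times p, which is consistent with the nearly linear space complexity claimed above. Reading `B` as the edge weights of a bipartite graph between objects and representatives gives the input that the transfer cut would then partition.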
Yinian Liang, Zhigang Ren, Zongze Wu, Deyu Zeng, Jianzhong Li
Hongfu Liu, Tongliang Liu, Junjie Wu, Dacheng Tao, Yun Fu
Jianyuan Li, Yingjie Xia, Zhenyu Shan, Yuncai Liu