JOURNAL ARTICLE

K-means clustering via principal component analysis

Abstract

Principal component analysis (PCA) is a widely used statistical technique for unsupervised dimension reduction. K-means clustering is a commonly used data clustering method for unsupervised learning tasks. Here we prove that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering. New lower bounds for the K-means objective function are derived, given by the total variance minus the eigenvalues of the data covariance matrix. These results indicate that unsupervised dimension reduction is closely related to unsupervised learning. Several implications are discussed. On dimension reduction, the result provides new insights into the observed effectiveness of PCA-based data reductions, beyond the conventional noise-reduction explanation that PCA, via singular value decomposition, provides the best low-dimensional linear approximation of the data. On learning, the result suggests effective techniques for K-means data clustering. DNA gene expression and Internet newsgroup data sets are analyzed to illustrate our results. Experiments indicate that the new bounds are within 0.5-1.5% of the optimal values.
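
The lower bound described in the abstract can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes the bound is the total sum of squares of the centered data minus the K-1 largest eigenvalues of the centered scatter matrix (the covariance matrix scaled by the number of points), and it uses a plain Lloyd iteration as a stand-in K-means solver. The function names and the synthetic data are illustrative assumptions.

```python
import numpy as np


def kmeans_objective(X, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = X[labels == c]
        if len(members):  # skip empty clusters
            total += ((members - members.mean(axis=0)) ** 2).sum()
    return total


def lloyd_kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd iteration (assumed stand-in for any K-means solver)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)
    return labels


def pca_lower_bound(X, k):
    """Total sum of squares minus the k-1 largest scatter-matrix eigenvalues."""
    Y = X - X.mean(axis=0)                 # center the data
    eigvals = np.linalg.eigvalsh(Y.T @ Y)  # ascending order
    top = eigvals[-(k - 1):].sum() if k > 1 else 0.0
    return eigvals.sum() - top


# Synthetic data: three Gaussian blobs in five dimensions (illustrative only).
rng = np.random.default_rng(42)
centers = rng.normal(scale=5.0, size=(3, 5))
X = np.vstack([c + rng.normal(size=(50, 5)) for c in centers])

k = 3
labels = lloyd_kmeans(X, k)
J = kmeans_objective(X, labels, k)
bound = pca_lower_bound(X, k)
print(f"K-means objective: {J:.2f}, PCA lower bound: {bound:.2f}")
```

Under these assumptions the bound holds for any partition, so the objective reached by the Lloyd iteration should never fall below it. How tight the gap is depends on the data.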

Keywords:
Principal component analysis; Cluster analysis; Dimensionality reduction; Unsupervised learning; Pattern recognition (psychology); Sparse PCA; Singular value decomposition; Artificial intelligence; Clustering high-dimensional data; Covariance matrix; Mathematics; Dimension (graph theory); Eigenvalues and eigenvectors; Computer science; Data mining; Algorithm; Combinatorics; Physics

Metrics

Cited By: 1369
FWCI (Field Weighted Citation Impact): 7.31
Refs: 23
Citation Normalized Percentile: 0.98

Topics

Gene expression and cancer classification
Life Sciences → Biochemistry, Genetics and Molecular Biology → Molecular Biology
Face and Expression Recognition
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Advanced Clustering Algorithms Research
Physical Sciences → Computer Science → Artificial Intelligence
