JOURNAL ARTICLE

Scalable Fast Evolutionary k-Means Clustering

Abstract

The growing volume of data demands clustering algorithms that scale. The intrinsic parallelism of the MapReduce model makes large-scale distributed operations easier to manage and more reliable. However, the model's restrictions prevent several traditional clustering algorithms from being applied directly. K-means is one of the few clustering algorithms that satisfy the MapReduce constraints, but it requires the number of clusters to be specified in advance and is sensitive to how the clusters are initialized. This paper proposes a MapReduce algorithm that evolves clusters without requiring these k-means parameters to be specified. Evolutionary operators use the clusters already obtained to search for better solutions, allowing the algorithm to find high-quality solutions quickly. The algorithm is compared with state-of-the-art MapReduce versions of a systematic algorithm that is able to determine both the number of k-means clusters and their initializations. Computational experiments and statistical analyses of the results indicate that the proposed algorithm obtains clusters of equal or superior quality to those of the compared algorithm, and does so faster.
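The abstract only outlines the approach. As a rough illustration of why k-means fits the MapReduce model and how evolutionary operators can search over the number of clusters, the Python sketch below writes one k-means iteration as a map function and a reduce function, then wraps it in a small evolutionary loop. Everything here is an assumption for illustration: the function names (`kmeans_map`, `kmeans_reduce`, `evolve`, and so on), the add/remove-centroid mutations and the simplified-silhouette fitness are hypothetical and are not the operators or criterion used in the paper, and the shuffle phase is simulated in memory rather than run on a MapReduce cluster.

```python
import math
import random
from collections import defaultdict

# Hypothetical sketch, not the paper's algorithm: one k-means iteration
# written as map/reduce functions, wrapped in a toy evolutionary loop that
# varies the number of clusters. Operators and fitness are assumptions.

def kmeans_map(point, centroids):
    # Map: emit (id of nearest centroid, (point, 1)).
    cid = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
    return cid, (point, 1)

def kmeans_reduce(values):
    # Reduce: average every point emitted under one centroid id.
    dims = len(values[0][0])
    sums, n = [0.0] * dims, 0
    for point, count in values:
        for d in range(dims):
            sums[d] += point[d]
        n += count
    return tuple(s / n for s in sums)

def kmeans_step(data, centroids):
    groups = defaultdict(list)  # in-memory stand-in for the MapReduce shuffle
    for point in data:
        cid, value = kmeans_map(point, centroids)
        groups[cid].append(value)
    # Centroids that received no points are dropped, so k can shrink.
    return [kmeans_reduce(vals) for _, vals in sorted(groups.items())]

def simplified_silhouette(data, centroids):
    # Assumed fitness: mean of (b - a) / b, where a and b are the distances
    # to the nearest and second-nearest centroid; -1.0 if fewer than 2.
    if len(centroids) < 2:
        return -1.0
    total = 0.0
    for p in data:
        a, b = sorted(math.dist(p, c) for c in centroids)[:2]
        total += 0.0 if b == 0 else (b - a) / b
    return total / len(data)

def mutate(data, centroids, kmax, rng):
    # Hypothetical operators: remove a random centroid, or add a random
    # data point as a new centroid, keeping 2 <= k <= kmax where possible.
    cents = list(centroids)
    if len(cents) > 2 and (len(cents) >= kmax or rng.random() < 0.5):
        cents.pop(rng.randrange(len(cents)))
    elif len(cents) < kmax:
        cents.append(rng.choice(data))
    return cents

def evolve(data, pop_size=6, generations=20, kmax=8, iters=3, seed=0):
    rng = random.Random(seed)

    def refine(cents):
        # A few k-means iterations (each would be one MapReduce job).
        for _ in range(iters):
            cents = kmeans_step(data, cents)
        return cents

    def fitness(cents):
        return simplified_silhouette(data, cents)

    # Random initial individuals with k drawn from [2, kmax].
    pop = [refine(rng.sample(data, rng.randint(2, kmax))) for _ in range(pop_size)]
    for _ in range(generations):
        children = [refine(mutate(data, c, kmax, rng)) for c in pop]
        # Elitist (mu + lambda) selection on the assumed fitness.
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return pop[0]

if __name__ == "__main__":
    rng = random.Random(1)
    blobs = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
             for cx, cy in [(0, 0), (10, 0), (0, 10)] for _ in range(20)]
    best = evolve(blobs)
    print(len(best), round(simplified_silhouette(blobs, best), 3))
```

Because the map step needs only the current centroids and the reduce step needs only per-cluster sums and counts, each k-means iteration fits a single MapReduce job; a distributed version would typically also emit partial sums from a combiner to reduce shuffle traffic. The evolutionary loop then lets k vary instead of fixing it in advance.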

Keywords:
Computer science; Cluster analysis; Initialization; Scalability; CURE data clustering algorithm; Canopy clustering algorithm; Data mining; k-means clustering; Algorithm; Correlation clustering; Machine learning; Database
