JOURNAL ARTICLE

MR-Mafia: Parallel Subspace Clustering Algorithm Based on MapReduce for Large Multi-dimensional Datasets

Abstract

The mission of subspace clustering is to find hidden clusters exist in different subspaces within a dataset. In recent years, with the exponential growth of data size and data dimensions, traditional subspace clustering algorithms become inefficient as well as ineffective while extracting knowledge in the big data environment, resulting in an emergent need to design efficient parallel distributed subspace clustering algorithms to handle large multi-dimensional data with an acceptable computational cost. In this paper, we introduce MR-Mafia: a parallel mafia subspace clustering algorithm based on MapReduce. The algorithm takes advantage of MapReduce's data partitioning and task parallelism and achieves a good tradeoff between the cost for disk accesses and communication cost. The experimental results show near linear speedups and demonstrate the high scalability and great application prospects of the proposed algorithm.

Keywords:
Computer science Cluster analysis Scalability Linear subspace Subspace topology Big data Clustering high-dimensional data Data mining Data stream clustering Parallel algorithm Task (project management) Algorithm CURE data clustering algorithm Parallel computing Correlation clustering Artificial intelligence Database Mathematics

Metrics

3
Cited By
0.40
FWCI (Field Weighted Citation Impact)
23
Refs
0.67
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Advanced Clustering Algorithms Research
Physical Sciences →  Computer Science →  Artificial Intelligence
Complex Network Analysis Techniques
Physical Sciences →  Physics and Astronomy →  Statistical and Nonlinear Physics
Face and Expression Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
© 2026 ScienceGate Book Chapters — All rights reserved.