MR-Mafia: Parallel Subspace Clustering Algorithm Based on MapReduce for Large Multi-dimensional Datasets

Zhipeng Gao; Yidan Fan; Kun Niu; Zhenyi Ying

doi:10.1109/bigcomp.2018.00045


JOURNAL ARTICLE


Year: 2018 Pages: 257-262



Abstract

The goal of subspace clustering is to find hidden clusters that exist in different subspaces of a dataset. In recent years, with the exponential growth of data size and dimensionality, traditional subspace clustering algorithms have become both inefficient and ineffective at extracting knowledge in big data environments. This creates an urgent need for efficient parallel, distributed subspace clustering algorithms that can handle large multi-dimensional data at an acceptable computational cost. In this paper, we introduce MR-Mafia, a parallel MAFIA subspace clustering algorithm based on MapReduce. The algorithm takes advantage of MapReduce's data partitioning and task parallelism and achieves a good tradeoff between disk access cost and communication cost. Experimental results show near-linear speedups and demonstrate the high scalability and strong application prospects of the proposed algorithm.
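As a rough illustration of the data partitioning the abstract mentions, the sketch below shows how a grid-based subspace clustering step could be phrased in map/reduce terms: each partition builds per-dimension histograms, and the partial histograms are then summed. This is not the authors' implementation. The function names (`map_partition`, `reduce_counts`, `dense_units`), the fixed-width bins, and the density threshold are all assumptions made for illustration; MAFIA itself uses adaptive grids, which this sketch omits.

```python
from collections import Counter
from functools import reduce

def map_partition(points, lows, highs, bins):
    """Map step (hypothetical): count points per (dimension, bin) in one partition."""
    counts = Counter()
    for point in points:
        for dim, value in enumerate(point):
            width = (highs[dim] - lows[dim]) / bins
            # Clamp so a value equal to the upper bound falls in the last bin.
            idx = min(int((value - lows[dim]) / width), bins - 1)
            counts[(dim, idx)] += 1
    return counts

def reduce_counts(a, b):
    """Reduce step (hypothetical): merge partial histograms by summing counts."""
    return a + b

def dense_units(counts, threshold):
    """Keep one-dimensional units whose count reaches an assumed fixed threshold."""
    return sorted(key for key, n in counts.items() if n >= threshold)

# Two toy partitions of 2-D points, standing in for input splits.
partitions = [
    [(0.1, 0.9), (0.15, 0.2), (0.12, 0.5)],
    [(0.11, 0.8), (0.9, 0.1), (0.13, 0.45)],
]
lows, highs, bins = (0.0, 0.0), (1.0, 1.0), 4

partials = [map_partition(p, lows, highs, bins) for p in partitions]
histogram = reduce(reduce_counts, partials, Counter())
dense = dense_units(histogram, threshold=3)
print(dense)
```

In a grid-based method of this kind, the dense one-dimensional units would then serve as candidates for building higher-dimensional units; that step is left out here.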

Keywords:

Computer science; Cluster analysis; Scalability; Linear subspace; Subspace topology; Big data; Clustering high-dimensional data; Data mining; Data stream clustering; Parallel algorithm; Task (project management); Algorithm; CURE data clustering algorithm; Parallel computing; Correlation clustering; Artificial intelligence; Database; Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 0.40

Citation Normalized Percentile: 0.67

Topics

Advanced Clustering Algorithms Research

Physical Sciences → Computer Science → Artificial Intelligence

Complex Network Analysis Techniques

Physical Sciences → Physics and Astronomy → Statistical and Nonlinear Physics

Face and Expression Recognition

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition


Related Documents

Clustering very large multi-dimensional datasets with MapReduce

MR-DBSCAN: An Efficient Parallel Density-Based Clustering Algorithm Using MapReduce

Parallel Bat Algorithm-Based Clustering Using MapReduce

DBSC: A Dependency-Based Subspace Clustering Algorithm for High Dimensional Numerical Datasets

A parallel approximate SS-ELM algorithm based on MapReduce for large-scale datasets