This paper presents the CPCluster Map Reduce algorithm, which exploits parallelism on a cloud computing platform to cluster large, high-dimensional datasets. The proposed MapReduce-based algorithm parallelizes the traditional clustering algorithm. It is scalable and accelerates well: speedup increases as compute nodes are added. Experimental results show that the CPCluster Map Reduce algorithm outperforms the traditional clustering algorithm, especially as the number of samples in the dataset grows.
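The abstract does not spell out the internals of CPCluster, so the following is only a minimal sketch of how a clustering step can be phrased in the MapReduce style. It assumes a k-means-like centroid update as a stand-in for the actual algorithm. The function names `map_phase`, `reduce_phase`, and `iterate` are hypothetical, and the data splits are processed sequentially in one process rather than on a real cluster.

```python
from collections import defaultdict

# Assumption: k-means-style update used as a stand-in; not the paper's CPCluster.

def map_phase(points, centroids):
    """Map: for each point in one data split, emit (nearest centroid index, (point, 1))."""
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        yield dists.index(min(dists)), (p, 1)

def reduce_phase(pairs, dim):
    """Reduce: sum the points and counts per key, then return the mean per cluster."""
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for key, (p, n) in pairs:
        for i, v in enumerate(p):
            sums[key][i] += v
        counts[key] += n
    return {k: tuple(s / counts[k] for s in sums[k]) for k in sums}

def iterate(splits, centroids):
    """One iteration: map every split independently, then reduce all emitted pairs."""
    dim = len(centroids[0])
    pairs = [kv for split in splits for kv in map_phase(split, centroids)]
    new = reduce_phase(pairs, dim)
    # A centroid that received no points keeps its previous position.
    return [new.get(i, c) for i, c in enumerate(centroids)]

splits = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
print(iterate(splits, [(1.0, 1.0), (9.0, 9.0)]))
```

Because each split is mapped without reference to the others, the map calls could in principle run on separate compute nodes, which is the property the abstract attributes the speedup to.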
Ran Jin, Chunhai Kou, Ruijuan Liu, Yefeng Li
Terence Kwok, Kate Smith‐Miles, Sebastián Lozano, David Taniar
Kilian Stoffel, Abdelkader Belkoniene
Minchao Wang, Wu Zhang, Ding Wang, Dongbo Dai, Huiran Zhang, Hao Xie, Luonan Chen, Yike Guo, Jiang Xie