Clustering is one of the most important unsupervised learning techniques, used to group data for prediction and for detecting anomalies. As the quantity of data grows every day, processing it with limited computational resources has become a difficult job. Such workloads must be treated as a Big Data problem, which requires advanced technology to store and process the data in a seamlessly distributed fashion. Apache Hadoop offers a solution to this problem by providing techniques to run parallel jobs on commodity hardware. In this paper, we discuss an algorithm that runs K-Means on Hadoop while varying the dataset and the cluster centers. We then compare parallel and sequential execution, keeping all other factors the same. The experimental results show that our algorithm can efficiently process large datasets in a Hadoop environment.
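To make the approach concrete, the following is an illustrative sketch (not the paper's actual implementation) of a single MapReduce-style K-Means iteration: the map phase assigns each point to its nearest center, and the reduce phase recomputes each center as the mean of its assigned points. All function names and the sample data here are hypothetical.

```python
# Hypothetical sketch of one MapReduce-style K-Means iteration.
from collections import defaultdict

def assign(point, centers):
    # Map phase: emit the index of the nearest center for this point.
    dists = [sum((p - c) ** 2 for p, c in zip(point, center))
             for center in centers]
    return min(range(len(centers)), key=dists.__getitem__)

def kmeans_iteration(points, centers):
    # Shuffle: group points by their assigned center index.
    groups = defaultdict(list)
    for pt in points:
        groups[assign(pt, centers)].append(pt)
    # Reduce phase: each new center is the component-wise mean
    # of the points assigned to it.
    new_centers = list(centers)
    for idx, grp in groups.items():
        new_centers[idx] = tuple(sum(c) / len(grp) for c in zip(*grp))
    return new_centers

# Toy example: four 2-D points and two initial centers.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centers = [(1.0, 1.0), (9.0, 9.0)]
print(kmeans_iteration(points, centers))  # → [(1.25, 1.5), (8.5, 8.75)]
```

In a real Hadoop job, the assignment step would run in parallel mappers over splits of the dataset, and the per-cluster mean would be computed in reducers; iterating until the centers stabilize requires chaining such jobs.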
Zahid Ansari, Asif Afzal, Tanvir Habib Sardar