Abstract

Clustering is one of the most important unsupervised learning techniques, used for prediction and for overcoming anomalies by grouping data. As the quantity of data grows every day, processing it with limited computational resources has become difficult. This calls for treating it as a Big Data problem, which requires advanced technology to store and process the data in a seamlessly distributed fashion. Apache Hadoop addresses this problem by running parallel jobs on commodity hardware. In this paper, we discuss an algorithm for running K-Means on Hadoop while varying the data set and the cluster centers. We then compare parallel and sequential execution, keeping all other factors the same. The experimental results show that our algorithm can efficiently process large data sets in a Hadoop environment.
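The abstract does not spell out the algorithm, so the following is only a minimal single-process sketch of how one K-Means iteration is commonly split into map and reduce steps. It is not the authors' implementation and it does not use Hadoop itself. The function names (`map_point`, `combine`, `kmeans_iteration`, `kmeans`) and the parameters `max_iter` and `tol` are hypothetical and chosen for illustration.

```python
from math import dist


def map_point(point, centers):
    """Map step: emit (index of the nearest center, (point, 1))."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, (point, 1)


def combine(a, b):
    """Merge two partial (coordinate sum, count) pairs for one cluster."""
    (sum_a, n_a), (sum_b, n_b) = a, b
    return tuple(x + y for x, y in zip(sum_a, sum_b)), n_a + n_b


def kmeans_iteration(points, centers):
    """One map/shuffle/reduce round; returns the updated centers."""
    partial = {}
    for point in points:  # map, then group by cluster index (shuffle)
        key, value = map_point(point, centers)
        partial[key] = combine(partial[key], value) if key in partial else value
    new_centers = list(centers)  # a cluster with no points keeps its old center
    for key, (total, count) in partial.items():  # reduce: mean per cluster
        new_centers[key] = tuple(x / count for x in total)
    return new_centers


def kmeans(points, centers, max_iter=20, tol=1e-9):
    """Repeat iterations until no center moves by more than tol."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centers)
        if all(dist(a, b) <= tol for a, b in zip(centers, updated)):
            return updated
        centers = updated
    return centers
```

Because `combine` only adds sums and counts, partial results can be merged in any order. That property is what would let the map step run on separate data splits in a distributed setting, with each iteration submitted as one job.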

Keywords:
Computer science; Big data; Cluster analysis; Process (computing); Set (abstract data type); Data set; Data mining; Map reduce; Parallel computing; Artificial intelligence; Operating system

Metrics

Cited By: 8
FWCI (Field Weighted Citation Impact): 0.69
Refs: 19
Citation Normalized Percentile: 0.77

Topics

Data Stream Mining Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Cloud Computing and Resource Management
Physical Sciences →  Computer Science →  Information Systems
Advanced Clustering Algorithms Research
Physical Sciences →  Computer Science →  Artificial Intelligence
