JOURNAL ARTICLE

Evaluation of Data Clustering Accuracy using K-Means Algorithm

Suraya Suraya, Muhammad Sholeh, Uning Lestari

Year: 2023
Journal: International Journal of Multidisciplinary Approach Research and Science
Vol: 2 (01)
Pages: 385-396

Abstract

Data clustering is a data science technique widely used in data analysis to group the records of a dataset so that patterns or relationships within the data can be identified. This research evaluates the accuracy of K-Means clustering on the Wine dataset, which has 13 features describing the chemical characteristics of three types of wine. The goal is to obtain the best clustering evaluation metrics, so the K-Means clustering results are compared using the Davies-Bouldin index and the Silhouette coefficient. The research follows the KDD (Knowledge Discovery in Databases) process, consisting of pre-processing, transformation, model building, and model evaluation; its steps include data standardization, selection of the optimal number of clusters, and assessment of clustering accuracy. Experimental results show that appropriate parameters and cluster initialization can improve the clustering evaluation metrics. On the normalized dataset, the Davies-Bouldin index indicates 2 clusters and the Silhouette coefficient indicates 3 clusters; before normalization, the Davies-Bouldin index indicates 7 clusters and the Silhouette coefficient indicates 2. In conclusion, the normalized and non-normalized datasets yield different evaluation results. The number of clusters to choose depends on the context of the analysis; this study selects 3 clusters, labelled "Superior Variety", "Intermediate Variety", and "Standard Variety".
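The workflow described in the abstract (scale the Wine data, run K-Means over a range of cluster counts, and pick k by the Davies-Bouldin index and the Silhouette coefficient) can be sketched with scikit-learn as below. This is a minimal illustration under stated assumptions, not the authors' code: the library choice, the use of z-score standardization (`StandardScaler`) for the "normalized" case, the k range 2 to 10, `k-means++` initialization with `n_init=10`, the seed, and the helper names `evaluate_k_range` and `best_k` are all hypothetical choices made here.

```python
# Hypothetical sketch: choose k for K-Means on the Wine dataset using
# Davies-Bouldin (lower is better) and Silhouette (higher is better).
# Assumptions: scikit-learn, z-score standardization, k in 2..10, fixed seed.
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler


def evaluate_k_range(X, k_values, random_state=42):
    """Fit K-Means for each k; return {k: (davies_bouldin, silhouette)}."""
    scores = {}
    for k in k_values:
        # k-means++ seeding with 10 restarts; the lowest-inertia run is kept
        model = KMeans(n_clusters=k, init="k-means++", n_init=10,
                       random_state=random_state)
        labels = model.fit_predict(X)
        scores[k] = (davies_bouldin_score(X, labels),
                     silhouette_score(X, labels))
    return scores


def best_k(scores):
    """Return (k with lowest Davies-Bouldin, k with highest Silhouette)."""
    k_db = min(scores, key=lambda k: scores[k][0])
    k_sil = max(scores, key=lambda k: scores[k][1])
    return k_db, k_sil


X_raw = load_wine().data                        # 178 samples x 13 features
X_std = StandardScaler().fit_transform(X_raw)   # per-feature zero mean, unit variance

for name, X in [("raw", X_raw), ("standardized", X_std)]:
    scores = evaluate_k_range(X, range(2, 11))
    k_db, k_sil = best_k(scores)
    print(f"{name}: best k by Davies-Bouldin = {k_db}, by Silhouette = {k_sil}")
```

The two metrics are optimized in opposite directions, which is why `best_k` uses `min` for Davies-Bouldin and `max` for Silhouette. The selected k values depend on the scaling method, seed, and k range, so this sketch is not guaranteed to reproduce the cluster counts reported in the abstract.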

Keywords:
Cluster analysis, Silhouette, Computer science, Data mining, Normalization, Correlation clustering, CURE data clustering algorithm, Single-linkage clustering, Fuzzy clustering, Datasheet, Artificial intelligence

Metrics

Cited by: 5
FWCI (Field Weighted Citation Impact): 3.09
References: 18
Citation Normalized Percentile: 0.91

Topics

Data Mining and Machine Learning Applications
Physical Sciences → Computer Science → Information Systems
Advanced Clustering Algorithms Research
Physical Sciences → Computer Science → Artificial Intelligence
Customer churn and segmentation
Social Sciences → Business, Management and Accounting → Marketing
