JOURNAL ARTICLE

Exploring of clustering algorithm on class-imbalanced data

Abstract

Imbalanced data distribution remains an unsolved problem in data mining and machine learning. This paper reviews the class-imbalance problem in classification learning and extends it to clustering, since data clustering is an important and frequently used unsupervised learning method. Two verification methods, based on two different aspects of the original data, are proposed to test and verify the influence of class-imbalanced data on clustering. We also conduct experiments at different imbalance ratios to explore how much the ratio matters for clustering algorithms, since it is a very important factor for performance in classification learning. Experimental results indicate that class imbalance in the dataset can seriously degrade the final performance and efficiency of the clustering algorithm, and that the higher the imbalance ratio, the stronger the adverse effect on clustering performance.
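
The kind of experiment the abstract describes, clustering the same two-class data at several imbalance ratios and comparing the outcome, can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not specify its two verification procedures, datasets, or clustering algorithm. The synthetic Gaussian data, the plain k-means routine, the ratio values, and the names `make_imbalanced`, `kmeans`, `purity`, `minority_recall`, and `run` are all assumptions chosen for illustration.

```python
import random


def make_imbalanced(n_major, ratio, seed=0):
    """Synthetic 2-D data (assumption): majority class 0 near x=0, minority class 1 near x=3.

    ratio is the majority size divided by the minority size.
    """
    rng = random.Random(seed)
    n_minor = max(1, n_major // ratio)
    major = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_major)]
    minor = [(rng.gauss(3.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_minor)]
    return major + minor, [0] * n_major + [1] * n_minor


def kmeans(points, k=2, iters=50, seed=0):
    """Plain Lloyd-style k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers drawn from the data
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [
            min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                new_centers.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return assign


def purity(labels, assign, k=2):
    """Fraction of points that belong to the dominant class of their cluster."""
    total = 0
    for c in range(k):
        members = [l for l, a in zip(labels, assign) if a == c]
        if members:
            total += max(members.count(0), members.count(1))
    return total / len(labels)


def minority_recall(labels, assign, k=2):
    """Fraction of minority points lying in clusters where the minority dominates."""
    captured = 0
    for c in range(k):
        members = [l for l, a in zip(labels, assign) if a == c]
        if members.count(1) > members.count(0):
            captured += members.count(1)
    return captured / labels.count(1)


def run(ratios=(1, 5, 20, 50), n_major=500):
    """Cluster the data at each imbalance ratio and collect both scores."""
    results = {}
    for r in ratios:
        points, labels = make_imbalanced(n_major, r)
        assign = kmeans(points)
        results[r] = (purity(labels, assign), minority_recall(labels, assign))
    return results


if __name__ == "__main__":
    for r, (p, m) in run().items():
        print(f"ratio {r}:1  purity={p:.3f}  minority_recall={m:.3f}")
```

Two scores are reported because purity alone can be misleading here: it can never fall below the majority class's share of the data, so it tends to stay high at large ratios even when no cluster corresponds to the minority class. The minority-recall score is one simple way to expose that case.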

Keywords:
Cluster analysis; Computer science; Artificial intelligence; Machine learning; Class (philosophy); CURE data clustering algorithm; Correlation clustering; Data mining; Canopy clustering algorithm

Metrics

Cited By: 25
FWCI (Field Weighted Citation Impact): 3.77
Refs: 27
Citation Normalized Percentile: 0.94

Topics

Imbalanced Data Classification Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Data Mining Algorithms and Applications
Physical Sciences →  Computer Science →  Information Systems
Rough Sets and Fuzzy Logic
Physical Sciences →  Computer Science →  Computational Theory and Mathematics

Related Documents

JOURNAL ARTICLE

Clustering-based undersampling in class-imbalanced data

Wei‐Chao Lin, Chih‐Fong Tsai, Ya‐Han Hu, Jing-Shang Jhang

Journal: Information Sciences Year: 2017 Vol: 409-410 Pages: 17-26
JOURNAL ARTICLE

A Hierarchical Algorithm for Clustering class-imbalanced Datasets.

Xiaobo Wu, Qiuming Zhu

Journal: International Conference on Artificial Intelligence Year: 2002 Vol: 8 (6) Pages: 457-463
JOURNAL ARTICLE

An Incremental Clustering-Based Fault Detection Algorithm for Class-Imbalanced Process Data

Jueun Kwak, Tae‐Hyung Lee, Chang Ouk Kim

Journal: IEEE Transactions on Semiconductor Manufacturing Year: 2015 Vol: 28 (3) Pages: 318-328
DISSERTATION

A new genetic algorithm based clustering for binary and imbalanced class data sets

Sabariah Saharan

University: University of Canterbury Research Repository (University of Canterbury) Year: 2016