JOURNAL ARTICLE

Cluster-based majority under-sampling approaches for class imbalance learning

Abstract

The class imbalance problem is common in real applications: one class may have far fewer examples than another in the training set. Under-sampling is a very popular approach to this problem. It is efficient because it uses only a subset of the majority class, but its drawback is that it discards many potentially useful majority class examples. To overcome this drawback, we adopt an unsupervised learning technique for supervised learning. We propose cluster-based majority under-sampling approaches for selecting a representative subset from the majority class. Compared to random under-sampling, cluster-based under-sampling can effectively avoid losing important majority class information. We adopt two methods to select a representative subset from k clusters in certain proportions, and then use this subset together with all minority class samples as training data to improve accuracy on both the minority and majority classes. We compare the behavior of our approaches with the traditional random under-sampling approach on ten UCI repository datasets using two classifiers: k-nearest neighbor and Naïve Bayes. Recall, Precision, F-measure, G-mean and BACC (balanced accuracy) are used to evaluate classifier performance. Experimental results show that our cluster-based majority under-sampling approaches outperform the random under-sampling approach. Our approaches attain better overall performance with the k-nearest neighbor classifier than with the Naïve Bayes classifier.
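
The general idea can be illustrated with a minimal sketch. The abstract does not specify the two selection methods, the clustering algorithm, or the proportions, so everything below is an assumption made for illustration: a plain k-means clustering, random draws from each cluster in proportion to its size, and the hypothetical function names `kmeans`, `cluster_undersample`, and `g_mean_and_bacc`. The last function uses the standard definitions of G-mean (geometric mean of recall and specificity) and balanced accuracy (their arithmetic mean), which the paper is assumed to follow.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members (unchanged if empty).
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def cluster_undersample(majority, n_keep, k=3, seed=0):
    """Keep about n_keep majority samples, drawn per cluster by cluster size."""
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, l in zip(majority, labels) if l == c]
        if not members:
            continue
        # Assumed rule: every non-empty cluster contributes at least one sample.
        quota = max(1, round(n_keep * len(members) / len(majority)))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept


def g_mean_and_bacc(y_true, y_pred, positive=1):
    """Return (G-mean, balanced accuracy) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(recall * specificity), (recall + specificity) / 2


# Toy data (invented for illustration): 8 majority points, 3 minority points.
majority = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (5.0, 5.1),
            (5.1, 5.0), (5.2, 5.2), (9.0, 0.0), (9.1, 0.2)]
minority = [(2.0, 2.0), (2.1, 2.2), (2.2, 1.9)]

subset = cluster_undersample(majority, n_keep=len(minority), k=3)
# Training set: representative majority subset plus all minority samples.
training = [(x, 0) for x in subset] + [(x, 1) for x in minority]
```

Because each quota is rounded and every non-empty cluster contributes at least one sample, the subset size is only approximately `n_keep`. The resulting `training` list could then be passed to any classifier, such as k-nearest neighbor or Naïve Bayes.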

Keywords:
Computer science; Artificial intelligence; Classifier (UML); Naive Bayes classifier; Machine learning; Sampling (signal processing); Class (philosophy); Cluster sampling; Oversampling; Pattern recognition (psychology); Data mining; Precision and recall; Support vector machine; Bandwidth (computing); Population

Metrics

Cited By: 68
FWCI (Field Weighted Citation Impact): 1.60
Refs: 27
Citation Normalized Percentile: 0.87

Topics

Imbalanced Data Classification Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Electricity Theft Detection Techniques
Physical Sciences →  Engineering →  Electrical and Electronic Engineering
