JOURNAL ARTICLE

A novel evolutionary preprocessing method based on over-sampling and under-sampling for imbalanced datasets

Abstract

Imbalanced datasets are commonly encountered in real-world classification problems. However, many machine learning algorithms were originally designed for well-balanced datasets. Re-sampling has become an important step in preprocessing imbalanced datasets. It aims to balance a dataset by increasing the sample size of the smaller class or decreasing the sample size of the larger class, known as over-sampling and under-sampling respectively. In this paper, a novel sampling strategy based on both over-sampling and under-sampling is proposed, in which new samples of the smaller class are created by the Synthetic Minority Over-sampling Technique (SMOTE). The datasets are improved by the CHC evolutionary algorithm, which operates on both the minority-class and majority-class samples. The result is a hybrid data preprocessing method that combines over-sampling and under-sampling to re-sample datasets. The method is evaluated by applying the C4.5 learning algorithm to obtain a classification model from the re-sampled datasets. Experimental results show that the proposed approach can decrease the over-sampling rate by about 50% with only around a 3% discrepancy in accuracy.
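
The two re-sampling ingredients named in the abstract can be illustrated with a minimal Python sketch. It assumes the basic form of SMOTE (linear interpolation between a minority sample and one of its k nearest minority-class neighbours), and it uses a hypothetical binary mask as a stand-in for a solution that an evolutionary search such as CHC might produce. The function names `smote` and `apply_mask`, the parameters, and the mask encoding are all illustrative assumptions; the abstract does not give the paper's actual encoding, fitness function, or CHC settings.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Basic SMOTE idea: build each synthetic point on the line segment
    between a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def apply_mask(samples, mask):
    """Keep the samples whose mask bit is 1 (hypothetical selection encoding)."""
    if len(samples) != len(mask):
        raise ValueError("mask length must match number of samples")
    return [s for s, bit in zip(samples, mask) if bit]


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    majority = [[5.0, 5.0], [6.0, 6.0], [7.0, 7.0], [8.0, 8.0], [9.0, 9.0]]
    new_points = smote(minority, n_synthetic=3, k=2)
    # A hand-written mask; a search procedure would normally choose these bits.
    kept_majority = apply_mask(majority, [1, 0, 1, 1, 0])
    print(len(minority) + len(new_points), "minority vs", len(kept_majority), "majority")
```

In this sketch, over-sampling enlarges the minority class while the mask shrinks the majority class, which is the sense in which the two techniques are combined; how the mask would be searched for and scored is left out because the abstract does not specify it.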

Keywords:
Preprocessor; Sampling (signal processing); Computer science; Oversampling; Artificial intelligence; Class (philosophy); Sample (material); Data mining; Data pre-processing; Machine learning; Pattern recognition (psychology); Sample size determination; Statistics; Mathematics; Bandwidth (computing)

Metrics

Cited By: 28
FWCI (Field Weighted Citation Impact): 0.47
Refs: 26
Citation Normalized Percentile: 0.78

Topics

Imbalanced Data Classification Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Machine Learning and Data Classification
Physical Sciences → Computer Science → Artificial Intelligence
Electricity Theft Detection Techniques
Physical Sciences → Engineering → Electrical and Electronic Engineering

Related Documents

BOOK-CHAPTER

Margin-Based Over-Sampling Method for Learning from Imbalanced Datasets

Xiannian Fan, Ke Tang, Thomas Weise

Lecture notes in computer science Year: 2011 Pages: 309-320
JOURNAL ARTICLE

Cluster-Based Minority Over-Sampling for Imbalanced Datasets

Kamthorn Puntumapon, Thanawin RAKTHAMAMON, Kitsana Waiyamai

Journal: IEICE Transactions on Information and Systems Year: 2016 Vol: E99.D (12) Pages: 3101-3109
JOURNAL ARTICLE

A Novel Neighborhood‐Weighted Sampling Method for Imbalanced Datasets

Mingjian Guang, Chungang Yan, Guanjun LIU, Junli Wang, Changjun Jiang

Journal: Chinese Journal of Electronics Year: 2022 Vol: 31 (5) Pages: 969-979
JOURNAL ARTICLE

A hybrid evolutionary preprocessing method for imbalanced datasets

Ginny Y. Wong, F.H.F. Leung, Sai Ho Ling

Journal: Information Sciences Year: 2018 Vol: 454-455 Pages: 161-177