JOURNAL ARTICLE

A Hybrid Surrogate Model for Evolutionary Undersampling in Imbalanced Classification

Abstract

© 2020 IEEE. Data preprocessing is a key stage in data mining that allows machine learning algorithms to obtain meaningful insights. Many preprocessing problems such as feature selection or instance selection can be modelled as optimisation/search problems. Evolutionary algorithms have traditionally excelled in this task when dealing with data of a moderate size. However, their application to large datasets typically involves very high computational costs. In this work, we propose a hybrid surrogate model for evolutionary undersampling in imbalanced classification problems. These are characterised by having a highly skewed distribution of classes in which evolutionary algorithms aim to balance the training data by selecting only the most relevant data. The proposed technique combines a two-stage clustering-based surrogate method with a windowing approach to quickly approximate fitness values of the chromosomes and accelerate the search. The experiments carried out in 44 standard imbalanced datasets show that the proposed hybrid surrogate model highly reduces the computational cost of the evolutionary algorithm without a considerable loss of performance.

Keywords:
Undersampling Computer science Evolutionary algorithm Artificial intelligence Machine learning Feature selection Cluster analysis Evolutionary computation Preprocessor Data pre-processing Data mining

Metrics

7
Cited By
0.88
FWCI (Field Weighted Citation Impact)
31
Refs
0.79
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Imbalanced Data Classification Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Machine Learning and Data Classification
Physical Sciences →  Computer Science →  Artificial Intelligence
Evolutionary Algorithms and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence
© 2026 ScienceGate Book Chapters — All rights reserved.