A Hybrid Surrogate Model for Evolutionary Undersampling in Imbalanced Classification

Hoang Lam Le; Dario Landa-Silva; Mikel Galar; Salvador García; Isaac Triguero

doi:10.1109/cec48606.2020.9185774

JOURNAL ARTICLE

Year: 2020 Pages: 1-8

Abstract

© 2020 IEEE. Data preprocessing is a key stage in data mining that allows machine learning algorithms to obtain meaningful insights. Many preprocessing problems, such as feature selection or instance selection, can be modelled as optimisation or search problems. Evolutionary algorithms have traditionally excelled at this task on data of moderate size. However, applying them to large datasets typically incurs very high computational costs. In this work, we propose a hybrid surrogate model for evolutionary undersampling in imbalanced classification problems. Such problems are characterised by a highly skewed class distribution, and evolutionary undersampling aims to balance the training data by selecting only the most relevant instances. The proposed technique combines a two-stage clustering-based surrogate method with a windowing approach to quickly approximate the fitness values of the chromosomes and accelerate the search. Experiments on 44 standard imbalanced datasets show that the proposed hybrid surrogate model substantially reduces the computational cost of the evolutionary algorithm without a considerable loss of performance.
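
To make the ideas in the abstract concrete, the following is a minimal sketch of how a chromosome might be scored in evolutionary undersampling, with a simple windowing step. It is not the authors' implementation. The 1-nearest-neighbour classifier, the geometric-mean fitness, the leave-one-out masking, the stratified split, and all function names (`g_mean`, `stratified_windows`, `fitness`) are assumptions made for illustration; the abstract does not specify them, and the two-stage clustering-based surrogate is not shown.

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Geometric mean of the two per-class recalls (labels 0 and 1)."""
    recalls = [(y_pred[y_true == c] == c).mean() for c in (0, 1)]
    return float(np.sqrt(recalls[0] * recalls[1]))

def stratified_windows(y, n_windows, rng):
    """Split instance indices into disjoint windows that keep the class ratio.

    Assumes each class has at least n_windows instances, so that every
    window contains both classes.
    """
    windows = [[] for _ in range(n_windows)]
    for c in (0, 1):
        idx = rng.permutation(np.flatnonzero(y == c))
        for w, part in enumerate(np.array_split(idx, n_windows)):
            windows[w].extend(part.tolist())
    return [np.array(sorted(w)) for w in windows]

def fitness(chromosome, X, y, window):
    """Score a chromosome on one window instead of the full training set.

    Bit i of the chromosome keeps (1) or drops (0) the i-th majority
    instance; all minority instances are always kept (an assumption).
    """
    majority = np.flatnonzero(y == 0)
    minority = np.flatnonzero(y == 1)
    kept = np.concatenate([majority[chromosome.astype(bool)], minority])
    # Squared Euclidean distances, shape (len(window), len(kept)).
    d = ((X[window][:, None, :] - X[kept][None, :, :]) ** 2).sum(axis=2)
    # Leave-one-out: an instance must not be its own nearest neighbour.
    d[window[:, None] == kept[None, :]] = np.inf
    preds = y[kept][d.argmin(axis=1)]
    return g_mean(y[window], preds)

# Toy imbalanced data: 90 majority (label 0) and 10 minority (label 1) points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (90, 2)), rng.normal(3.0, 1.0, (10, 2))])
y = np.array([0] * 90 + [1] * 10)

windows = stratified_windows(y, n_windows=5, rng=rng)
chromosome = rng.integers(0, 2, size=90)  # one random candidate solution
scores = [fitness(chromosome, X, y, w) for w in windows]
```

In this sketch each fitness call compares only the instances of one window against the selected subset, so a single evaluation costs roughly one fifth of a full pass over the training data. An evolutionary algorithm could, for example, use a different window in each generation; whether that matches the paper's windowing scheme is an assumption.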

Keywords:

Undersampling; Computer science; Evolutionary algorithm; Artificial intelligence; Machine learning; Feature selection; Cluster analysis; Evolutionary computation; Preprocessor; Data pre-processing; Data mining

Metrics

FWCI (Field Weighted Citation Impact): 0.88

Citation Normalized Percentile: 0.79

Topics

Imbalanced Data Classification Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Machine Learning and Data Classification

Physical Sciences → Computer Science → Artificial Intelligence

Evolutionary Algorithms and Applications

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

EUSC: A clustering-based surrogate model to accelerate evolutionary undersampling in imbalanced classification

Evolutionary undersampling for imbalanced big data classification

Multi-objective Evolutionary Undersampling Algorithm for Imbalanced Data Classification

Evolutionary Undersampling for Classification with Imbalanced Datasets: Proposals and Taxonomy

Evolutionary undersampling boosting for imbalanced classification of breast cancer malignancy