JOURNAL ARTICLE

Combating the Small Sample Class Imbalance Problem Using Feature Selection

Mike Wasikowski, Xuewen Chen

Year: 2009 Journal: IEEE Transactions on Knowledge and Data Engineering Vol: 22 (10) Pages: 1388-1400 Publisher: IEEE Computer Society

Abstract

The class imbalance problem is encountered in real-world applications of machine learning and results in a classifier's suboptimal performance. Researchers have rigorously studied resampling, algorithmic, and feature selection approaches to this problem. No systematic studies have been conducted to understand how well these methods combat the class imbalance problem and which of them best manage the different challenges posed by imbalanced data sets. In particular, feature selection has rarely been studied outside of text classification problems. Additionally, no studies have looked at the further problem of learning from small samples. This paper presents a first systematic comparison of the three types of methods developed for imbalanced data classification problems and of seven feature selection metrics evaluated on small sample data sets from different applications. We evaluated the performance of these metrics using the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (PRC). We compared each metric on its average performance across all problems and on its likelihood of yielding the best performance on a specific problem. We examined the performance of these metrics inside each problem domain. Finally, we evaluated the efficacy of these metrics to see which perform best across algorithms. Our results showed that the signal-to-noise correlation coefficient (S2N) and Feature Assessment by Sliding Thresholds (FAST) are strong candidates for feature selection in most applications, especially when selecting very small numbers of features.
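
As a rough illustration of the kind of per-feature ranking the abstract describes, the sketch below scores each feature on a tiny imbalanced data set. It is not the authors' implementation. The S2N score uses the commonly cited form (difference of class means divided by the sum of class standard deviations), and the use of population standard deviations is an assumption. The AUC-based score is an exact single-feature AUC computed from pairwise comparisons, a simplified stand-in for FAST, whose thresholding scheme is not given in the abstract. All function names and the toy data are hypothetical.

```python
import math
from statistics import mean, pstdev


def split_by_class(values, labels):
    """Split one feature's values into minority (1) and majority (0) lists."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return pos, neg


def s2n_score(values, labels):
    """Signal-to-noise coefficient: (mean_pos - mean_neg) / (std_pos + std_neg).

    Assumption: population standard deviations are used.
    """
    pos, neg = split_by_class(values, labels)
    diff = mean(pos) - mean(neg)
    spread = pstdev(pos) + pstdev(neg)
    if spread == 0:
        # Both classes are constant: perfectly separated, or identical.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / spread


def single_feature_auc(values, labels):
    """AUC when the raw feature value is used as the classifier score.

    Computed as the fraction of (positive, negative) pairs ranked
    correctly, counting ties as half. This is NOT the FAST procedure.
    """
    pos, neg = split_by_class(values, labels)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_features(X, y, relevance, k):
    """Return the indices of the k features with the highest relevance."""
    columns = list(zip(*X))
    scored = sorted(range(len(columns)),
                    key=lambda j: relevance(columns[j], y),
                    reverse=True)
    return scored[:k]


# Toy data (hypothetical): 4 majority samples, 2 minority samples, 2 features.
X = [[1.0, 5.0], [2.0, 4.0], [1.5, 6.0], [1.2, 5.5], [8.0, 5.2], [9.0, 4.8]]
y = [0, 0, 0, 0, 1, 1]

# A feature is relevant if it separates the classes in either direction.
s2n_relevance = lambda col, lab: abs(s2n_score(col, lab))
auc_relevance = lambda col, lab: abs(single_feature_auc(col, lab) - 0.5)

top_by_s2n = rank_features(X, y, s2n_relevance, k=1)
top_by_auc = rank_features(X, y, auc_relevance, k=1)
```

Both scores are computed one feature at a time, which keeps them cheap when the number of features is large relative to the number of samples. Taking the absolute value (or the distance from 0.5 for AUC) treats a feature as useful whichever class it is elevated in.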

Keywords:
Feature selection, Computer science, Artificial intelligence, Performance metric, Machine learning, Feature (linguistics), Classifier (UML), Pattern recognition (psychology), Data mining, Metric (unit), Resampling, Sample size determination, Mathematics, Statistics

Metrics

Cited By: 381
FWCI (Field Weighted Citation Impact): 12.58
Refs: 70
Citation Normalized Percentile: 0.99
Is in top 1%
Is in top 10%

Topics

Imbalanced Data Classification Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Electricity Theft Detection Techniques
Physical Sciences →  Engineering →  Electrical and Electronic Engineering

Related Documents

BOOK-CHAPTER

Cost-Sensitive Feature Selection for Class Imbalance Problem

Małgorzata Bach, Aleksandra Werner

Advances in intelligent systems and computing Year: 2017 Pages: 182-194

JOURNAL ARTICLE

Feature Selection Method from Multiclass Text with Class Imbalance Problem

Minji Seo, Gilseung Ahn, Sun Hur

Journal: Journal of Korean Institute of Industrial Engineers Year: 2019 Vol: 45 (2) Pages: 93-100

JOURNAL ARTICLE

A new probabilistic active sample selection algorithm for class imbalance problem

T. Maruthi Padmaja, Raju S. Bapi, P. Radha Krishna

Journal: International Journal of Knowledge Engineering and Soft Data Paradigms Year: 2013 Vol: 4 (1) Pages: 85-85

JOURNAL ARTICLE

An Efficient Cost-Sensitive Feature Selection Using Chaos Genetic Algorithm for Class Imbalance Problem

Jing Bian, Xinguang Peng, Ying Wang, Hai Zhang

Journal: Mathematical Problems in Engineering Year: 2016 Vol: 2016 Pages: 1-9

JOURNAL ARTICLE

Combating class imbalance problem in semi-supervised defect detection

Ying Ma, Guangchun Luo, Jiong Li, Aiguo Chen

Journal: 2011 International Conference on Computational Problem-Solving (ICCP) Year: 2011 Vol: 16 Pages: 619-622