JOURNAL ARTICLE

Weighted Gini index feature selection method for imbalanced data

Abstract

An imbalanced class problem occurs in many real-world applications, e.g., fraud detection, text classification, and cancer diagnosis. Besides rebalancing the data distribution, another important way to address the bias-toward-majority problem is proper feature selection. This work aims to provide a feature selection method that chooses a subset of features yielding high ROC AUC and F-measure, so as to achieve high performance on the minority class. In this paper, a weighted Gini index (WGI) feature selection method is proposed. To evaluate the proposed method, it is compared with Chi-square, F-statistic, and Gini index feature selection, with XGBoost as the classifier used to test the performance of each selected feature subset. Experimental results indicate that F-statistic performs best when only a few features are selected. However, as the number of selected features increases, WGI feature selection achieves the best results. A comparison between the average ROC AUC and F-measure results is also presented. It shows that ROC AUC is consistently high, even when only a few features are selected, and changes only slightly as the feature subset expands. In contrast, F-measure reaches good performance only after 60% of the features are chosen. The results can help practitioners choose a suitable feature selection method for a practical problem.
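
The abstract does not give the WGI formula, so the following is only a minimal sketch of the general idea of class-weighted Gini scoring, not the authors' method. It assumes categorical (or pre-binned) features and inverse-class-frequency weights; the function names `inverse_frequency_weights`, `gini_impurity`, `gini_gain`, and `rank_features` are hypothetical.

```python
from collections import Counter, defaultdict


def inverse_frequency_weights(labels):
    """Assumed weighting: each class gets n / (n_classes * class_count)."""
    counts = Counter(labels)
    n = len(labels)
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def gini_impurity(labels, class_weight=None):
    """Gini impurity 1 - sum_k p_k^2, where p_k is the weighted share of class k."""
    cw = class_weight or {}
    mass = defaultdict(float)
    for y in labels:
        mass[y] += cw.get(y, 1.0)  # unlisted classes default to weight 1
    total = sum(mass.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((m / total) ** 2 for m in mass.values())


def gini_gain(feature_values, labels, class_weight=None):
    """Impurity reduction obtained by splitting the samples on one feature."""
    cw = class_weight or {}
    groups = defaultdict(list)
    for v, y in zip(feature_values, labels):
        groups[v].append(y)
    total = sum(cw.get(y, 1.0) for y in labels)
    if total == 0:
        return 0.0
    # Each child impurity is weighted by the child's share of the total mass.
    children = sum(
        sum(cw.get(y, 1.0) for y in g) / total * gini_impurity(g, cw)
        for g in groups.values()
    )
    return gini_impurity(labels, cw) - children


def rank_features(columns, labels, class_weight=None):
    """Return feature names sorted from highest to lowest Gini gain."""
    scores = {
        name: gini_gain(values, labels, class_weight)
        for name, values in columns.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: 8 majority samples (class 0) and 2 minority samples (class 1).
labels = [0] * 8 + [1] * 2
columns = {
    "noise": [0, 1] * 5,              # same class mix in both groups
    "separating": [0] * 8 + [1] * 2,  # isolates the minority class
}
weights = inverse_frequency_weights(labels)
ranking = rank_features(columns, labels, weights)
```

Under these assumed weights both classes carry equal total mass, so the parent impurity of the 8:2 toy data rises from 0.32 (unweighted) to 0.5, and a feature that isolates the minority class earns a correspondingly larger gain.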

Keywords:
Feature selection; Statistic; Classifier (UML); Pattern recognition (psychology); Artificial intelligence; Computer science; Feature (linguistics); Selection (genetic algorithm); Measure (data warehouse); Data mining; Mathematics; Statistics; Machine learning

Metrics

Cited By: 46
FWCI (Field Weighted Citation Impact): 4.17
Refs: 35
Citation Normalized Percentile: 0.94

Topics

Imbalanced Data Classification Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Face and Expression Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Feature Selection Method Based on Weighted Mutual Information for Imbalanced Data

Kewen Li, Mingxiao Yu, Lu Liu, Timing Li, Jiannan Zhai

Journal: International Journal of Software Engineering and Knowledge Engineering, Year: 2018, Vol: 28 (08), Pages: 1177-1194

JOURNAL ARTICLE

Class‐index corpus‐index measure: A novel feature selection method for imbalanced text data

Bekir Parlak

Journal: Concurrency and Computation Practice and Experience, Year: 2022, Vol: 34 (21)

JOURNAL ARTICLE

Feature Selection in Imbalanced Data

Firuz Kamalov, Fadi Thabtah, Ho‐Hon Leung

Journal: Annals of Data Science, Year: 2022, Vol: 10 (6), Pages: 1527-1541

JOURNAL ARTICLE

An embedded feature selection method for imbalanced data classification

Haoyue Liu, MengChu Zhou, Qing Liu

Journal: IEEE/CAA Journal of Automatica Sinica, Year: 2019, Vol: 6 (3), Pages: 703-715