The class imbalance problem arises in many real-world applications, e.g., fraud detection, text classification, and cancer diagnosis. Besides rebalancing the data distribution, another important way to counter the bias toward the majority class is proper feature selection. This work aims to select a subset of features that yields high ROC AUC and F-measure values, and thus high performance on the minority class. To this end, a weighted Gini index (WGI) feature selection method is proposed. To evaluate the proposed method, it is compared with Chi-square, F-statistic, and Gini index feature selection, with XGBoost as the classifier used to test the performance of each selected subset of features. Experimental results indicate that F-statistic performs best when only a few features are selected. However, as the number of selected features increases, WGI feature selection achieves the best results. A comparison between the average ROC AUC and F-measure results is also presented. It shows that ROC AUC is consistently good, even when only a few features are selected, and changes only slightly as the subset of features expands. By contrast, F-measure reaches a good level only after 60% of the features are chosen. These results can help practitioners choose a suitable feature selection method when facing a practical problem.
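The abstract does not state the WGI formula, so the following is only a minimal sketch of the general idea of a class-weighted Gini score for ranking features. The weighting scheme, the threshold search, and every name here (`weighted_gini`, `gini_gain`, `score_feature`, `rank_features`, `class_weight`) are assumptions made for illustration and are not necessarily the authors' definition. Each class contributes to the impurity in proportion to its weight, so a larger weight on the minority class makes splits that isolate minority samples score higher.

```python
from collections import Counter


def weighted_gini(labels, class_weight):
    """Gini impurity 1 - sum_k p_k^2, where p_k is the weighted share of class k.

    Assumption: classes missing from class_weight get weight 1.0.
    """
    if not labels:
        return 0.0
    mass = Counter()
    for y in labels:
        mass[y] += class_weight.get(y, 1.0)
    total = sum(mass.values())
    return 1.0 - sum((m / total) ** 2 for m in mass.values())


def gini_gain(values, labels, threshold, class_weight):
    """Impurity reduction from splitting one feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]

    def mass(ys):
        return sum(class_weight.get(y, 1.0) for y in ys)

    total = mass(labels)
    children = (
        mass(left) / total * weighted_gini(left, class_weight)
        + mass(right) / total * weighted_gini(right, class_weight)
    )
    return weighted_gini(labels, class_weight) - children


def score_feature(values, labels, class_weight):
    """Best gain over midpoints between consecutive distinct feature values."""
    distinct = sorted(set(values))
    if len(distinct) < 2:
        return 0.0
    thresholds = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(gini_gain(values, labels, t, class_weight) for t in thresholds)


def rank_features(columns, labels, class_weight):
    """Return feature names sorted from highest to lowest score."""
    scores = {
        name: score_feature(vals, labels, class_weight)
        for name, vals in columns.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (made up for illustration): 8 majority samples, 2 minority samples.
labels = [0] * 8 + [1] * 2
columns = {
    "informative": [1, 2, 1, 2, 1, 2, 1, 2, 9, 9],  # separates the minority class
    "noisy": [1, 2, 3, 4, 5, 6, 7, 8, 1, 2],        # overlaps with the majority
}
class_weight = {0: 1.0, 1: 4.0}  # hypothetical weights that balance the classes
ranking = rank_features(columns, labels, class_weight)
```

Keeping the top entries of `ranking` gives the selected subset. In the evaluation described in the abstract, such a subset would then be passed to a classifier and scored with ROC AUC and F-measure.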
Kewen Li, Mingxiao Yu, Lu Liu, Timing Li, Jiannan Zhai
Jiayi Wu, Jingmin Xin, Nanning Zheng
Firuz Kamalov, Fadi Thabtah, Ho‐Hon Leung
Haoyue Liu, MengChu Zhou, Qing Liu