Feature selection aims to select a smaller feature subset from the raw data that preserves the characteristics of the original data and gives similar or better performance in data mining. Traditional information-theoretic approaches to unsupervised feature selection often treat the relevance and the redundancy of features as separate considerations. This article proposes a supervised feature selection algorithm based on information gain analysis. Using mutual information, the algorithm analyzes the correlation between each feature and the original data, and the redundancy between each candidate feature and the features already selected. The potential information gain of each feature is then calculated and used to rank the features. Finally, features are selected according to a gain penalty factor. Experimental results with multiple classifiers on multiple standard datasets show that the proposed algorithm matches or exceeds the classification accuracy obtained on the original data while effectively reducing the data dimension.
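The ranking and selection steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions: relevance is taken as the mutual information between a discrete feature and the class labels, redundancy as the mean mutual information with the already selected features (an mRMR-style stand-in), and the hypothetical `penalty` argument plays the role of the gain penalty factor by stopping selection once the best remaining gain does not exceed it.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_features(columns, labels, penalty=0.0):
    """Greedy selection by gain = relevance - mean redundancy (assumed form).

    columns: list of discrete feature columns, one sequence per feature.
    labels:  class labels, same length as each column.
    penalty: hypothetical stand-in for the gain penalty factor; selection
             stops when the best remaining gain is <= penalty.
    Returns the indices of the selected features in selection order.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = set(range(len(columns)))

    def gain(j):
        # Relevance to the labels minus average redundancy with chosen features.
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(columns[j], columns[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=gain)
        if gain(best) <= penalty:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 0, 0, 1, 1]
    f0 = [0, 0, 1, 1, 0, 0, 1, 1]  # fully informative about the labels
    f1 = list(f0)                  # exact duplicate of f0 (redundant)
    f2 = [0, 1, 0, 1, 0, 1, 0, 1]  # independent of the labels
    print(select_features([f0, f1, f2], labels))
```

In this toy example the duplicate feature is rejected because its redundancy with the first selected feature cancels its relevance, and the uninformative feature is rejected because its relevance is zero. Continuous features would need discretization or a different mutual information estimator before this sketch applies.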
Baohang Zhang, Ziqian Wang, Haotian Li, Zhenyu Lei, Jiujun Cheng, Shangce Gao
Yihui Luo, Shuchu Xiong, Sichun Wang
Szidónia Lefkovits, László Lefkovits