Feature selection is an effective pre-processing technique that facilitates text mining in high-dimensional feature spaces. In recent years, many effective redundant feature selection methods have been proposed from different motivations. However, no comparative experimental study of redundant feature selection methods in the field of text mining has been reported so far. To fill this gap, this paper presents an extensive empirical comparison using text classification as the task. The experimental results indicate that 3-way Mutual Information represents redundancy much better than traditional 2-way Mutual Information, because 3-way Mutual Information takes the label information into account. As a result, redundant feature selection methods based on 3-way Mutual Information outperform the other methods.
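To make the contrast between 2-way and 3-way Mutual Information concrete, the following is a minimal sketch and not the implementation evaluated in the paper. The abstract does not define 3-way Mutual Information, so the sketch assumes the interaction-information form I(X;Y;Z) = I(X;Y) - I(X;Y|Z), which is only one common sign convention. All function names and the toy term/label vectors are hypothetical and chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(x, y):
    """2-way MI: I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def three_way_mutual_information(x, y, z):
    """Assumed definition: I(X;Y;Z) = I(X;Y) - I(X;Y|Z)."""
    return mutual_information(x, y) - conditional_mutual_information(x, y, z)


# Hypothetical toy data: presence (1) or absence (0) of two terms in four
# documents. The two terms always co-occur, so they are fully redundant
# with each other when the label is ignored.
term_a = [0, 0, 1, 1]
term_b = [0, 0, 1, 1]

# Case 1: the class label follows the terms exactly.
label_related = [0, 0, 1, 1]
# Case 2: the class label is statistically independent of the terms.
label_unrelated = [0, 1, 0, 1]

mi_2way = mutual_information(term_a, term_b)
mi_3way_related = three_way_mutual_information(term_a, term_b, label_related)
mi_3way_unrelated = three_way_mutual_information(term_a, term_b, label_unrelated)

print(mi_2way, mi_3way_related, mi_3way_unrelated)
```

In this toy setting the 2-way score is the same for both cases because it never looks at the label, while the assumed 3-way score differs between them: the shared information between the two terms only counts when it also concerns the label.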
Anirban Dasgupta, Petros Drineas, Boulos Harb, Vanja Josifovski, Michael W. Mahoney
Muhammad Kashif Iqbal, Malik Muneeb Abid, Muhammad Noman Khalid, Amir Manzoor
Wenkai Liu, Jiongen Xiao, Ming Hong
Oluwaseun Peter Ige, Keng Hoon Gan