Feature Selection (FS) is one of the most important issues in Text Classification (TC). A good feature selection method can improve both the efficiency and the accuracy of a text classifier. Based on an analysis of features' distributional information, this paper presents a feature selection method named DIFS. DIFS introduces a new estimation mechanism that measures the relevance between a feature's distribution characteristics and its contribution to categorization. In addition, two algorithms are designed to implement DIFS. Experiments carried out on a Chinese corpus show that the proposed approach outperforms the compared methods.
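The abstract does not give the DIFS scoring formula, so as a generic illustration of distribution-based feature selection for TC, the sketch below ranks terms by a chi-square statistic computed from each term's document-frequency distribution across a target category versus the rest of the corpus. The function names (`chi2_score`, `select_features`) and the toy data are assumptions for illustration, not the authors' method.

```python
def chi2_score(n11, n10, n01, n00):
    """Chi-square association between a term and a category.

    n11: in-category docs containing the term
    n10: in-category docs without the term
    n01: out-of-category docs containing the term
    n00: out-of-category docs without the term
    """
    n = n11 + n10 + n01 + n00
    den = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if den == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / den

def select_features(docs, labels, category, k):
    """Return the k terms most associated with `category`.

    docs: list of sets of terms (one set per document)
    labels: category label of each document
    """
    vocab = {term for doc in docs for term in doc}
    scores = {}
    for term in vocab:
        n11 = sum(1 for d, y in zip(docs, labels) if y == category and term in d)
        n10 = sum(1 for d, y in zip(docs, labels) if y == category and term not in d)
        n01 = sum(1 for d, y in zip(docs, labels) if y != category and term in d)
        n00 = sum(1 for d, y in zip(docs, labels) if y != category and term not in d)
        scores[term] = chi2_score(n11, n10, n01, n00)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A term concentrated in one category (or conspicuously absent from it) receives a high score, while a term spread evenly across categories scores near zero; any distribution-sensitive criterion, DIFS included, exploits this kind of skew.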
Zhili Pei, Yuxin Zhou, Lisha Liu, Lihua Wang, Yinan Lu, Ying Kong