JOURNAL ARTICLE

A new feature selection method based on distributional information for Text Classification

Abstract

Feature selection (FS) is one of the most important issues in text classification (TC): a good feature selection method can improve both the efficiency and the accuracy of a text classifier. Based on an analysis of features' distributional information, this paper presents a feature selection method named DIFS. DIFS introduces a new estimation mechanism that measures the relevance between a feature's distributional characteristics and its contribution to categorization. In addition, two algorithms are designed to implement DIFS. Experiments on a Chinese corpus show that the proposed approach performs better than the compared methods.
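The abstract does not give the DIFS formulas, so the following is only a hedged illustration of the general idea of distribution-based feature selection: scoring a term by how unevenly its document frequency is spread across categories, on the assumption that category-skewed terms are more useful to a classifier. The function name and scoring rule are hypothetical, not the authors' method.

```python
def distributional_score(term_category_counts, total_per_category):
    """Score a term by how unevenly it is distributed across categories.

    term_category_counts: dict mapping category -> number of documents in
        that category that contain the term.
    total_per_category: dict mapping category -> total documents in that
        category.
    Returns a score in [0, 1]: 0 for a perfectly uniform spread, 1 when the
    term occurs in only one category. (Illustrative heuristic only.)
    """
    # Relative document frequency of the term in each category.
    rates = [term_category_counts.get(c, 0) / total_per_category[c]
             for c in total_per_category]
    total = sum(rates)
    if total == 0:
        return 0.0  # term never occurs; no distributional information
    probs = [r / total for r in rates]
    # Skew = maximum deviation from the uniform distribution, rescaled
    # so the score lies in [0, 1].
    k = len(probs)
    return max(abs(p - 1.0 / k) for p in probs) * k / (k - 1)
```

A term concentrated in a single category scores 1.0, while one spread evenly across all categories scores near 0; ranking terms by such a score and keeping the top fraction is the usual shape of a distribution-based FS pipeline.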

Keywords:
Feature selection; Text categorization; Information gain; Mutual information; Feature extraction; Pattern recognition; Data mining; Machine learning; Artificial intelligence

Metrics

Cited by: 3
FWCI (Field-Weighted Citation Impact): 0.40
References: 11
Citation Normalized Percentile: 0.76

Topics

Text and Document Classification Technologies
Physical Sciences → Computer Science → Artificial Intelligence
Advanced Computational Techniques and Applications
Physical Sciences → Computer Science → Artificial Intelligence
Rough Sets and Fuzzy Logic
Physical Sciences → Computer Science → Computational Theory and Mathematics