JOURNAL ARTICLE

Improved information gain-based feature selection for text categorization

Abstract

Feature selection (FS) is one of the most important issues in text categorization (TC). Empirical studies show that information gain (IG) is an effective FS method. However, traditional IG pays little attention to term frequency and also counts the case in which a term does not appear, so its results are not ideal. In this paper, we put forward an improved information gain-based feature selection method that uses term frequency information and a balance factor (IGTB) for statistical machine learning-based text categorization. Our feature selection method aims to pick out the key feature items in a text corpus precisely. Experiments on the Reuters-21578 and WebKB collections show that our method effectively improves categorization accuracy compared with conventional information gain and other methods.
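For orientation, the conventional IG baseline that the abstract criticizes can be sketched as follows. This is a minimal illustration of the standard presence/absence formulation, not the IGTB method itself, whose formula is not given on this page. The function names and the toy corpus are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Conventional IG of a term: H(C) - P(t) H(C|t) - P(not t) H(C|not t).

    docs is a list of token lists and labels the parallel class labels.
    Only presence or absence of the term is used, so term frequency is
    ignored and the term-absent case contributes to the score.
    """
    with_term, without_term = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        if term in tokens:
            with_term[label] += 1
        else:
            without_term[label] += 1
    n = len(docs)
    n_t = sum(with_term.values())
    h_c = entropy(list(Counter(labels).values()))
    h_t = entropy(list(with_term.values()))
    h_not_t = entropy(list(without_term.values()))
    return h_c - (n_t / n) * h_t - ((n - n_t) / n) * h_not_t


# Hypothetical toy corpus, for illustration only.
docs = [["wheat", "price"], ["wheat", "export"],
        ["course", "student"], ["student", "price"]]
labels = ["grain", "grain", "edu", "edu"]

ig_wheat = information_gain(docs, labels, "wheat")
ig_price = information_gain(docs, labels, "price")
```

In this toy corpus "wheat" separates the two classes perfectly while "price" appears equally in both, so the first score is the full class entropy and the second is zero. How often a term occurs inside a document never enters the calculation, which is the limitation the abstract refers to.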

Keywords:
Feature selection; Categorization; Text categorization; Computer science; Information gain; Feature (linguistics); Term (time); Artificial intelligence; Selection (genetic algorithm); Machine learning; Pattern recognition (psychology); Data mining

Metrics

Cited By: 42
FWCI (Field Weighted Citation Impact): 2.90
Refs: 13
Citation Normalized Percentile: 0.91

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Web Data Mining and Analysis
Physical Sciences →  Computer Science →  Information Systems
Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Information gain and divergence-based feature selection for machine learning-based text categorization

Changki Lee, Gary Geunbae Lee

Journal: Information Processing & Management Year: 2005 Vol: 42 (1) Pages: 155-165

JOURNAL ARTICLE

Application of Improved Information Gain Feature Selection Method to Text Clustering

Song Yan Chen Tao

Journal: Shuju fenxi yu zhishi faxian Year: 2004 Vol: 20 (12) Pages: 7-9