JOURNAL ARTICLE

An Improved Information Gain Feature Selection Algorithm for SVM Text Classifier

Abstract

The choice of feature selection algorithm strongly affects the accuracy of text categorization. The traditional information gain (IG) feature selection algorithm tends to select features that rarely appear in a given category but frequently appear in other categories. To overcome this drawback, an improved IG feature selection method is proposed on the basis of an in-depth analysis of related algorithms. First, features are selected separately for each category of the data set, and the features from the different categories are merged using an optimized method. Next, the IG weight is calculated from the probabilities with which these features appear. Finally, a between-class concentration distribution factor and a within-class word-frequency dispersion distribution factor are adopted. An SVM classifier is used to evaluate the algorithm, and the experimental results show that the improved method outperforms the original IG as well as two other improved methods.
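The abstract does not give the underlying formulas. For orientation, the classic IG score that the improved method starts from is IG(t) = H(C) - P(t)·H(C | t) - P(not t)·H(C | not t), where H is the Shannon entropy over the categories C. The Python below is a minimal sketch of that textbook definition. It assumes binary document-level term presence and base-2 logarithms. The function names and the toy corpus are hypothetical. It does not implement the authors' per-category merging or their concentration and dispersion factors, which are not specified here.

```python
import math
from collections import Counter


def plogp_sum(probs):
    """Return sum(p * log2(p)), using the convention 0 * log2(0) = 0."""
    return sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(docs, labels, term):
    """Classic (textbook) information gain of `term`, in bits.

    Not the paper's improved weighting.
    docs   -- list of sets of terms (binary, document-level presence)
    labels -- class label of each document, same order as docs
    """
    n = len(docs)
    classes = sorted(set(labels))
    prior = Counter(labels)

    # Partition the class labels by whether the document contains the term.
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]

    # H(C): entropy of the class distribution before looking at the term.
    ig = -plogp_sum([prior[c] / n for c in classes])

    # Subtract the conditional entropy H(C | T) for T present and T absent.
    for subset in (with_t, without_t):
        if subset:
            counts = Counter(subset)
            weight = len(subset) / n  # P(t) or P(not t)
            ig += weight * plogp_sum([counts[c] / len(subset) for c in classes])
    return ig


# Toy corpus with three balanced classes (hypothetical data).
docs = [
    {"ball", "team"}, {"ball", "goal"},   # class A
    {"vote", "party"}, {"vote", "law"},   # class B
    {"vote", "tax"}, {"vote", "market"},  # class C
]
labels = ["A", "A", "B", "B", "C", "C"]

for term in ("ball", "vote"):
    # Both lines print 0.918.
    print(term, f"{information_gain(docs, labels, term):.3f}")
```

In this toy corpus, "ball" occurs only in class A, while "vote" occurs in every document except those of class A. Yet both receive log2(3) - 2/3 ≈ 0.918 bits. This illustrates the drawback the abstract describes: classic IG rewards a term's absence from a category as much as its presence. As a result, terms that mainly signal "not this category" rank as highly as genuinely indicative ones.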

Keywords:
Feature selection, Computer science, Artificial intelligence, Pattern recognition, Support vector machine, Information gain, Text categorization, Classifier, Statistical classification, Categorization, Feature, Algorithm, Data mining, Machine learning
