JOURNAL ARTICLE

Research on Text Feature Selection Algorithm Based on Information Gain and Feature Relation Tree

Abstract

The classification performance of the conventional information gain (IG) algorithm can decline markedly when classes and features are unevenly distributed. To address this, an improved text feature selection method, UDsIG, is proposed. First, features are selected class by class to reduce the effect of uneven class distribution on feature selection. Next, the distribution equilibrium of each feature is used to reduce the interference caused by uneven feature distribution. The class features are then processed with a feature relation tree model so that strongly correlated features are retained. Finally, an improved information gain formula based on weighted dispersion is used to obtain the optimal feature subset. Experimental results show that the proposed method achieves better classification performance.
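
For orientation, the baseline that the abstract sets out to improve can be sketched as follows. This is a minimal illustration of the textbook information gain score for a term, IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t), and not the UDsIG method itself: the per-class selection, distribution equilibrium, feature relation tree, and weighted dispersion steps are not specified in the abstract and are not reproduced here. The function names `entropy`, `information_gain`, and `rank_terms`, and the representation of each document as a set of tokens, are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Textbook IG of a term: H(C) - P(t) H(C|t) - P(not t) H(C|not t).

    docs   : list of token collections (one per document); hypothetical format
    labels : list of class labels, aligned with docs
    """
    classes = sorted(set(labels))
    with_term = Counter()     # class counts over documents containing the term
    without_term = Counter()  # class counts over documents lacking the term
    for tokens, label in zip(docs, labels):
        if term in tokens:
            with_term[label] += 1
        else:
            without_term[label] += 1

    n = len(docs)
    n_with = sum(with_term.values())
    n_without = n - n_with

    h_c = entropy([with_term[c] + without_term[c] for c in classes])
    h_with = entropy([with_term[c] for c in classes])
    h_without = entropy([without_term[c] for c in classes])
    return h_c - (n_with / n) * h_with - (n_without / n) * h_without


def rank_terms(docs, labels, k):
    """Return the k terms with the highest IG (ties broken alphabetically)."""
    vocab = set().union(*docs)
    scored = [(information_gain(docs, labels, t), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]
```

Because this score is computed over the whole corpus, terms that characterize a small class tend to receive low scores when the class sizes are skewed, which is the weakness the abstract attributes to the conventional algorithm.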

Keywords:
Information gain; Feature selection; Feature (linguistics); Computer science; Relation (database); Pattern recognition (psychology); Artificial intelligence; Class (philosophy); Tree (set theory); Selection (genetic algorithm); Data mining; Algorithm; Mathematics

Metrics

Cited By: 13
FWCI (Field Weighted Citation Impact): 1.89
Refs: 13
Citation Normalized Percentile: 0.90

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Web Data Mining and Analysis
Physical Sciences →  Computer Science →  Information Systems
Advanced Computational Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence