JOURNAL ARTICLE

An Improved Mutual Information-Based Feature Selection Algorithm for Text Classification

Abstract

Feature selection plays an important role in text classification and directly affects classification accuracy. The mutual information (MI)-based feature selection method has known defects: it tends to select rare words and words from small samples as features, and it can yield negative MI values. To correct these defects, this paper proposes an improved feature evaluation function for automatic text classification that jointly considers word frequency, concentration between classes, and dispersion within a class. Experimental results indicate that the improved function remedies the tendency of the original MI evaluation function to select rare words and significantly improves classification performance.
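The rare-word bias and the negative values mentioned in the abstract can be seen in the classic document-count estimate of MI between a term and a class, MI(t, c) = log(A·N / ((A + C)(A + B))). The sketch below illustrates only that baseline estimate; the abstract does not give the improved evaluation function, so it is not reproduced here. The function name `pointwise_mi`, the variable names, and the document counts are hypothetical and chosen purely for illustration.

```python
import math


def pointwise_mi(a, b, c, n):
    """Classic MI estimate between a term t and a class k from document counts.

    a: documents in class k that contain t
    b: documents outside class k that contain t
    c: documents in class k that do not contain t
    n: total number of documents
    """
    if a == 0:
        # The term never co-occurs with the class, so the log is undefined.
        return float("-inf")
    return math.log((a * n) / ((a + c) * (a + b)))


# Hypothetical corpus: 1000 documents, 100 of them in the class of interest.
# A rare term seen in a single document that happens to belong to the class.
rare = pointwise_mi(a=1, b=0, c=99, n=1000)

# A frequent term seen in 80 of the 100 class documents and 20 others.
common = pointwise_mi(a=80, b=20, c=20, n=1000)

# A term seen mostly outside the class gets a negative score.
negative = pointwise_mi(a=1, b=99, c=99, n=1000)

print(rare, common, negative)
```

With these assumed counts the single-occurrence term outscores the term that covers 80% of the class, which is the behavior the abstract describes as a defect. Factors such as word frequency and within-class dispersion are the kind of corrections the paper proposes, though their exact form would have to be taken from the full text.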

Keywords:
Feature selection; Mutual information; Computer science; Artificial intelligence; Pattern recognition (psychology); Feature (linguistics); Class (philosophy); Selection (genetic algorithm); Information gain; Statistical classification; Function (biology); Word (group theory); Data mining; Algorithm; Mathematics

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 0.94
Refs: 10
Citation Normalized Percentile: 0.83

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Web Data Mining and Analysis
Physical Sciences →  Computer Science →  Information Systems
Rough Sets and Fuzzy Logic
Physical Sciences →  Computer Science →  Computational Theory and Mathematics

Related Documents

BOOK-CHAPTER

Modified Pointwise Mutual Information-Based Feature Selection for Text Classification

Tsvetanka Georgieva‐Trifonova

Lecture Notes in Networks and Systems Year: 2021 Pages: 333-353

JOURNAL ARTICLE

Feature Selection for Text Classification Using Mutual Information

İlhami Sel, Ali Karcı, Davut Hanbay

Journal: 2019 International Artificial Intelligence and Data Processing Symposium (IDAP) Year: 2019 Pages: 1-4