JOURNAL ARTICLE

Improved mutual information method for text feature selection

Abstract

Reducing the dimensionality of a high-dimensional feature set is one of the main difficulties in text categorization. Feature selection has been applied effectively in text classification because of its low computational complexity. Prior research shows that mutual information is a good feature selection method, but it does not consider the term frequency in each category of the corpus or the connections between terms. To remedy these defects of the traditional mutual information method, this article improves the mutual information measure by introducing the frequency of a feature within a class and the dispersion of a feature within a class. An experimental platform was built by constructing a Chinese text classification system, and multiple sets of experiments were run on this system. The results show that the new feature selection approach yields better text categorization results.
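
The abstract does not give the improved formula, so the following is only a minimal sketch of the general idea, not the article's method. It computes the classic document-count mutual information, MI(t, c) = log(P(t, c) / (P(t) P(c))), and then an illustrative variant that scales it by two in-class statistics. The function names, the toy corpus, and above all the choice to combine the factors by simple multiplication are assumptions made for illustration.

```python
import math

def mutual_information(docs, labels, term, category):
    """Classic MI between a term and a category, from document counts."""
    n = len(docs)
    n_c = sum(1 for y in labels if y == category)
    n_t = sum(1 for d in docs if term in d)
    n_tc = sum(1 for d, y in zip(docs, labels) if y == category and term in d)
    if n_tc == 0:
        # The term never occurs in this category: log(0) is treated as -inf.
        return float("-inf")
    return math.log((n_tc * n) / (n_t * n_c))

def weighted_mi(docs, labels, term, category):
    """Illustrative variant (assumed form, not the article's formula)."""
    mi = mutual_information(docs, labels, term, category)
    if mi == float("-inf"):
        return mi
    class_docs = [d for d, y in zip(docs, labels) if y == category]
    total_tokens = sum(len(d) for d in class_docs)
    # In-class frequency: share of the class's tokens that are this term.
    freq = sum(d.count(term) for d in class_docs) / total_tokens
    # In-class dispersion: share of the class's documents containing the term.
    disp = sum(1 for d in class_docs if term in d) / len(class_docs)
    # Assumption: combine the three factors by plain multiplication.
    return mi * freq * disp

# Toy corpus (hypothetical): each document is a list of tokens.
docs = [
    ["ball", "goal", "ball"],
    ["goal", "team"],
    ["stock", "market"],
    ["market", "team"],
]
labels = ["sports", "sports", "finance", "finance"]

for term in ("ball", "goal"):
    print(term,
          mutual_information(docs, labels, term, "sports"),
          weighted_mi(docs, labels, term, "sports"))
```

On this toy corpus, classic mutual information gives "ball" and "goal" the same score for the "sports" class, while the weighted variant ranks "goal" higher because it appears in every sports document. Because mutual information can be negative, a plain product is a crude combination; a real implementation would need the weighting defined in the article itself.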

Keywords:
Mutual information, Feature selection, Computer science, Feature (linguistics), Categorization, Text categorization, Class (philosophy), Artificial intelligence, Set (abstract data type), Selection (genetic algorithm), Measure (data warehouse), Pattern recognition (psychology), Pointwise mutual information, Data mining, Interaction information, Natural language processing, Mathematics, Linguistics, Statistics

Metrics

Cited By: 8
FWCI (Field Weighted Citation Impact): 1.89
Refs: 6
Citation Normalized Percentile: 0.89

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Computational Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence
Web Data Mining and Analysis
Physical Sciences →  Computer Science →  Information Systems

Related Documents

BOOK-CHAPTER

Discriminant Mutual Information for Text Feature Selection

Jiaqi Wang, Li Zhang

Lecture Notes in Computer Science Year: 2021 Pages: 136-151

JOURNAL ARTICLE

Feature Selection for Text Classification Using Mutual Information

İlhami SEL, Ali Karcı, Davut Hanbay

Journal: 2019 International Artificial Intelligence and Data Processing Symposium (IDAP) Year: 2019 Pages: 1-4