JOURNAL ARTICLE

Feature selection with conditional mutual information maximin in text categorization

Abstract

Feature selection is an important component of text categorization. It can both speed up a classifier's computation and reduce overfitting. Several feature selection methods, such as information gain and mutual information, have been widely used. Although they greatly improve classifier performance, they share a common drawback: they do not consider the mutual relationships among features. When one feature's predictive power is weakened by others, and when the selected features are biased towards the major categories, such methods are not very effective. In this paper, we propose a novel feature selection method for text categorization called conditional mutual information maximin (CMIM). It selects a set of features that are individually discriminating and only weakly dependent on one another. Experimental results show that CMIM performs much better than traditional feature selection methods.
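The abstract does not spell out the selection rule, so the following is only a minimal sketch of one common reading of a conditional mutual information maximin criterion: greedily pick the feature whose information about the label, conditioned on each already-selected feature, has the largest minimum. The function names (`mutual_information`, `conditional_mutual_information`, `cmim_select`), the dictionary input format, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def conditional_mutual_information(xs, ys, zs):
    """Empirical I(X; Y | Z): MI inside each slice Z = z, weighted by P(z)."""
    n = len(xs)
    total = 0.0
    for z, count in Counter(zs).items():
        idx = [i for i in range(n) if zs[i] == z]
        total += (count / n) * mutual_information(
            [xs[i] for i in idx], [ys[i] for i in idx]
        )
    return total


def cmim_select(features, labels, k):
    """Greedy maximin selection (assumed criterion, not the paper's code).

    `features` maps a feature name to its list of discrete values.
    """
    # score[f] = min over selected s of I(f; labels | s),
    # initialised to the unconditional I(f; labels).
    score = {f: mutual_information(v, labels) for f, v in features.items()}
    selected = []
    while len(selected) < k and score:
        best = max(score, key=score.get)
        selected.append(best)
        del score[best]
        for f in score:
            cmi = conditional_mutual_information(
                features[f], labels, features[best]
            )
            score[f] = min(score[f], cmi)
    return selected


# Hypothetical toy data: "a_copy" duplicates "a", "b" is weaker but complementary.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "a": [1, 1, 1, 0, 0, 0, 0, 0],
    "a_copy": [1, 1, 1, 0, 0, 0, 0, 0],
    "b": [0, 0, 1, 1, 0, 0, 0, 0],
}
print(cmim_select(features, labels, 2))  # → ['a', 'b']
```

On this toy data a plain mutual information ranking would place the redundant `a_copy` second, because it scores as high as `a`. Under the maximin rule its score drops to zero once `a` is selected, so the weaker but complementary `b` is chosen instead, which illustrates the "individually discriminating and weakly dependent" property the abstract describes.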

Keywords:
Mutual information; Feature selection; Overfitting; Computer science; Artificial intelligence; Categorization; Text categorization; Classifier (UML); Pattern recognition (psychology); Machine learning; Information gain; Minimax; Selection (genetic algorithm); Feature (linguistics); Conditional mutual information; Data mining; Mathematics

Metrics

Cited By: 102
FWCI (Field Weighted Citation Impact): 5.79
Refs: 25
Citation Normalized Percentile: 0.96

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Computational Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence
Image Retrieval and Classification Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

BOOK-CHAPTER

Weighted Average Pointwise Mutual Information for Feature Selection in Text Categorization

Karl-Michael Schneider

Lecture notes in computer science Year: 2005 Pages: 252-263
JOURNAL ARTICLE

Multi-Label Feature Selection with Conditional Mutual Information

Xiujuan Wang, Yuchen Zhou

Journal:   Computational Intelligence and Neuroscience Year: 2022 Vol: 2022 Pages: 1-13
JOURNAL ARTICLE

Fast Binary Feature Selection with Conditional Mutual Information

François Fleuret

Journal:   Journal of Machine Learning Research Year: 2004