JOURNAL ARTICLE

A new feature selection method for handling redundant information in text classification

Youwei Wang, Lizhou Feng

Journal: Frontiers of Information Technology & Electronic Engineering Year: 2018 Vol: 19 (2) Pages: 221-234 Publisher: Springer Science+Business Media

Abstract

Feature selection is an important approach to dimensionality reduction in text classification. Because selected features always contain redundant information that is difficult to handle, we propose a simple new feature selection method that effectively filters redundant features. First, to quantify the relationship between two words, we introduce definitions of word frequency based relevance and correlative redundancy. Next, an optimal feature selection (OFS) method is chosen to obtain a feature subset FS1. Finally, to improve execution speed, the redundant features in FS1 are filtered using a predetermined threshold, and the filtered features are stored in linked lists. Experiments are carried out on three datasets (WebKB, 20-Newsgroups, and Reuters-21578) with support vector machine and naïve Bayes classifiers. The results show that the classification accuracy of the proposed method is generally higher than that of typical traditional methods (information gain, improved Gini index, and improved comprehensively measured feature selection) and the OFS methods. Moreover, the proposed method runs faster than typical mutual information-based methods (improved and normalized mutual information-based feature selections, and multilabel feature selection based on maximum dependency and minimum redundancy) while maintaining classification accuracy. Statistical results validate the effectiveness of the proposed method in handling redundant information in text classification.
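The threshold-based filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's definitions of relevance or correlative redundancy, so this example substitutes cosine similarity between per-document term-frequency vectors as a stand-in redundancy measure. The function names `cosine` and `filter_redundant`, the toy counts, and the threshold value are all assumptions for illustration; the linked-list storage mentioned in the abstract is not modeled.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length term-frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def filter_redundant(ranked_terms, term_freqs, threshold):
    """Greedily keep terms whose similarity to every kept term is below threshold.

    ranked_terms: terms ordered best-first (a stand-in for the subset FS1).
    term_freqs:   dict mapping term -> per-document frequency vector.
    NOTE: cosine similarity is an assumed substitute, not the paper's
    correlative redundancy measure.
    """
    kept = []
    for term in ranked_terms:
        if all(cosine(term_freqs[term], term_freqs[k]) < threshold for k in kept):
            kept.append(term)
    return kept

# Toy corpus of four documents; the counts are made up for illustration.
term_freqs = {
    "football": [3, 0, 2, 0],
    "soccer":   [3, 0, 2, 0],  # same distribution as "football"
    "election": [0, 4, 0, 1],
    "vote":     [0, 2, 0, 3],
}
ranked = ["football", "soccer", "election", "vote"]
selected = filter_redundant(ranked, term_freqs, threshold=0.9)
print(selected)  # ['football', 'election', 'vote']
```

In this toy run, "soccer" is dropped because its frequency vector is identical to that of the higher-ranked "football". Lowering the threshold makes the filter more aggressive.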

Keywords:
Feature selection; Mutual information; Redundancy (engineering); Computer science; Pattern recognition (psychology); Artificial intelligence; Feature (linguistics); Dimensionality reduction; Data mining; Minimum redundancy feature selection; Curse of dimensionality; Information gain ratio; Support vector machine; Word (group theory); Naive Bayes classifier; Filter (signal processing); Mathematics

Metrics

Cited By: 10
FWCI (Field Weighted Citation Impact): 1.39
Refs: 43
Citation Normalized Percentile: 0.83

Topics

Text and Document Classification Technologies
Physical Sciences → Computer Science → Artificial Intelligence
Web Data Mining and Analysis
Physical Sciences → Computer Science → Information Systems
Spam and Phishing Detection
Physical Sciences → Computer Science → Information Systems

Related Documents

JOURNAL ARTICLE

Redundant Feature Selection Methods in Text Classification

Su Fen Chen

Journal: Advanced Materials Research Year: 2014 Vol: 1044-1045 Pages: 1258-1261
JOURNAL ARTICLE

Information-theoretic feature selection algorithms for text classification

J. Novovicova, Adnan Malik

Journal: Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005. Year: 2006 Vol: 5 Pages: 3272-3277
JOURNAL ARTICLE

Feature Selection for Text Classification Using Mutual Information

İlhami Sel, Ali Karcı, Davut Hanbay

Journal: 2019 International Artificial Intelligence and Data Processing Symposium (IDAP) Year: 2019 Pages: 1-4