JOURNAL ARTICLE

Effective Text Classification by a Supervised Feature Selection Approach

Abstract

High dimensionality is a major challenge for effective text classification. Each document in a corpus contains much irrelevant and noisy information, which eventually reduces the efficiency of text classification. Automatic feature selection methods are therefore extremely important for handling high-dimensional data. Feature selection in text classification focuses on identifying relevant information without affecting the accuracy of the classifier. Several feature selection methods have been proposed to improve classification accuracy by reducing the original feature space. To improve the performance of text classification, a new supervised feature selection approach is proposed that develops a similarity between a term and a class. Every term thus receives a score based on its similarity with all the classes, and all the terms are then ranked accordingly. Experimental results are presented on several TREC and Reuters data sets using a kNN classifier. The performance of the classifiers is compared using precision, recall, F-measure and classification accuracy. The proposed term selection approach is compared with document frequency thresholding, information gain, mutual information and the chi-square statistic. The empirical studies show that the proposed method performs significantly better than the other methods.
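
The score-and-rank pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for the proposed term-class similarity, so the sketch below substitutes the chi-square statistic, one of the baselines named above, as a stand-in scoring function. The function names (`chi_square`, `rank_terms`), the choice to keep each term's maximum score over all classes, and the toy corpus are assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0  # term or class carries no information
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def rank_terms(docs, labels):
    """Score every term against every class and rank terms by score.

    docs: list of token lists; labels: one class label per document.
    Assumption: a term's overall score is its maximum over the classes.
    """
    n_docs = len(docs)
    class_sizes = Counter(labels)
    doc_sets = [set(d) for d in docs]
    df = Counter(t for s in doc_sets for t in s)
    df_in_class = Counter(
        (t, c) for s, c in zip(doc_sets, labels) for t in s
    )
    scores = {}
    for term, term_df in df.items():
        best = 0.0
        for c, size in class_sizes.items():
            n11 = df_in_class[(term, c)]
            n10 = term_df - n11
            n01 = size - n11
            n00 = n_docs - size - n10
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[term] = best
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy corpus, not data from the paper.
docs = [
    ["oil", "price", "market"],
    ["oil", "barrel", "market"],
    ["wheat", "crop", "market"],
    ["wheat", "harvest", "market"],
]
labels = ["crude", "crude", "grain", "grain"]

ranking = rank_terms(docs, labels)
top_terms = [term for term, _ in ranking[:2]]
```

In this toy corpus, "oil" and "wheat" each occur in exactly one class and rank highest, while "market" occurs in every document and scores zero. Keeping only the top-ranked terms is the feature-space reduction step; any other term-class score could be swapped in for `chi_square` without changing the ranking logic.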

Keywords:
Computer science; Feature selection; Artificial intelligence; Classifier (UML); Pattern recognition (psychology); Thresholding; Curse of dimensionality; Document classification; Mutual information; Statistic; Data mining; Mathematics; Statistics

Metrics

Cited By: 60
FWCI (Field Weighted Citation Impact): 7.58
Refs: 28
Citation Normalized Percentile: 0.98

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Spam and Phishing Detection
Physical Sciences →  Computer Science →  Information Systems
Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Effective feature selection technique for text classification

Hari Seetha, M. Narasimha Murty, R. Saravanan

Journal: International Journal of Data Mining Modelling and Management, Year: 2015, Vol: 7 (3), Pages: 165-165
JOURNAL ARTICLE

Supervised Hebb rule based feature selection for text classification

Heyong Wang, Ming Hong

Journal: Information Processing & Management, Year: 2018, Vol: 56 (1), Pages: 167-191
JOURNAL ARTICLE

Feature Selection for Effective Text Classification using Semantic Information

Rajul K. Jain, Nitin Pise

Journal: International Journal of Computer Applications, Year: 2015, Vol: 113 (10), Pages: 18-25
JOURNAL ARTICLE

Joint Semi-Supervised Feature Selection and Classification through Bayesian Approach

Bingbing Jiang, Xingyu Wu, Kui Yu, Huanhuan Chen

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, Year: 2019, Vol: 33 (01), Pages: 3983-3990