JOURNAL ARTICLE

A Novel Feature Selection Technique for Text Classification Using Naïve Bayes

Subhajit Dey Sarkar, Saptarsi Goswami, Aman Agarwal, Javed Aktar

Year: 2014 Journal: International Scholarly Research Notices Vol: 2014 Pages: 1-10 Publisher: Hindawi Publishing Corporation

Abstract

With the proliferation of unstructured data, text classification (or text categorization) has found many applications in topic classification, sentiment analysis, authorship identification, spam detection, and so on. Many classification algorithms are available, and naïve Bayes remains one of the oldest and most popular. It is simple to implement and requires less training data. However, the literature shows that naïve Bayes performs poorly compared to other classifiers in text classification, which makes the classifier unusable despite the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method: a univariate feature selection step first reduces the search space, and a feature clustering step then selects relatively independent feature sets. We demonstrate the effectiveness of our method through a thorough evaluation and comparison over 13 datasets. The resulting performance improvement makes naïve Bayes comparable or superior to other classifiers. The proposed algorithm is shown to outperform traditional methods such as greedy-search-based wrappers and CFS.
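The two-step idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the univariate score or the clustering algorithm, so the choices here are assumptions: a chi-squared score on binary term-presence features for step one, and a greedy leader-style grouping by absolute Pearson correlation standing in for the clustering step. The function names, the `top_k` and `corr_threshold` parameters, and the toy data are all hypothetical.

```python
import math


def chi2_score(column, labels):
    """Chi-squared statistic between one binary feature column and the class labels."""
    n = len(labels)
    score = 0.0
    for value in (0, 1):
        n_value = sum(1 for x in column if x == value)
        for c in set(labels):
            n_class = sum(1 for y in labels if y == c)
            expected = n_value * n_class / n
            if expected == 0:
                continue
            observed = sum(1 for x, y in zip(column, labels) if x == value and y == c)
            score += (observed - expected) ** 2 / expected
    return score


def pearson(a, b):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def two_step_select(X, y, top_k, corr_threshold=0.8):
    """Step 1: keep the top_k features by chi-squared score.
    Step 2 (assumed stand-in for clustering): visit survivors in score order and
    keep a feature only if it is not strongly correlated with one already kept,
    so each kept feature represents one group of mutually redundant features."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    scores = [chi2_score(col, y) for col in columns]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:top_k]
    representatives = []
    for j in ranked:
        if all(abs(pearson(columns[j], columns[r])) < corr_threshold
               for r in representatives):
            representatives.append(j)
    return representatives


# Toy data: 8 documents, 4 binary term features.
# Feature 1 duplicates feature 0; feature 3 is unrelated to the label.
X_demo = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
y_demo = [1, 1, 1, 1, 0, 0, 0, 0]

print(two_step_select(X_demo, y_demo, top_k=3))  # prints [0, 2]
```

On this toy data, step one drops the uninformative feature 3, and step two drops feature 1 because it is perfectly correlated with feature 0. The surviving features are closer to the independence that naïve Bayes assumes, which is the motivation the abstract gives for the clustering step.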

Keywords:
Naive Bayes classifier; Feature selection; Computer science; Artificial intelligence; Univariate; Classifier (UML); Bayes' theorem; Machine learning; Pattern recognition (psychology); Cluster analysis; Categorization; Feature (linguistics); Data mining; Bayes error rate; Bayes classifier; Bayesian probability; Support vector machine; Multivariate statistics

Metrics

Cited By: 78
FWCI (Field Weighted Citation Impact): 2.90
Refs: 18
Citation Normalized Percentile: 0.91

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Spam and Phishing Detection
Physical Sciences →  Computer Science →  Information Systems
Web Data Mining and Analysis
Physical Sciences →  Computer Science →  Information Systems

Related Documents

JOURNAL ARTICLE

Feature selection for text classification with Naïve Bayes

Jingnian Chen, Houkuan Huang, Shengfeng Tian, Youli Qu

Journal: Expert Systems with Applications Year: 2008 Vol: 36 (3) Pages: 5432-5435

JOURNAL ARTICLE

Feature subset selection using naive Bayes for text classification

Guozhong Feng, Jianhua Guo, Bing‐Yi Jing, Tieli Sun

Journal: Pattern Recognition Letters Year: 2015 Vol: 65 Pages: 109-115

BOOK-CHAPTER

Principal Feature Selection Impact for Internet Traffic Classification Using Naïve Bayes

Adi Suryaputra Paramita

Lecture notes in electrical engineering Year: 2016 Pages: 475-480

BOOK-CHAPTER

A Novel Feature Selection Technique for Text Classification

D. S. Guru, Mostafa Z. Ali, Mahamad Suhil

Advances in intelligent systems and computing Year: 2018 Pages: 721-733