JOURNAL ARTICLE

Improved Feature-Selection Method Considering the Imbalance Problem in Text Categorization

Jieming Yang, Zhaoyang Qu, Zhiying Liu

Year: 2014 Journal: The Scientific World Journal Vol: 2014 Pages: 1-17 Publisher: Hindawi Publishing Corporation

Abstract

The filtering feature-selection algorithm is an important approach to dimensionality reduction in text categorization. Most filtering feature-selection algorithms evaluate the significance of a feature for a category on the assumption that the dataset is balanced, and do not consider the imbalance of the dataset. In this paper, a new scheme is proposed that weakens the adverse effect caused by imbalance in the corpus. We evaluated the improved versions of nine well-known feature-selection methods (Information Gain, Chi statistic, Document Frequency, Orthogonal Centroid Feature Selection, DIA association factor, Comprehensive Measurement Feature Selection, Deviation from Poisson Feature Selection, improved Gini index, and Mutual Information) using naïve Bayes and support vector machines on three benchmark document collections (20-Newsgroups, Reuters-21578, and WebKB). The experimental results show that the improved scheme can significantly enhance the performance of the feature-selection methods.
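
For readers unfamiliar with filtering feature selection, the following is a minimal sketch of how one of the listed baseline criteria, the chi-square statistic, can score terms against a category. It shows only the standard, unimproved statistic; the abstract does not describe the proposed imbalance-aware scheme, so it is not reproduced here. The function names (`chi_square`, `score_terms`, `select_top_k`) and the toy corpus are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Standard chi-square statistic for a 2x2 term/category table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term carries no information
    return n * (a * d - c * b) ** 2 / denom


def score_terms(docs, labels, category):
    """Score every term against one category; each doc is a list of tokens."""
    in_cat, out_cat = Counter(), Counter()
    n_in = sum(1 for y in labels if y == category)
    n_out = len(labels) - n_in
    for tokens, y in zip(docs, labels):
        # Count document frequency, so each term counts once per document.
        (in_cat if y == category else out_cat).update(set(tokens))
    scores = {}
    for term in set(in_cat) | set(out_cat):
        a, b = in_cat[term], out_cat[term]
        scores[term] = chi_square(a, b, n_in - a, n_out - b)
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical, deliberately imbalanced toy corpus: four "sport" documents
# and a single "finance" document.
docs = [
    ["goal", "match", "team"],
    ["team", "score", "match"],
    ["match", "goal"],
    ["team", "win"],
    ["stock", "market", "team"],
]
labels = ["sport", "sport", "sport", "sport", "finance"]

finance_scores = score_terms(docs, labels, "finance")
top_finance_terms = select_top_k(finance_scores, 2)
```

A filter of this kind ranks terms independently of any classifier, which is why it is cheap enough for high-dimensional text data. The scores depend directly on the per-category document counts, which is where class imbalance can affect the ranking.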

Keywords:
Feature selection, Computer science, Dimensionality reduction, Pattern recognition (psychology), Artificial intelligence, Feature (linguistics), Mutual information, Centroid, Benchmark (surveying), Data mining, Feature vector, Naive Bayes classifier, Selection (genetic algorithm), Support vector machine

Metrics

Cited By: 37
FWCI (Field Weighted Citation Impact): 3.86
Refs: 35
Citation Normalized Percentile: 0.93

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Spam and Phishing Detection
Physical Sciences →  Computer Science →  Information Systems
Web Data Mining and Analysis
Physical Sciences →  Computer Science →  Information Systems

Related Documents

JOURNAL ARTICLE

Feature Selection Method from Multiclass Text with Class Imbalance Problem

Minji Seo, Gilseung Ahn, Sun Hur

Journal: Journal of Korean Institute of Industrial Engineers Year: 2019 Vol: 45 (2) Pages: 93-100

JOURNAL ARTICLE

New Feature Selection Method for Text Categorization

Xingfeng Wang, Hee‐Cheol Kim

Journal: Journal of Information and Communication Convergence Engineering Year: 2017 Vol: 15 (1) Pages: 53-61