JOURNAL ARTICLE

Class‐index corpus‐index measure: A novel feature selection method for imbalanced text data

Bekir Parlak

Year: 2022 Journal: Concurrency and Computation: Practice and Experience Vol: 34 (21) Publisher: Wiley

Abstract

Some text classification datasets are imbalanced, and for such datasets the feature selection stage is important for improving classification performance. Although this area has been studied extensively, existing methods are based only on the intra‐class document frequency of a feature. This study proposes a new feature selection method for imbalanced text classification, the class‐index corpus‐index measure (CiCi), which considers how a feature is distributed in both the class and the corpus. CiCi is a probabilistic measure computed from these two distributions. Multinomial Naïve Bayes and support vector machines were used as classifiers in experiments on three imbalanced benchmark datasets: Reuters‐21578, Ohsumed, and Enron1. The experimental results show that the proposed method outperforms successful methods from the literature in terms of three different success measures.
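
The abstract names two ingredients, the distribution of a feature within a class and within the whole corpus, but does not give the CiCi formula. The sketch below therefore only illustrates how those two document-frequency rates might be estimated from tokenized documents; it is not the CiCi score itself. The function names (`document_frequencies`, `term_rates`) and the toy documents are assumptions made for illustration.

```python
from collections import Counter


def document_frequencies(docs, labels):
    """Count, per class and over the corpus, how many documents contain each term."""
    class_df = {}                   # label -> Counter(term -> number of documents)
    corpus_df = Counter()           # term -> number of documents in the whole corpus
    class_sizes = Counter(labels)   # label -> number of documents in that class
    for tokens, label in zip(docs, labels):
        unique_terms = set(tokens)  # document frequency ignores repeats within a document
        corpus_df.update(unique_terms)
        class_df.setdefault(label, Counter()).update(unique_terms)
    return class_df, corpus_df, class_sizes


def term_rates(term, label, class_df, corpus_df, class_sizes):
    """Return (share of the class's documents, share of all documents) containing the term."""
    n_docs = sum(class_sizes.values())
    in_class = class_df[label][term] / class_sizes[label]
    in_corpus = corpus_df[term] / n_docs
    return in_class, in_corpus


# Toy imbalanced corpus (hypothetical data): one "spam" document, three "ham" documents.
docs = [
    ["cheap", "offer", "click"],
    ["meeting", "agenda", "offer"],
    ["meeting", "notes"],
    ["project", "meeting", "schedule"],
]
labels = ["spam", "ham", "ham", "ham"]

class_df, corpus_df, class_sizes = document_frequencies(docs, labels)
in_class, in_corpus = term_rates("offer", "spam", class_df, corpus_df, class_sizes)
print(in_class, in_corpus)
```

A feature selection measure of the kind described would combine such class-level and corpus-level rates into a single score per term and keep the top-ranked terms; how CiCi combines them is defined in the article, not here.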

Keywords:
Computer science; Feature selection; Artificial intelligence; Class (philosophy); Feature (linguistics); Benchmark (surveying); Pattern recognition (psychology); Measure (data warehouse); Index (typography); Selection (genetic algorithm); Data mining; Field (mathematics); Support vector machine; Naive Bayes classifier; Machine learning; Mathematics; Geography

Metrics

Cited By: 12
FWCI (Field Weighted Citation Impact): 2.35
Refs: 31
Citation Normalized Percentile: 0.86

Topics

Text and Document Classification Technologies
Physical Sciences → Computer Science → Artificial Intelligence
Spam and Phishing Detection
Physical Sciences → Computer Science → Information Systems
Imbalanced Data Classification Techniques
Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

JOURNAL ARTICLE

A Novel One Sided Feature Selection Method for Imbalanced Text Classification

Jafar Pouramini, Behrouz Minaei‐Bidgoli, Mahdi Esmaeili

Journal: Signal and Data Processing Year: 2019 Vol: 16 (1) Pages: 21-40

JOURNAL ARTICLE

Feature selection for text categorization on imbalanced data

Zhaohui Zheng, Wu Xiaoyun, Rohini K. Srihari

Journal: ACM SIGKDD Explorations Newsletter Year: 2004 Vol: 6 (1) Pages: 80-89

JOURNAL ARTICLE

Feature selection for high dimensional imbalanced class data based on F-measure optimization

Chunkai Zhang, Guoquan Wang, Ying Zhou, Lin Yao, Zoe L. Jiang, Qing Liao, Xuan Wang

Journal: 2017 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC) Year: 2017 Pages: 278-283

JOURNAL ARTICLE

Online feature selection for high-dimensional class-imbalanced data

Peng Zhou, Xuegang Hu, Peipei Li, Xindong Wu

Journal: Knowledge-Based Systems Year: 2017 Vol: 136 Pages: 187-199