JOURNAL ARTICLE

A Method of CHI-square Feature Selection Based on Probability

ZHANG Huiyi,XIE Yeming,YUAN Zhixiang,SUN Guohua

Year: 2016 Journal: DOAJ (Directory of Open Access Journals)

Abstract

The traditional CHI-square feature selection method does not take into account the category number of words in imbalanced data sets, the frequency of words, or the intra-class and inter-class distribution of words, so it fails to choose valid feature words for different categories. To solve this problem, a CHI-square feature selection method based on probability is proposed. The method measures the frequency of words and documents by their probabilities, and calculates the frequency factor of categories, the concentration factor of words between classes, the equilibrium degree factor of words within the same class, and the concentration factor of documents between classes. The initial CHI-square value is adjusted by these factors. A difference degree factor of different classes for the same word is then used so that the improved CHI-square selects more effective feature words. Text classification experiments show that, compared with the unimproved CHI-square feature selection method, the proposed method significantly improves macro F1 and achieves better classification performance on imbalanced data sets.
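For orientation, the baseline that the abstract sets out to improve can be sketched as follows. This is a minimal illustration of the standard CHI-square statistic for one word and one category, computed from document counts only. It is not the proposed method: the abstract does not give formulas for the adjustment factors, so none are shown. The function names `term_class_counts` and `chi_square`, and the toy corpus, are assumptions made for this example.

```python
def term_class_counts(docs, labels, term, cls):
    """Count documents for the 2x2 word/category contingency table.

    docs   : list of sets of tokens (hypothetical toy representation)
    labels : list of category labels, one per document
    Returns (a, b, c, d):
      a = documents in cls that contain term
      b = documents outside cls that contain term
      c = documents in cls that do not contain term
      d = documents outside cls that do not contain term
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == cls
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a, b, c, d):
    """Standard CHI-square score: N * (ad - cb)^2 / ((a+c)(b+d)(a+b)(c+d))."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # An empty row or column in the table: treat the score as 0.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


# Toy corpus, invented for illustration only.
docs = [{"goal", "match"}, {"goal"}, {"vote"}, {"vote", "match"}]
labels = ["sport", "sport", "politics", "politics"]

print(chi_square(*term_class_counts(docs, labels, "goal", "sport")))   # 4.0
print(chi_square(*term_class_counts(docs, labels, "match", "sport")))  # 0.0
```

Because the score depends only on whether a document contains the word, it ignores how often the word occurs and how it is spread within and between classes, which is the limitation the abstract describes.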

Keywords:
Feature selection; Feature (linguistics); Pattern recognition (psychology); Selection (genetic algorithm); Probability distribution; Word (group theory); Degree (music)

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.81

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Imbalanced Data Classification Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Computing and Algorithms
Social Sciences →  Social Sciences →  Urban Studies

Related Documents

BOOK-CHAPTER

Feature Selection Method Based on Chi-Square Test and Minimum Redundancy

Yuxian Wang, Changyin Zhou

Advances in Intelligent Systems and Computing Year: 2020 Pages: 171-178

JOURNAL ARTICLE

Probability variance CHI feature selection method for unbalanced data

Xiaowen Zhang, Bingfeng Chen

Journal: AIP Conference Proceedings Year: 2017 Vol: 1864 Pages: 020015-020015

JOURNAL ARTICLE

Feature Selection Approach based on Firefly Algorithm and Chi-square

Emad Mohamed Mashhour, Enas M. F. El Houby, Khaled Wassif, Akram Ibrahim Salah

Journal: International Journal of Electrical and Computer Engineering (IJECE) Year: 2018 Vol: 8 (4) Pages: 2338-2338