JOURNAL ARTICLE

Probability variance CHI feature selection method for unbalanced data

Xiaowen Zhang, Bingfeng Chen

Year: 2017 Journal: AIP Conference Proceedings Vol: 1864 Pages: 020015-020015 Publisher: American Institute of Physics

Abstract

Feature selection on unbalanced text data is a difficult problem. To address it, this paper analyzes how feature terms are distributed within and between classes, and the differences in documents across an unbalanced data set. Using word-frequency probability and document probability to measure features and documents in unbalanced data, the paper proposes a CHI feature selection method based on probability variance, which improves the traditional chi-square statistical model by introducing an intra-class word-frequency probability factor, an inter-class document probability concentration factor, and an intra-class uniformity factor. Experiments demonstrate the effectiveness and feasibility of the method.
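
For orientation, the traditional chi-square (CHI) statistic that the proposed method builds on can be sketched as below. This is the standard textbook formulation for a single term and class, not the authors' improved model: the three correction factors named in the abstract are not defined here, so they are deliberately left out. The function name `chi_square` and the count names `a`, `b`, `c`, `d` are illustrative assumptions.

```python
def chi_square(a, b, c, d):
    """Traditional chi-square score for one (term, class) pair.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that do not contain the term
    d: documents outside the class that do not contain the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A zero marginal means the term or class carries no information.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


# Balanced toy counts: the term is strongly associated with the class.
print(chi_square(40, 10, 10, 40))  # 36.0

# Term spread evenly across both classes: no association.
print(chi_square(25, 25, 25, 25))  # 0.0
```

The score depends only on document counts, so it ignores how often a term occurs within a document and how evenly it is spread inside a class. Those are the aspects the abstract says the added factors are meant to capture.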

Keywords:
Feature (linguistics), Feature selection, Computer science, Variance (accounting), Class (philosophy), Probability distribution, Probabilistic logic, Artificial intelligence, Data set, Selection (genetic algorithm), Word (group theory), Set (abstract data type), Data mining, Pattern recognition (psychology), Statistics, Mathematics

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.00
Refs: 7
Citation Normalized Percentile: 0.10

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Web Data Mining and Analysis
Physical Sciences →  Computer Science →  Information Systems
Advanced Text Analysis Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
