Feature Selection (FS) is one of the most important issues in Text Classification (TC). A good feature selection method can improve both the efficiency and the accuracy of a text classifier. Based on an analysis of features' distributional information, this paper presents a feature selection method named DIFS. DIFS introduces a new estimation mechanism that measures the relevance between a feature's distribution characteristics and its contribution to categorization. In addition, two algorithms are designed to implement DIFS. Experiments carried out on a Chinese corpus show that the proposed approach outperforms the compared methods.
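The abstract does not give the DIFS scoring formula, so as a generic illustration of distribution-based feature selection for TC, the sketch below ranks terms by a chi-square statistic computed from each term's document-frequency distribution across a target category versus the rest of the corpus. The function names (`chi2_score`, `select_features`) and the toy data are assumptions for illustration, not the authors' method.

```python
def chi2_score(n11, n10, n01, n00):
    """Chi-square association between a term and a category.

    n11: in-category docs containing the term
    n10: in-category docs without the term
    n01: out-of-category docs containing the term
    n00: out-of-category docs without the term
    """
    n = n11 + n10 + n01 + n00
    den = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if den == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / den

def select_features(docs, labels, category, k):
    """Return the k terms most associated with `category`.

    docs: list of sets of terms (one set per document)
    labels: category label of each document
    """
    vocab = {term for doc in docs for term in doc}
    scores = {}
    for term in vocab:
        n11 = sum(1 for d, y in zip(docs, labels) if y == category and term in d)
        n10 = sum(1 for d, y in zip(docs, labels) if y == category and term not in d)
        n01 = sum(1 for d, y in zip(docs, labels) if y != category and term in d)
        n00 = sum(1 for d, y in zip(docs, labels) if y != category and term not in d)
        scores[term] = chi2_score(n11, n10, n01, n00)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A term concentrated in one category (or conspicuously absent from it) receives a high score, while a term spread evenly across categories scores near zero; any distribution-sensitive criterion, DIFS included, exploits this kind of skew.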
Zhili Pei, Yuxin Zhou, Lisha Liu, Lihua Wang, Yinan Lu, Ying Kong