JOURNAL ARTICLE

Feature selection using an improved Chi-square for Arabic text classification

Said Bahassine, Abdellah Madani, Mohammed Al-Sarem, Mohamed Kissi

Year: 2018
Journal: Journal of King Saud University - Computer and Information Sciences
Vol: 32 (2)
Pages: 225-231
Publisher: Elsevier BV

Abstract

In text mining, feature selection (FS) is a common method for reducing the high dimensionality of the feature space and improving classification accuracy. In this paper, we propose an improved Chi-square feature selection method (hereafter ImpCHI) to enhance the performance of Arabic text classification. We also compare this improved Chi-square with three traditional feature selection metrics, namely mutual information, information gain and Chi-square. Building on our previous work, we extend it to assess the method with additional evaluation measures using an SVM classifier. For this purpose, a dataset of 5070 Arabic documents is classified into six independent classes. The experimental findings show that combining the ImpCHI method with the SVM classifier outperforms the other combinations in terms of precision, recall and F-measure. This combination significantly improves the performance of the Arabic text classification model. The best F-measure obtained for this model is 90.50%, when the number of features is 900.

Keywords: Feature selection, Chi-square, Arabic text classification, Light stemming, Mutual information, Information gain, SVM, Decision tree
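The abstract does not describe how ImpCHI modifies the statistic, so the following is only a minimal sketch of the traditional Chi-square baseline it is compared against, not the authors' method. It assumes tokenized documents, scores each (term, class) pair from a 2x2 document-count table, and ranks terms by their maximum score over classes. The max-over-classes aggregation and the names `chi_square` and `select_features` are illustrative assumptions.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Traditional chi-square score for one (term, class) pair.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term or class carries no signal
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-square score.

    Assumption: a term's score is its maximum chi-square over all classes.
    Ties are broken alphabetically so the result is deterministic.
    """
    n_docs = len(docs)
    class_sizes = Counter(labels)
    doc_freq = Counter()        # term -> number of documents containing it
    class_doc_freq = Counter()  # (term, class) -> documents of that class containing it
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            doc_freq[term] += 1
            class_doc_freq[(term, label)] += 1

    scores = {}
    for term, df in doc_freq.items():
        best = 0.0
        for cls, size in class_sizes.items():
            a = class_doc_freq[(term, cls)]
            b = df - a
            c = size - a
            d = n_docs - size - b
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best

    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]
```

In a pipeline like the one the abstract outlines, the selected terms would define the feature vectors passed to the classifier (an SVM in the paper), with `k` playing the role of the number of features (900 for the best reported result).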

Keywords:
Feature selection, Support vector machine, Classifier (UML), Artificial intelligence, Computer science, Pattern recognition (psychology), Arabic, Selection (genetic algorithm), Precision and recall, Feature vector, Machine learning, Data mining

Metrics

Cited By: 244
FWCI (Field Weighted Citation Impact): 18.27
Refs: 42
Citation Normalized Percentile: 0.99
Is in top 1%
Is in top 10%

Topics

Text and Document Classification Technologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Spam and Phishing Detection
Physical Sciences →  Computer Science →  Information Systems
Web Data Mining and Analysis
Physical Sciences →  Computer Science →  Information Systems
