Feature selection is a data pre-processing method widely used when mining large datasets, for example in text classification. Several studies have compared feature selection methods on English corpora, but few address the Arabic language. This article presents a comparative study of three feature selection techniques (Chi2, the ANOVA method and mutual information) applied to an Arabic corpus, combined with several machine learning algorithms (Naive Bayes, SVM and KNN). In general, the experiments show that reducing dimensionality with feature selection techniques slightly affects text classification performance, while reducing the size of the corpus by up to 1%.
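To make the idea of scoring and keeping the most discriminative terms concrete, here is a minimal sketch of Chi2-based feature selection in plain Python. It assumes the common 2x2 document-presence formulation of the chi-squared statistic, which may differ from the variant used in the study. The toy corpus (transliterated tokens), the labels, and the helper names `chi2_scores` and `select_top_k` are hypothetical and serve only as illustration.

```python
from collections import Counter

def chi2_scores(docs, labels):
    """Score each term by its largest per-class 2x2 chi-squared statistic."""
    n = len(docs)
    doc_sets = [set(d.split()) for d in docs]  # term presence per document
    vocab = set().union(*doc_sets)
    class_counts = Counter(labels)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls, n_cls in class_counts.items():
            # a: in class, has term; b: not in class, has term
            a = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == cls)
            b = sum(1 for s, y in zip(doc_sets, labels) if term in s and y != cls)
            c = n_cls - a          # in class, lacks term
            d = n - n_cls - b      # not in class, lacks term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:              # a zero margin carries no information
                best = max(best, n * (a * d - c * b) ** 2 / denom)
        scores[term] = best
    return scores

def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Hypothetical toy corpus: two "sport" and two "economy" documents.
docs = [
    "kura mubarat fi",
    "mubarat kura fi",
    "suq iqtisad fi",
    "iqtisad suq fi",
]
labels = ["sport", "sport", "economy", "economy"]

scores = chi2_scores(docs, labels)
selected = select_top_k(scores, 4)
print(selected)
```

In this toy setting the token `fi` occurs in every document, so it scores zero and is dropped, while the four class-specific tokens are kept. A classifier such as Naive Bayes, SVM or KNN would then be trained on the reduced vocabulary only.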
Ghazi. I Raho, Riyad Al–Shalabi, Ghassan Kanaan, Asma'a Nassar
Rawad Awad Alqahtani, Hoda Ahmed Abdelhafez
Yannis Haralambous, Yassir Elidrissi, Philippe Lenca
Djelloul Bouchiha, Abdelghani Bouziane, Noureddine Doumi