The feature selection problem involves discovering a subset of features such that a classifier built only with this subset has better predictive accuracy than a classifier built from the entire set of features. A large number of algorithms have already been proposed for this problem. Although they differ significantly with regard to (1) the search strategy they use to determine the right subset of features and (2) how each subset is evaluated, feature selection algorithms are usually classified into three general groups: filters, wrappers, and hybrid solutions. In this thesis, we propose a new hybrid system for feature selection in machine learning. The idea behind this new algorithm, FortalFS, is to extract and combine the best characteristics of filters and wrappers in one algorithm. FortalFS uses the results of another feature selection system as the starting point of a search through subsets of features, each of which is evaluated by a machine learning algorithm. With an efficient search heuristic, we can decrease the number of feature subsets that the learning algorithm must evaluate, which reduces computational effort while still selecting an accurate subset. We have also designed a variant of the original algorithm in an attempt to work with feature weighting algorithms. To evaluate the new algorithm, a number of experiments were run and the results compared with those of well-known filter and wrapper feature selection algorithms, such as Focus, Relief, LVF, and others. These experiments were run over a number of datasets from the UCI Repository. Results showed that FortalFS outperforms most of the algorithms significantly. However, its running time is similar to that of wrappers. Additional experiments using specially designed artificial datasets demonstrated that FortalFS is able to identify and remove irrelevant, redundant, and randomly class-correlated features.
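To make the hybrid idea concrete, the following is a minimal sketch of a filter-seeded wrapper search. It is not the FortalFS algorithm itself, whose details are not given here. The function name `hybrid_select`, the per-feature `filter_scores`, the mean-score seeding rule, and the score-proportional sampling are all assumptions chosen for illustration. `evaluate` stands in for training and scoring a classifier on a feature subset.

```python
import random


def hybrid_select(filter_scores, evaluate, n_candidates=50, seed=0):
    """Illustrative filter-seeded wrapper search (hypothetical, not FortalFS).

    filter_scores: one positive relevance score per feature, assumed to come
                   from some filter-style feature selection method.
    evaluate:      callable mapping a frozenset of feature indices to an
                   accuracy estimate (the wrapper step).
    """
    rng = random.Random(seed)
    n_features = len(filter_scores)
    top = max(filter_scores)
    if top <= 0:
        raise ValueError("filter_scores must contain a positive score")

    # Starting point: features scoring at or above the mean filter score.
    mean = sum(filter_scores) / n_features
    best = frozenset(i for i in range(n_features) if filter_scores[i] >= mean)
    best_acc = evaluate(best)

    for _ in range(n_candidates):
        # Include each feature with probability proportional to its score,
        # so the search concentrates on subsets the filter considers promising.
        cand = frozenset(
            i for i in range(n_features)
            if rng.random() < filter_scores[i] / top
        )
        if not cand:
            continue
        acc = evaluate(cand)
        # Prefer higher accuracy; break ties in favour of smaller subsets.
        if acc > best_acc or (acc == best_acc and len(cand) < len(best)):
            best, best_acc = cand, acc
    return best, best_acc
```

The number of calls to `evaluate` is bounded by `n_candidates + 1`, which is where a hybrid of this kind saves effort relative to a wrapper that explores the subset space without guidance.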
The running-time cost of FortalFS is addressed through parallelism. A parallel version of FortalFS based on the master/slave design pattern is implemented and evaluated. In several experiments, we were able to achieve near-optimal speedups.
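The master/slave pattern can be sketched as follows: a master hands candidate feature subsets to a pool of workers, each worker evaluates its subset independently, and the master collects the scores and keeps the best. This is an illustration only and not the thesis implementation. The name `evaluate_subsets_parallel` and its parameters are hypothetical, and a thread pool is used purely to keep the sketch portable. For CPU-bound classifier training, a process pool or a message-passing library would be the more natural choice.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_subsets_parallel(subsets, evaluate, n_workers=4):
    """Master/worker evaluation of feature subsets (hypothetical sketch).

    subsets:  non-empty list of frozensets of feature indices.
    evaluate: callable mapping a subset to an accuracy estimate.
    Returns the (subset, score) pair with the highest score.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # The master distributes subsets; map() returns scores in input order.
        scores = list(pool.map(evaluate, subsets))
    # The master reduces the collected results to the single best subset.
    return max(zip(subsets, scores), key=lambda pair: pair[1])
```

Because each subset evaluation is independent of the others, the work divides cleanly among workers, which is the property that makes near-linear speedups plausible for this kind of search.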