Processing high-dimensional data remains a challenge for the data mining community. Feature selection is an effective dimensionality reduction technique, but only a few feature selection methods have been proposed for clustering. In this paper, a new feature selection algorithm for unsupervised learning is introduced. It is based on the assumption that, in the absence of class labels, the result of a weighted clusterer ensemble can be employed as a heuristic to guide the feature selection. The ReliefF algorithm is then used, with the ensemble result standing in for the missing class labels, to rank every feature. The main advantage of the proposed method over conventional unsupervised feature selection schemes is that it is dimensionality-unbiased. Our experiments with several data sets demonstrate that the proposed algorithm is able to detect completely irrelevant features and to remove some additional features without significantly hurting the performance of the clustering algorithm.
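The pipeline described above (cluster ensemble, consensus pseudo-labels, ReliefF ranking) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the base clusterer (two-cluster k-means on random feature subsets), the co-association consensus, the uniform ensemble weights, and all function names (`kmeans2`, `consensus`, `relieff`, `rank_features`, `make_toy_data`) are hypothetical choices made for the example, since the abstract does not specify them.

```python
import random

def sqdist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans2(X, rng, iters=10):
    """Two-cluster Lloyd's k-means with farthest-point initialisation."""
    first = rng.choice(X)
    centers = [first, max(X, key=lambda x: sqdist(x, first))]
    labels = []
    for _ in range(iters):
        labels = [min((0, 1), key=lambda c: sqdist(x, centers[c])) for x in X]
        for c in (0, 1):
            members = [x for x, l in zip(X, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def consensus(partitions, weights):
    """Group points whose weighted co-association exceeds 0.5 (connected components)."""
    n, total = len(partitions[0]), sum(weights)
    labels, next_label = [-1] * n, 0
    for i in range(n):
        if labels[i] != -1:
            continue
        labels[i], stack = next_label, [i]
        while stack:
            a = stack.pop()
            for b in range(n):
                if labels[b] == -1:
                    co = sum(w for p, w in zip(partitions, weights) if p[a] == p[b])
                    if co / total > 0.5:
                        labels[b] = next_label
                        stack.append(b)
        next_label += 1
    return labels

def relieff(X, y, k=3):
    """ReliefF scores: reward features that differ on near misses, not on near hits."""
    n, d = len(X), len(X[0])
    span = [(max(col) - min(col)) or 1.0 for col in zip(*X)]
    prior = {c: y.count(c) / n for c in set(y)}
    def diff(f, a, b):
        return abs(a[f] - b[f]) / span[f]
    scores = [0.0] * d
    for i, x in enumerate(X):
        for c in prior:
            # k nearest neighbours of x within class c, excluding x itself
            nbrs = sorted((j for j in range(n) if j != i and y[j] == c),
                          key=lambda j: sum(diff(f, x, X[j]) for f in range(d)))[:k]
            if not nbrs:
                continue
            scale = -1.0 if c == y[i] else prior[c] / (1.0 - prior[y[i]])
            for f in range(d):
                scores[f] += scale * sum(diff(f, x, X[j]) for j in nbrs) / (len(nbrs) * n)
    return scores

def rank_features(X, n_members=10, subset_size=3, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    partitions = []
    for _ in range(n_members):
        feats = rng.sample(range(d), subset_size)  # random feature subset per member
        partitions.append(kmeans2([[x[f] for f in feats] for x in X], rng))
    weights = [1.0] * n_members  # ASSUMPTION: uniform weights; the paper's scheme is not given
    pseudo_labels = consensus(partitions, weights)
    scores = relieff(X, pseudo_labels)
    return sorted(range(d), key=lambda f: -scores[f]), scores

def make_toy_data(rng, per_cluster=20):
    """Features 0 and 1 separate two clusters; features 2 and 3 are pure noise."""
    return [[rng.uniform(0, 1) + off, rng.uniform(0, 1) + off,
             rng.uniform(0, 1), rng.uniform(0, 1)]
            for off in (0, 9) for _ in range(per_cluster)]

ranking, scores = rank_features(make_toy_data(random.Random(1)))
print("features ranked from most to least relevant:", ranking)
```

On the toy data, the two cluster-defining features should receive clearly higher ReliefF scores than the two noise features, which is the behaviour the abstract claims for completely irrelevant features. A non-uniform `weights` list can be passed to `consensus` to mimic a weighted ensemble.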