Feature selection plays an important role in cancer classification because gene expression data usually have a large number of dimensions (genes) and a relatively small number of samples. In this paper, we use the support vector machine (SVM) for cancer classification and propose a mixed two-step feature selection method. The first step uses a modified t-test to select discriminatory features. The second step extracts principal components from the top-ranked genes selected by the modified t-test. We tested the two-step method on three data sets: the lymphoma data set, the SRBCT data set, and the ovarian cancer data set. On all three data sets, the two-step method achieves 100% accuracy with far fewer genes than other published results.
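The two-step pipeline can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define the "modified t-test", so the sketch assumes a centroid-based t-like statistic in the style of nearest shrunken centroids (each class mean compared with the overall mean, with a positive offset `s0` added to the pooled standard deviation). It also assumes `s0` is the median of the per-gene standard deviations. The function names, `n_genes` and `n_components` are hypothetical, and the final SVM step is left out.

```python
import numpy as np


def modified_t_scores(X, y, s0=None):
    """Score each gene with an ASSUMED centroid-based t-like statistic.

    X: (n_samples, n_genes) expression matrix; y: class labels.
    For class c and gene i:
        t_ic = (mean_ic - mean_i) / (M_c * (s_i + s0)),
    where s_i is the pooled within-class standard deviation of gene i
    and M_c = sqrt(1/n_c + 1/n). A gene's score is max_c |t_ic|.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, n_genes = X.shape
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)

    # Pooled within-class sum of squares, one value per gene.
    ss_within = np.zeros(n_genes)
    for c in classes:
        Xc = X[y == c]
        ss_within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    s = np.sqrt(ss_within / (n - len(classes)))

    if s0 is None:
        # Assumed choice: the offset damps genes with tiny variance.
        s0 = np.median(s)

    scores = np.zeros(n_genes)
    for c in classes:
        Xc = X[y == c]
        m_c = np.sqrt(1.0 / len(Xc) + 1.0 / n)
        t_c = (Xc.mean(axis=0) - overall_mean) / (m_c * (s + s0))
        scores = np.maximum(scores, np.abs(t_c))
    return scores


def two_step_features(X_train, y_train, X_test, n_genes=50, n_components=5):
    """Step 1: keep the top-ranked genes. Step 2: project them onto PCs.

    The gene ranking, centring and principal axes are all estimated on the
    training data only and then applied unchanged to the test data.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)

    # Step 1: rank genes by the t-like score, highest first.
    scores = modified_t_scores(X_train, y_train)
    top = np.argsort(scores)[::-1][:n_genes]
    Xtr, Xte = X_train[:, top], X_test[:, top]

    # Step 2: PCA via SVD of the centred training data.
    # The rows of Vt are the principal axes, ordered by singular value.
    mu = Xtr.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xtr - mu, full_matrices=False)
    W = Vt[:n_components].T
    return (Xtr - mu) @ W, (Xte - mu) @ W, top


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 200))  # synthetic data, not from the paper
    y = np.repeat([0, 1], 20)
    X[y == 1, :5] += 3.0  # only genes 0-4 carry class signal
    Z_train, Z_test, top = two_step_features(X, y, X[:4], n_genes=20, n_components=3)
    print(sorted(top[:5].tolist()), Z_train.shape, Z_test.shape)
```

The projected training features would then go to an SVM, for example scikit-learn's `SVC`. Estimating the ranking and the principal axes on the training split alone keeps the test samples out of feature selection, which would otherwise inflate the reported accuracy.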
Alabi Waheed Banjoko, Waheed Babatunde Yahya, Mohammed Kabir Garba
Dipali Bhosale, Roshani Ade, P.R. Deshmukh