Archana Suhas Vaidya, Dipak V. Patil
This study presents the RSKD ensemble classifier, developed together with ensemble feature selection techniques, to address high-dimensional, low-sample-size cancer datasets. By combining multiple models, ensemble classifiers offer better classification accuracy in such settings than traditional single-model methods. However, stability, a key factor for consistent performance on unseen data, often involves a tradeoff with accuracy. Owing to their generalization capabilities, ensemble methods exhibit higher stability; feature selection stability, measured using a consistency index, averaged 65–70%. The RSKD classifier integrates the ensemble feature selection methods SU-R and ChS-R, which enhance both feature selection stability and classification accuracy. Its performance was evaluated on seven high-dimensional, low-sample-size datasets and compared against state-of-the-art classifiers, including AdaBoost, GradientBoost, REPTree, asBagging_FSS, SRKNN, MF-GE, and eAdaBoost with DSC. RSKD improved accuracy by 7.69% to 12.35% over these methods. Of the two feature selection approaches, SU-R combined with RSKD outperformed ChS-R in cancer prediction tasks. These findings underscore the potential of RSKD for generalized, robust performance on challenging datasets: ensemble classifiers combined with ensemble feature selection can address the inherent difficulties of high-dimensional, low-sample-size data, improving both accuracy and stability. This work provides a foundation for developing diverse, heterogeneous ensemble approaches for cancer prediction and similar applications.
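The abstract reports feature selection stability through a consistency index but does not say which one. As a minimal sketch, assuming Kuncheva's consistency index (a common choice, not confirmed by the abstract), stability can be estimated as the mean pairwise index over the feature subsets selected on different resamples of the data. The function names `kuncheva_index` and `mean_stability` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def kuncheva_index(a, b, n_features):
    """Kuncheva's consistency index for two equal-size feature subsets.

    ASSUMPTION: the abstract does not name its consistency index;
    Kuncheva's index is used here purely as an illustration.
    Returns a value in [-1, 1]; 1 means the subsets are identical.
    """
    a, b = set(a), set(b)
    k = len(a)
    if k != len(b):
        raise ValueError("subsets must have the same size")
    if k == 0 or k >= n_features:
        raise ValueError("subset size must satisfy 0 < k < n_features")
    r = len(a & b)  # number of features shared by both subsets
    # Observed overlap corrected for the overlap expected by chance (k^2 / n).
    return (r * n_features - k * k) / (k * (n_features - k))


def mean_stability(subsets, n_features):
    """Average the index over all pairs of selected subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(kuncheva_index(a, b, n_features) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy example: three runs, each selecting 3 of 10 features (made-up indices).
    runs = [[0, 1, 2], [0, 1, 2], [0, 1, 3]]
    print(mean_stability(runs, n_features=10))
```

In practice each subset would come from running the feature ranker on a different bootstrap sample or cross-validation fold, keeping the top k features each time. The index corrects for chance overlap, so it is comparable across different values of k.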
Xia Peiyong, Xiangqian Ding, Bai-Ning Jiang