Xiaojian Ding, Pengcheng Shi, Xin Wang, Kaixiang Wang
Microarray data classification is challenged by high dimensionality and small sample sizes, which make feature selection unstable. Traditional ensemble feature selection methods struggle to balance the diversity and the quality of their base selectors. We propose a novel Ensemble Feature Selection Method (EFSM) that introduces a feature mapping diversity metric to build a robust candidate pool. EFSM first generates a diverse pool of candidate feature selectors by using randomized neural networks to create multiple non-linear feature mappings (views) of the original data. Its core innovation is an ensemble pruning technique, formulated as an optimization problem that jointly maximizes the predictive accuracy of the individual selectors and their pairwise diversity. Because this problem is NP-hard, we convert it into a Semi-Definite Programming (SDP) problem and derive a novel bound that allows it to be solved efficiently. Finally, the rankings from the pruned ensemble are aggregated using the Borda count method. Extensive experiments on 15 biological datasets demonstrate that EFSM outperforms nine state-of-the-art feature selection methods across several popular classifiers, achieving superior and stable performance in high-dimensional data analysis.
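The pruning and aggregation steps described above can be illustrated with a minimal Python/NumPy sketch. Everything in it is an assumption for illustration, not EFSM itself. The accuracy scores, diversity matrix, and rankings are made-up numbers, and `lam` is a hypothetical trade-off weight that the abstract does not mention. The best subset is found by brute-force search over all k-subsets, which is only practical for a handful of selectors; the paper instead uses an SDP formulation with its derived bound. The Borda variant shown awards n - 1 - p points to the feature at position p of each ranking.

```python
from itertools import combinations

import numpy as np


def prune_ensemble(accuracy, diversity, k, lam=1.0):
    """Choose k selectors maximizing sum(accuracy) + lam * sum(pairwise diversity).

    Exhaustive search over all k-subsets: exact, but only feasible for a
    small candidate pool. (Assumption: stands in for EFSM's SDP-based solver.)
    """
    best_subset, best_score = None, -np.inf
    for subset in combinations(range(len(accuracy)), k):
        idx = list(subset)
        acc_term = accuracy[idx].sum()
        # Each unordered pair of chosen selectors contributes once.
        div_term = sum(diversity[i, j] for i, j in combinations(idx, 2))
        score = acc_term + lam * div_term  # lam is a hypothetical weight
        if score > best_score:
            best_subset, best_score = idx, score
    return best_subset, best_score


def borda_aggregate(rankings, n_features):
    """Borda count: in each ranking (best feature first), the feature at
    position p earns n_features - 1 - p points; points are summed."""
    scores = np.zeros(n_features)
    for ranking in rankings:
        for position, feature in enumerate(ranking):
            scores[feature] += n_features - 1 - position
    # Stable sort on negated scores: ties keep the lower feature index first.
    order = np.argsort(-scores, kind="stable")
    return order, scores


# Hypothetical validation accuracies of four candidate selectors.
accuracy = np.array([0.90, 0.88, 0.70, 0.85])
# Hypothetical symmetric pairwise diversity matrix between selectors.
diversity = np.array([
    [0.00, 0.05, 0.30, 0.20],
    [0.05, 0.00, 0.25, 0.10],
    [0.30, 0.25, 0.00, 0.15],
    [0.20, 0.10, 0.15, 0.00],
])
# Hypothetical feature rankings (best first) produced by each selector.
rankings = [
    [2, 0, 1, 4, 3],
    [0, 2, 3, 1, 4],
    [2, 1, 0, 3, 4],
    [0, 3, 2, 4, 1],
]

subset, score = prune_ensemble(accuracy, diversity, k=2)
order, scores = borda_aggregate([rankings[s] for s in subset], n_features=5)
print(subset)          # [0, 3]
print(order.tolist())  # [0, 2, 3, 1, 4]
```

With `lam=0.0` the search simply keeps the two most accurate selectors, [0, 1]; the diversity term is what swaps selector 1 for the slightly less accurate but more complementary selector 3 before the Borda vote.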
Supoj Hengpraprohm, Suwimol Jungjit
Junshan Yang, Jiarui Zhou, Zexuan Zhu, Xiaoliang Ma, Zhen Ji
Aiguo Wang, Huancheng Liu, Jing Yang, Guilin Chen
Kunhong Liu, Bo Li, Qingqiang Wu, Jun Zhang, Ji-Xiang Du, Guoyan Liu