David Rojas-Velázquez, Aletta D. Kraneveld, Alberto Tonda, Alejandro Lopez‐Rincon
Identifying reliable biomarkers in omics data is challenging due to the high number of features and limited sample sizes, which often lead to overfitting, biased results, and poor reproducibility. These issues are further complicated by class imbalance, common in medical datasets. To address these challenges, we present MCC-REFS, an improved version of the Recursive Ensemble Feature Selection method. MCC-REFS uses the Matthews Correlation Coefficient (MCC) as a selection criterion, offering a more balanced evaluation of classification performance, especially in imbalanced datasets. Unlike traditional methods that require manual tuning or predefined feature counts, MCC-REFS automatically selects the most informative and compact feature sets using an ensemble of eight machine learning classifiers. We evaluated MCC-REFS on synthetic datasets and several real-world omics datasets, including mRNA expression profiles and multi-label breast cancer data. Compared to existing methods such as REFS, GRACES, DNP, and GCNN, MCC-REFS consistently achieved higher or comparable performance while selecting fewer features. Validation using independent classifiers confirmed the robustness of the selected features. Overall, MCC-REFS provides a scalable, flexible, and reliable approach for feature selection in biomedical research, with strong potential for diagnostic and prognostic applications.
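To illustrate why MCC can be a more balanced criterion than accuracy on imbalanced data, here is a minimal Python sketch. It is not the MCC-REFS implementation: the helper names (`confusion_counts`, `mcc`, `accuracy`) and the toy labels (5 positives, 95 negatives) are assumptions made up for this example. It uses the standard binary MCC formula, (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and returns 0 when the denominator is zero, which is a common convention.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient; 0.0 if the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical imbalanced toy data: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95

# A trivial classifier that always predicts the majority class.
always_negative = [0] * 100

# A classifier that finds 4 of the 5 positives at the cost of 5 false alarms.
informative = [1] * 4 + [0] * 1 + [1] * 5 + [0] * 90

for name, y_pred in [("always_negative", always_negative),
                     ("informative", informative)]:
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.2f}, "
          f"mcc={mcc(y_true, y_pred):.2f}")
```

In this toy case the majority-class predictor has the higher accuracy even though it never detects a positive sample, while its MCC is zero; the classifier that actually finds positives scores a clearly higher MCC. A selection criterion based on accuracy would prefer the uninformative model here, which is the kind of bias an MCC-based criterion is meant to avoid.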
Xiaojian Ding, Zihan Xu, Yi Li, Fumin Ma, Shilin Chen
Can Chen, Scott T. Weiss, Yang‐Yu Liu
Archana Suhas Vaidya, Dipak V. Patil
Ashis Kumar Mandal, Md Nadim, Hasi Saha, Tangina Sultana, Md. Delowar Hossain, Eui‐Nam Huh
S L Happy, Ramanarayan Mohanty, Aurobinda Routray