JOURNAL ARTICLE

Matthews correlation coefficient-based feature ranking in recursive ensemble feature selection for high-dimensional and low-sample size data

David Rojas-Velázquez, Aletta D. Kraneveld, Alberto Tonda, Alejandro Lopez-Rincon

Year: 2025 · Journal: Machine Learning with Applications · Vol: 22 · Article: 100757 · Publisher: Elsevier BV

Abstract

Identifying reliable biomarkers in omics data is challenging due to the high number of features and limited sample sizes, which often lead to overfitting, biased results, and poor reproducibility. These issues are further complicated by class imbalance, which is common in medical datasets. To address these challenges, we present MCC-REFS, an improved version of the Recursive Ensemble Feature Selection method. MCC-REFS uses the Matthews Correlation Coefficient (MCC) as a selection criterion, offering a more balanced evaluation of classification performance, especially in imbalanced datasets. Unlike traditional methods that require manual tuning or predefined feature counts, MCC-REFS automatically selects the most informative and compact feature sets using an ensemble of eight machine learning classifiers. We evaluated MCC-REFS on synthetic datasets and several real-world omics datasets, including mRNA expression profiles and multi-label breast cancer data. Compared to existing methods such as REFS, GRACES, DNP, and GCNN, MCC-REFS consistently achieved higher or comparable performance while selecting fewer features. Validation using independent classifiers confirmed the robustness of the selected features. Overall, MCC-REFS provides a scalable, flexible, and reliable approach for feature selection in biomedical research, with strong potential for diagnostic and prognostic applications.
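To illustrate why MCC gives a more balanced view than accuracy on imbalanced data, the short sketch below computes both metrics from binary confusion-matrix counts using the standard definition MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). It is not the authors' MCC-REFS implementation. The function names `mcc` and `accuracy` are hypothetical. The convention of returning 0 when a marginal sum is zero is an assumption, though it is a common choice.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero (assumed convention), since
    the coefficient is undefined in that case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)


if __name__ == "__main__":
    # Toy imbalanced case: 95 negatives, 5 positives, and a classifier
    # that always predicts "negative".
    counts = dict(tp=0, tn=95, fp=0, fn=5)
    print(accuracy(**counts))  # 0.95: looks good despite missing every positive
    print(mcc(**counts))       # 0.0: no better than chance
```

Accuracy rewards the trivial majority-class predictor, while MCC scores it at 0 because it carries no information about the minority class. In practice, scikit-learn's `sklearn.metrics.matthews_corrcoef(y_true, y_pred)` computes the same quantity directly from label vectors.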
