JOURNAL ARTICLE

Variable Selection in High-Dimensional Logistic Regression Models Using a Whitening Approach

Wencan Zhu, Céline Lévy-Leduc, Nils Ternès

Year: 2025, Journal: IEEE Transactions on Computational Biology and Bioinformatics, Vol: 22 (2), Pages: 800-807

Abstract

In bioinformatics, the rapid development of sequencing technology has made it possible to collect ever larger amounts of omics data. Classification based on omics data is one of the central problems in biomedical research. However, omics data usually have a small sample size but a high feature dimension, and it is assumed that only a few features (biomarkers) are active, i.e. informative for discriminating between categories. Identifying active biomarkers for classification has therefore become fundamental to omics data analysis. Focusing on binary classification, we propose an innovative feature selection method designed to handle high correlations between biomarkers. Our method, WLogit, consists of whitening the design matrix to remove the correlations between biomarkers, then using a penalized criterion adapted to the logistic regression model to select features. Numerical experiments suggest that WLogit can identify almost all active biomarkers even when the biomarkers are highly correlated, a setting in which the other methods fail, and that this leads to higher classification accuracy. The performance of WLogit is also evaluated on two publicly available datasets, where the resulting classifier outperformed other methods in terms of prediction accuracy. Our method is implemented in the WLogit R package, available from the Comprehensive R Archive Network (CRAN).
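
The two-step idea in the abstract (whiten the design matrix, then fit a penalized logistic regression) can be illustrated with a minimal Python sketch. This is not the WLogit package or its API. It is a toy under stated assumptions: the function names `whiten` and `l1_logistic` are hypothetical, the covariance is estimated by the plain sample covariance plus a small ridge term (only sensible when there are more samples than features, unlike the high-dimensional setting of the article), the penalty is a plain L1 penalty solved by proximal gradient descent, and the simulated data and the penalty level are arbitrary.

```python
import numpy as np


def whiten(X, ridge=1e-6):
    """Center X and decorrelate its columns with the inverse square root
    of the (ridge-stabilized) sample covariance. Toy estimator, assumes n > p."""
    Xc = X - X.mean(axis=0)
    sigma = np.cov(Xc, rowvar=False) + ridge * np.eye(X.shape[1])
    eigval, eigvec = np.linalg.eigh(sigma)  # sigma is symmetric positive definite
    sigma_inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return Xc @ sigma_inv_sqrt, sigma_inv_sqrt


def l1_logistic(X, y, lam, n_iter=2000):
    """L1-penalized logistic regression without intercept,
    fitted by proximal gradient descent (ISTA)."""
    n, p = X.shape
    # Step size 1/L, where L = ||X||_2^2 / (4n) bounds the curvature of the loss.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (prob - y) / n
        z = beta - step * grad
        # Soft-thresholding: the proximal operator of the L1 penalty.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta


# Simulated example: 5 strongly correlated features, 2 of them active.
rng = np.random.default_rng(0)
n, p = 200, 5
cov = 0.9 * np.ones((p, p)) + 0.1 * np.eye(p)
X = rng.multivariate_normal(np.zeros(p), cov, size=n)
beta_true = np.array([2.0, 0.0, 0.0, -2.0, 0.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))

Xw, sigma_inv_sqrt = whiten(X)
beta_tilde = l1_logistic(Xw, y, lam=0.02)  # coefficients in the whitened space
beta_hat = sigma_inv_sqrt @ beta_tilde     # mapped back to the original features

print("whitened-space coefficients:", np.round(beta_tilde, 3))
print("original-space coefficients:", np.round(beta_hat, 3))
```

Because the whitened matrix equals the centered design matrix times the inverse square root of the covariance, a coefficient vector fitted in the whitened space is mapped back to the original features by multiplying with the same inverse square root. How the article estimates the covariance and chooses which coefficients to keep is not described in the abstract, so those steps are left out here.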

Keywords:
Logistic regression, Feature selection, Statistics, Selection (genetic algorithm), Logistic model tree, Variable (mathematics), Computer science, Regression analysis, Artificial intelligence, Mathematics

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 7.29
Refs: 52
Citation Normalized Percentile: 0.88

Topics

Advanced Statistical Methods and Models
Physical Sciences →  Mathematics →  Statistics and Probability
Face and Expression Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Statistical Methods and Inference
Physical Sciences →  Mathematics →  Statistics and Probability

Related Documents

JOURNAL ARTICLE

Variable Selection in Logistic Regression Models

Dietmar Zellner, Frieder Keller, Günter E. Zellner

Journal: Communications in Statistics - Simulation and Computation, Year: 2004, Vol: 33 (3), Pages: 787-805
JOURNAL ARTICLE

Sparse Bayesian variable selection in high‐dimensional logistic regression models with correlated priors

Zhuanzhuan Ma, Zifei Han, Souparno Ghosh, Liucang Wu, Min Wang

Journal: Statistical Analysis and Data Mining: The ASA Data Science Journal, Year: 2024, Vol: 17 (1)
JOURNAL ARTICLE

Variable selection for multivariate logistic regression models

Ming‐Hui Chen, Dipak K. Dey

Journal: Journal of Statistical Planning and Inference, Year: 2002, Vol: 111 (1-2), Pages: 37-55