JOURNAL ARTICLE

Data-driven random projection and screening for high-dimensional generalized linear models

Abstract

We address the challenge of correlated predictors in high-dimensional generalized linear models (GLMs), where regression coefficients range from sparse to dense, by proposing a data-driven random projection (RP) method. This is particularly relevant for applications where the number of predictors is (much) larger than the number of observations and it is unknown whether the underlying structure is sparse or dense. We achieve this by using ridge-type estimates for variable screening and RP, so that information about the response–predictor relationship is incorporated when performing dimensionality reduction. We demonstrate that a ridge estimator with a small penalty is effective for RP and screening, but the penalty value must be carefully selected. Unlike in linear regression, where penalties approaching zero work well, such near-zero penalties lead to overfitting in non-Gaussian families. Instead, we recommend a data-driven method for penalty selection. In a simulation study, this data-driven RP improved prediction performance over conventional RPs, even surpassing benchmarks like elastic net. Furthermore, an ensemble of multiple such RPs combined with probabilistic variable screening delivered the best aggregated results in prediction and variable ranking across varying sparsity levels, at a rather low computational cost. Finally, three applications with count and binary responses demonstrate the method's advantages in interpretability and prediction accuracy.
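To make the idea of ridge-based screening followed by a data-driven projection concrete, the following is a minimal sketch for the Gaussian (linear regression) case only. It is not the authors' implementation: the function names, the simulated data, and the specific projection layout (one nonzero entry per screened predictor, placed in a random row and set to its ridge coefficient) are assumptions made for illustration. The data-driven penalty selection that the abstract recommends for non-Gaussian families is not shown.

```python
import numpy as np

def ridge_coefficients(X, y, penalty):
    """Ridge estimate for a Gaussian response, computed in dual form
    so that only an n x n system is solved when p >> n."""
    n = X.shape[0]
    gram = X @ X.T + penalty * np.eye(n)
    return X.T @ np.linalg.solve(gram, y)

def screen(beta, n_keep):
    """Sorted indices of the n_keep predictors with the largest |coefficient|."""
    return np.sort(np.argsort(-np.abs(beta))[:n_keep])

def data_driven_projection(beta_kept, n_dims, rng):
    """Sparse (n_dims x n_keep) projection matrix (illustrative assumption):
    each kept predictor is assigned to one random row, and its entry is
    the ridge coefficient rather than a random sign."""
    n_keep = beta_kept.shape[0]
    rows = rng.integers(0, n_dims, size=n_keep)
    phi = np.zeros((n_dims, n_keep))
    phi[rows, np.arange(n_keep)] = beta_kept
    return phi

# Simulated data (hypothetical): n = 50 observations, p = 500 predictors,
# of which only the first 10 carry signal.
rng = np.random.default_rng(0)
n, p = 50, 500
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:10] = 2.0
y = X @ true_beta + rng.standard_normal(n)

beta = ridge_coefficients(X, y, penalty=1e-3)   # small penalty, linear case
kept = screen(beta, n_keep=100)                 # screening step
phi = data_driven_projection(beta[kept], n_dims=10, rng=rng)
Z = X[:, kept] @ phi.T                          # reduced predictors, n x 10
```

With such a small penalty and p much larger than n, the ridge fit nearly interpolates the training responses. This is harmless for screening in the linear case but is consistent with the abstract's warning about overfitting in non-Gaussian families. The reduced matrix `Z` could then be passed to any low-dimensional GLM fit, and an ensemble would repeat the projection step with different random assignments.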

Related Documents

JOURNAL ARTICLE

Sparse data-driven random projection in regression for high-dimensional data

Roman Parzer, Peter Filzmoser, Laura Vana-Gür

Journal: Journal of Data Science Statistics and Visualisation Year: 2025 Vol: 5 (5)
JOURNAL ARTICLE

Sequential Feature Screening for Generalized Linear Models with Sparse Ultra-High Dimensional Data

Junying Zhang, Hang Wang, Riquan Zhang, Jiajia Zhang

Journal: Journal of Systems Science and Complexity Year: 2020 Vol: 33 (2) Pages: 510-526
JOURNAL ARTICLE

Generalized Autoregressive Linear Models for Discrete High-Dimensional Data

Parthe Pandit, Mojtaba Sahraee-Ardakan, Arash A. Amini, Sundeep Rangan, Alyson K. Fletcher

Journal: IEEE Journal on Selected Areas in Information Theory Year: 2020 Vol: 1 (3) Pages: 884-896