JOURNAL ARTICLE

Cluster feature selection in high-dimensional linear models

Bingqing Lin, Zhen Pang, Qihua Wang

Year: 2017 Journal: Random Matrices: Theory and Applications Vol: 07 (01) Pages: 1750015-1750015 Publisher: World Scientific

Abstract

This paper is concerned with variable screening when highly correlated variables exist in high-dimensional linear models. We propose a novel cluster feature selection (CFS) procedure that combines the elastic net with linear correlation variable screening to obtain the benefits of both methods. When calculating the correlation between the predictors and the response, we consider highly correlated groups of predictors instead of individual predictors, in contrast to the usual linear correlation variable screening. Within each correlated group, we apply the elastic net to select variables and estimate their parameters. This avoids the drawback of mistakenly eliminating truly relevant variables when they are highly correlated, as LASSO [R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B 58 (1996) 268–288] does. After applying the CFS procedure, the maximum absolute correlation coefficient between clusters becomes smaller, and any common model selection method such as sure independence screening (SIS) [J. Fan and J. Lv, Sure independence screening for ultrahigh dimensional feature space, J. R. Stat. Soc. Ser. B 70 (2008) 849–911] or LASSO can be applied to improve the results. Extensive numerical examples, including pure simulation examples and semi-real examples, are conducted to show the good performance of our procedure.
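The abstract does not specify how groups are formed, how the group-level correlation is defined, or how tuning parameters are chosen, so the following Python sketch is only one possible reading of the steps it describes, not the authors' implementation. It assumes that groups are connected components of a thresholded absolute-correlation graph, that each group is summarized by its elastic-net fitted values, and that clusters are ranked by the absolute correlation of that summary with the response. The function names, the threshold `0.8`, and the penalty settings `lam` and `alpha` are all hypothetical.

```python
import numpy as np

def correlated_groups(X, threshold=0.8):
    """Assumed grouping rule: connected components of the graph that links
    predictors whose absolute sample correlation exceeds `threshold`."""
    p = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    labels = -np.ones(p, dtype=int)
    current = 0
    for start in range(p):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            j = stack.pop()
            for k in np.flatnonzero(corr[j] > threshold):
                if labels[k] < 0:
                    labels[k] = current
                    stack.append(k)
        current += 1
    return [np.flatnonzero(labels == g) for g in range(current)]

def elastic_net_cd(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Plain coordinate descent for the objective
    (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1-alpha)/2 * ||b||_2^2)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]            # partial residual without x_j
            rho = X[:, j] @ resid / n
            shrunk = np.sign(rho) * max(abs(rho) - lam * alpha, 0.0)
            beta[j] = shrunk / (col_sq[j] + lam * (1.0 - alpha))
            resid -= X[:, j] * beta[j]
    return beta

def cluster_features(X, y, threshold=0.8, lam=0.1, alpha=0.5):
    """Fit the elastic net inside each group and return one summary
    feature (the fitted values) per group. This summary is an assumption."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    groups = correlated_groups(Xc, threshold)
    Z = np.zeros((X.shape[0], len(groups)))
    betas = []
    for g, idx in enumerate(groups):
        b = elastic_net_cd(Xc[:, idx], yc, lam, alpha)
        betas.append(b)
        Z[:, g] = Xc[:, idx] @ b
    return groups, betas, Z

def screen_clusters(Z, y, keep):
    """SIS-style ranking of clusters by absolute correlation with y."""
    yc = y - y.mean()
    scores = np.zeros(Z.shape[1])
    for g in range(Z.shape[1]):
        z = Z[:, g] - Z[:, g].mean()
        denom = np.linalg.norm(z) * np.linalg.norm(yc)
        scores[g] = abs(z @ yc) / denom if denom > 0 else 0.0
    return np.argsort(-scores)[:keep], scores

# Toy data: columns 0-2 share a latent factor, and only 0 and 1 drive y.
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.normal(size=(n, p))
factor = rng.normal(size=n)
X[:, :3] = factor[:, None] + 0.1 * rng.normal(size=(n, 3))
y = X[:, 0] + X[:, 1] + 0.5 * rng.normal(size=n)

groups, betas, Z = cluster_features(X, y)
top, scores = screen_clusters(Z, y, keep=3)
```

In this toy setting the three factor-driven columns fall into a single group, and that group is ranked first. In practice the grouping threshold and penalty parameters would need to be chosen by the user, for example by cross-validation, and the retained cluster features could then be passed to SIS or LASSO as the abstract suggests.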

Keywords:
Lasso (programming language); Feature selection; Elastic net regularization; Independence (probability theory); Linear regression; Mathematics; Linear model; Distance correlation; Correlation; Statistics; Variable (mathematics); Feature (linguistics); Contrast (vision); Selection (genetic algorithm); Cluster (spacecraft); Computer science; Algorithm; Artificial intelligence; Random variable

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.00
Refs: 18
Citation Normalized Percentile: 0.15

Topics

Statistical Methods and Inference
Physical Sciences →  Mathematics →  Statistics and Probability
Bayesian Methods and Mixture Models
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Statistical Methods and Models
Physical Sciences →  Mathematics →  Statistics and Probability

Related Documents

JOURNAL ARTICLE

Feature Screening for High-Dimensional Variable Selection in Generalized Linear Models

Jinzhu Jiang, Junfeng Shang

Journal: Entropy Year: 2023 Vol: 25 (6) Pages: 851-851

JOURNAL ARTICLE

Linear-mixed effects models for feature selection in high-dimensional NMR spectra

Yajun Mei, Seoung Bum Kim, Kwok‐Leung Tsui

Journal: Expert Systems with Applications Year: 2008 Vol: 36 (3) Pages: 4703-4708

JOURNAL ARTICLE

Partial profile score feature selection in high-dimensional generalized linear interaction models

Zengchao Xu, Shan Luo, Zehua Chen

Journal: Statistics and Its Interface Year: 2022 Vol: 15 (4) Pages: 433-447

JOURNAL ARTICLE

A semi-parametric approach to feature selection in high-dimensional linear regression models

Yuyang Liu, Pengfei Pi, Shan Luo

Journal: Computational Statistics Year: 2022 Vol: 38 (2) Pages: 979-1000