JOURNAL ARTICLE

Variable Selection in High Dimensional Data with Interactions

Zuharah Jaafar, Norazlina Ismail

Year: 2022 Journal: International Journal of Advances in Soft Computing and its Applications Vol: 14 (2) Pages: 153-166

Abstract

Variable selection in high-dimensional settings is a common research area in statistical machine learning, and numerous effective approaches have been developed in recent years to address its challenges. This study presents a double-step variable selection method for data in which pairwise interactions between the explanatory variables exist. The aim is to improve the prediction accuracy of the model for a given dataset while choosing the smallest set of explanatory variables, taking the interactions among them into account. The double-step method, which combines Random Forest (RF) with the Adaptive Elastic Net (AENET), was examined in a setting that mimics potential health effects of environmental contamination. It was compared with the single-step adaptive elastic net method and with a two-step method that pairs CART with the adaptive elastic net, both when interactions were present in the data and when there were none. Performance was measured by RMSE, R², and the number of variables chosen for the final model. The double-step RF+AENET approach produces a simple, constrained model. Despite the complex associations among exposure variables, it has the lowest false detection rate for null interactions. The screening and variable reduction in the RF step of the RF+AENET approach effectively retain a set of variables that are correlated with the outcome. The double-step RF+AENET approach predicts better than a single technique and chooses a sparse model that is close to the true model. Thus, when there are pairwise interactions between variables in the simulated biological dataset, the double-step technique is the better method for model prediction and parameter estimation.

Keywords: Adaptive Elastic Net, Random Forest, Variable Selection, CART.
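The double-step idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn and NumPy, the helper names (`rf_screen`, `add_pairwise`, `adaptive_enet`), the tuning values, and the synthetic toy data are all hypothetical, and the adaptive elastic net is approximated by rescaling columns with weights from an initial elastic net fit (which also reweights the ridge term, unlike the textbook formulation).

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score


def rf_screen(X, y, keep=10, seed=0):
    """Step 1 (assumed form): keep the `keep` variables with highest RF importance."""
    rf = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X, y)
    top = np.argsort(rf.feature_importances_)[::-1][:keep]
    return np.sort(top)


def add_pairwise(X, idx):
    """Build main effects plus all pairwise products of the screened variables."""
    cols = [X[:, j] for j in idx]
    names = [f"x{j}" for j in idx]
    for a, b in combinations(idx, 2):
        cols.append(X[:, a] * X[:, b])
        names.append(f"x{a}:x{b}")
    return np.column_stack(cols), names


def adaptive_enet(Z, y, alpha=0.05, l1_ratio=0.5, gamma=1.0):
    """Step 2 (approximation): weighted elastic net via column rescaling."""
    n = Z.shape[0]
    init = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=10000).fit(Z, y)
    # scale_j = 1 / w_j, where w_j = (|beta_init_j| + 1/n) ** (-gamma)
    scale = (np.abs(init.coef_) + 1.0 / n) ** gamma
    fit = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=10000).fit(Z * scale, y)
    # Map coefficients back to the original (unscaled) columns.
    return fit.coef_ * scale, fit.intercept_


# Synthetic toy data (not from the paper): two main effects and one interaction.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
y = (2.0 * X[:, 0] - 1.5 * X[:, 1] + 2.0 * X[:, 0] * X[:, 1]
     + rng.normal(scale=0.5, size=n))

kept = rf_screen(X, y, keep=10)
Z, names = add_pairwise(X, kept)
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)  # standardize before penalized fitting

coef, intercept = adaptive_enet(Z, y)
pred = Z @ coef + intercept
rmse = float(np.sqrt(mean_squared_error(y, pred)))  # in-sample RMSE
r2 = float(r2_score(y, pred))                       # in-sample R squared
selected = [nm for nm, c in zip(names, coef) if c != 0.0]

print("kept after RF screening:", kept.tolist())
print("selected terms:", selected)
print("RMSE:", round(rmse, 3), "R2:", round(r2, 3))
```

The three quantities reported at the end correspond to the evaluation criteria named in the abstract: RMSE, R², and the number of variables in the final model (`len(selected)`). In practice the penalty parameters would be chosen by cross-validation and the metrics computed on held-out data rather than in-sample.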

Keywords:
Pairwise comparison, Feature selection, Variable (mathematics), Set (abstract data type), Computer science, Elastic net regularization, Random forest, Null hypothesis, Data mining, Statistics, Mathematics, Algorithm, Artificial intelligence, Machine learning

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 23
Citation Normalized Percentile: 0.07

Topics

Neural Networks and Applications
Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Variable selection via Lasso with high-dimensional proteomic data

Hongxuan Zhai

Journal: Open Scholarship Institutional Repository (Washington University in St. Louis) Year: 2018

JOURNAL ARTICLE

PUlasso: High-Dimensional Variable Selection With Presence-Only Data

Hyebin Song, Garvesh Raskutti

Journal: Journal of the American Statistical Association Year: 2018 Vol: 115 (529) Pages: 334-347