JOURNAL ARTICLE

Graph Representation Learning Enhanced Semi-Supervised Feature Selection

Jun Tan, Zhifeng Qiu, Ning Gui

Year: 2024   Journal: ACM Transactions on Knowledge Discovery from Data   Vol: 18 (9)   Pages: 1-20   Publisher: Association for Computing Machinery

Abstract

Feature selection is a key step in machine learning: by eliminating features unrelated to the modeling target, it helps create reliable and interpretable models. By exploring the potentially complex correlations among features of unlabeled data, recently introduced self-supervision-enhanced feature selection methods greatly reduce the reliance on labeled samples. However, these methods are generally based on autoencoders with sample-wise self-supervision, which can hardly exploit the relations among samples. To address this limitation, this article proposes graph representation learning enhanced semi-supervised feature selection (G-FS), which translates unlabeled “plain” tabular data into a bipartite graph and performs feature selection by discovering and exploiting the non-Euclidean relations among features and samples. A self-supervised edge prediction task is designed to distill the rich information in the graph into low-dimensional embeddings, which remove redundant features and noise. Guided by the condensed graph representation, we propose a batch attention feature weight generation mechanism that generates more robust weights according to batch-based selection patterns rather than individual samples. The results show that G-FS achieves significant performance gains on 14 datasets compared with 12 state-of-the-art baselines, including two recent self-supervised baselines. The source code is publicly available at https://github.com/Icannotnamemyselff/G-FS_Graph_enhacned_feature_selection .
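
The three ideas named in the abstract (a bipartite graph built from tabular data, masked edges as self-supervised prediction targets, and feature weights derived from a batch rather than a single sample) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the architecture, so the function names `table_to_bipartite`, `split_edges`, and `batch_feature_weights` are hypothetical, the 30% masking ratio is an arbitrary choice, and the softmax over batch-averaged scores is only an assumed stand-in for the batch attention mechanism.

```python
import math
import random

def table_to_bipartite(rows):
    """Turn a table into a bipartite edge list (hypothetical helper).

    Each row becomes a sample node, each column a feature node, and every
    observed cell (sample i, feature j) becomes an edge weighted by its value.
    """
    edges = []
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:  # missing cells produce no edge
                edges.append((i, j, float(value)))
    return edges

def split_edges(edges, mask_ratio=0.3, seed=0):
    """Hide a fraction of edges; the hidden ones serve as prediction targets."""
    rng = random.Random(seed)
    shuffled = edges[:]
    rng.shuffle(shuffled)
    n_masked = int(len(shuffled) * mask_ratio)
    return shuffled[n_masked:], shuffled[:n_masked]

def batch_feature_weights(scores):
    """Average per-sample feature scores over a batch, then apply a softmax.

    `scores` holds one score vector per sample (one score per feature).
    Assumption: this stands in for the paper's batch attention mechanism.
    """
    n_features = len(scores[0])
    mean = [sum(s[j] for s in scores) / len(scores) for j in range(n_features)]
    peak = max(mean)  # subtract the maximum for numerical stability
    exps = [math.exp(m - peak) for m in mean]
    total = sum(exps)
    return [e / total for e in exps]

# Toy data: two samples, three features, one missing cell.
rows = [[0.5, 1.0, None], [0.1, 0.2, 0.3]]
edges = table_to_bipartite(rows)
visible, masked = split_edges(edges)
weights = batch_feature_weights([[2.0, 0.1, 0.1], [1.8, 0.3, 0.0]])
```

In a full pipeline, a graph neural network would presumably be trained on the `visible` edges to predict the `masked` ones, and the learned embeddings would supply the per-sample scores that are averaged here by hand.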

Keywords:
Feature selection; Computer science; Artificial intelligence; Feature learning; Graph; Pattern recognition (psychology); Autoencoder; Exploit; Bipartite graph; Machine learning; Feature (linguistics); Labeled data; Data mining; Deep learning; Theoretical computer science

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 39
Citation Normalized Percentile: 0.12

Topics

Machine Learning and Data Classification (Physical Sciences → Computer Science → Artificial Intelligence)
Domain Adaptation and Few-Shot Learning (Physical Sciences → Computer Science → Artificial Intelligence)
Gene expression and cancer classification (Life Sciences → Biochemistry, Genetics and Molecular Biology → Molecular Biology)