JOURNAL ARTICLE

Online Multi-Label Streaming Feature Selection With Label Correlation

Dianlong You, Yang Wang, Jiawei Xiao, Yaojin Lin, Maosheng Pan, Zhen Chen, Limin Shen, Xindong Wu

Year: 2021   Journal: IEEE Transactions on Knowledge and Data Engineering   Vol: 35 (3)   Pages: 2901-2915   Publisher: IEEE Computer Society

Abstract

Multi-label streaming feature selection has attracted extensive attention in diverse big data applications. However, most existing work assumes that labels are independent and ignores real-world scenarios in which labels are interdependent and correlated. This paper aims to fill this gap by developing a novel online multi-label streaming feature selection scheme that takes label correlation into account, called OMSFS_LC. In our design, we first calculate the degree of correlation between labels to obtain label weights. We then combine mutual information with the label weights to evaluate the correlation between features and labels. The scheme consists of three stages: 1) online significance analysis, which identifies significant features via the degree of correlation between newly arriving features and the labels; 2) online relevance analysis, which identifies relevant features via mutual information; and 3) online redundancy analysis, which removes redundant features via pairwise comparison. We implement our solution and conduct extensive experiments on benchmark datasets to evaluate its performance. The experimental results show that OMSFS_LC significantly outperforms state-of-the-art methods in terms of effectiveness and efficiency.
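
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract gives no formulas, so the label-weighting rule (each label's share of the total pairwise mutual information), the single relevance threshold that stands in for both the significance and relevance stages, and the names `label_weights`, `weighted_relevance`, `select_streaming`, `min_score`, and `max_redundancy` are all assumptions made for illustration. Features and labels are assumed to be discrete.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def label_weights(labels):
    """Assumed scheme: weight each label by its total MI with the other labels."""
    raw = [
        sum(mutual_information(yi, yj) for j, yj in enumerate(labels) if j != i)
        for i, yi in enumerate(labels)
    ]
    total = sum(raw)
    if total == 0:  # no label correlation at all: fall back to uniform weights
        return [1 / len(labels)] * len(labels)
    return [r / total for r in raw]


def weighted_relevance(feature, labels, weights):
    """Label-weighted sum of MI between one feature and every label."""
    return sum(w * mutual_information(feature, y) for w, y in zip(weights, labels))


def select_streaming(feature_stream, labels, min_score=0.05, max_redundancy=0.9):
    """Process features one at a time; both thresholds are hypothetical."""
    weights = label_weights(labels)
    selected = {}  # feature name -> (values, relevance score)
    for name, values in feature_stream:
        score = weighted_relevance(values, labels, weights)
        # Significance and relevance (collapsed into one test in this sketch):
        # discard features whose weighted relevance is too low.
        if score < min_score:
            continue
        # Redundancy: pairwise comparison against already selected features.
        peers = [
            other for other, (other_values, _) in selected.items()
            if mutual_information(values, other_values) >= max_redundancy
        ]
        # Keep the newcomer only if it beats every feature it is redundant with.
        if any(selected[other][1] >= score for other in peers):
            continue
        for other in peers:
            del selected[other]
        selected[name] = (values, score)
    return list(selected)
```

Here `feature_stream` is any iterable of `(name, values)` pairs and `labels` is a list of label columns, each as long as a feature. Under this weighting, a label that is uncorrelated with all other labels receives weight zero, which is a side effect of the assumed rule rather than a claim about the paper.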

Keywords:
Computer science; Feature selection; Pairwise comparison; Mutual information; Redundancy (engineering); Correlation; Benchmark (surveying); Artificial intelligence; Data mining; Relevance (law); Feature (linguistics); Selection (genetic algorithm); Machine learning; Pattern recognition (psychology); Information retrieval; Mathematics

Metrics

Cited By: 45
FWCI (Field Weighted Citation Impact): 4.23
Refs: 61
Citation Normalized Percentile: 0.95
Is in top 1%
Is in top 10%

Topics

Text and Document Classification Technologies
Physical Sciences → Computer Science → Artificial Intelligence
Spam and Phishing Detection
Physical Sciences → Computer Science → Information Systems
Web Data Mining and Analysis
Physical Sciences → Computer Science → Information Systems