JOURNAL ARTICLE

Feature subset selection bias for classification learning

Abstract

Feature selection is often applied to high-dimensional data prior to classification learning. Using the same training dataset for both selection and learning can result in so-called feature subset selection bias. This bias can, putatively, exacerbate over-fitting and degrade classification performance. In current practice, however, separate datasets are seldom used for selection and learning, because splitting the training data into one set for feature selection and another for classifier learning reduces the amount of data available to each task. This work attempts to address this dilemma. We formalize selection bias for classification learning, analyze its statistical properties, and use a range of experiments to study the factors that affect selection bias and how the bias impacts classification learning. This research aims to illustrate and explain why the bias may not impact classification as negatively as it is expected to in regression.
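The core mechanism can be illustrated with a small simulation. The sketch below is not the authors' formalization. It assumes pure-noise features, a hypothetical relevance score (the absolute gap between class means), and hypothetical function names and sizes. It selects the top-k features on one dataset, then compares their scores on that same dataset against their scores on an independent dataset drawn from the same distribution.

```python
import random
import statistics


def class_mean_gap(column, labels):
    """Hypothetical relevance score: |mean(class 1) - mean(class 0)| for one feature."""
    pos = [v for v, y in zip(column, labels) if y == 1]
    neg = [v for v, y in zip(column, labels) if y == 0]
    return abs(statistics.fmean(pos) - statistics.fmean(neg))


def make_noise_data(n_per_class, n_features, rng):
    """Balanced binary labels; every feature is N(0, 1) noise, independent of the label."""
    labels = [0] * n_per_class + [1] * n_per_class
    columns = [[rng.gauss(0.0, 1.0) for _ in labels] for _ in range(n_features)]
    return columns, labels


def selection_bias_demo(n_per_class=50, n_features=1000, k=10, seed=0):
    """Return (mean score of selected features on the selection data,
    mean score of the same features on an independent dataset)."""
    rng = random.Random(seed)
    sel_cols, sel_labels = make_noise_data(n_per_class, n_features, rng)
    new_cols, new_labels = make_noise_data(n_per_class, n_features, rng)

    # Select the k features that look most relevant on the selection data.
    scores = [class_mean_gap(col, sel_labels) for col in sel_cols]
    top_k = sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]

    # Scoring on the same data reuses the chance fluctuations that drove selection.
    same_data = statistics.fmean(scores[j] for j in top_k)
    # Scoring on fresh data gives an estimate free of that selection effect.
    fresh_data = statistics.fmean(class_mean_gap(new_cols[j], new_labels) for j in top_k)
    return same_data, fresh_data


if __name__ == "__main__":
    same, fresh = selection_bias_demo()
    print(f"selected features, selection data:   {same:.3f}")
    print(f"selected features, independent data: {fresh:.3f}")
```

Because no feature carries any signal, the gap between the two numbers is entirely selection bias. The sketch only shows that the bias exists on the selection data; it says nothing about how strongly the bias degrades a downstream classifier, which is the question the article investigates.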
