JOURNAL ARTICLE

Software Defect Prediction Based on Feature Subset Selection and Ensemble Classification

Ahmad A. Saifan, Lina Abu-wardih

Year: 2020 Journal: ECTI Transactions on Computer and Information Technology (ECTI-CIT) Vol: 14 (2) Pages: 213-228 Publisher: Chiang Mai University

Abstract

Two primary issues have emerged in the machine learning and data mining community: how to deal with imbalanced data and how to choose appropriate features. Both are of particular concern in software engineering, and more specifically in software defect prediction. This research presents a procedure that combines a feature selection technique, to single out relevant attributes, with an ensemble technique, to handle the class-imbalance issue. To determine the advantages of feature selection and ensemble methods, we consider two scenarios: (1) ensemble models constructed from the original datasets, without feature selection; and (2) ensemble models constructed from the reduced datasets after feature selection has been applied. Four feature selection techniques are employed: Principal Component Analysis (PCA), Pearson’s correlation, Greedy Stepwise Forward selection, and Information Gain (IG). The aim is to assess the effectiveness of feature selection techniques when they are used with ensemble techniques. Five datasets, obtained from the PROMISE software repository, are analyzed. Tentative results indicate that ensemble methods can improve the model's performance without the use of feature selection techniques. PCA feature selection with bagging based on K-NN performs better than both bagging based on SVM and boosting based on K-NN and SVM, whereas the other feature selection techniques (Pearson’s correlation, Greedy Stepwise, and IG) weaken the ensemble models’ performance.
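The two scenarios in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the data are synthetic rather than taken from PROMISE, Pearson's correlation stands in for the four selection techniques, bagging over k-NN stands in for the ensembles, and every function name and parameter below (`select_top_features`, `bagged_knn_predict`, `make_toy_data`, `k`, `n_models`) is a hypothetical choice made for illustration.

```python
import math
import random
from collections import Counter
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_top_features(X, y, k):
    """Indices of the k features with the largest |correlation| to the label."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def bagged_knn_predict(train_X, train_y, test_X, n_models=11, k=3, seed=0):
    """Bagging: each k-NN model sees a bootstrap resample; predictions are voted."""
    rng = random.Random(seed)
    n = len(train_X)
    samples = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        samples.append(([train_X[i] for i in idx], [train_y[i] for i in idx]))
    preds = []
    for x in test_X:
        votes = Counter(knn_predict(bx, by, x, k) for bx, by in samples)
        preds.append(votes.most_common(1)[0][0])
    return preds


def project(X, cols):
    """Keep only the selected feature columns."""
    return [[row[j] for j in cols] for row in X]


def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def make_toy_data(n, seed):
    """Synthetic imbalanced data (assumption): feature 0 is informative,
    features 1-3 are pure noise, and roughly 20% of rows are 'defective'."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.2 else 0
        X.append([rng.gauss(2.0 * label, 1.0)] + [rng.gauss(0, 1) for _ in range(3)])
        y.append(label)
    return X, y


train_X, train_y = make_toy_data(200, seed=1)
test_X, test_y = make_toy_data(100, seed=2)

# Scenario 1: ensemble built on the original feature set.
pred_all = bagged_knn_predict(train_X, train_y, test_X)

# Scenario 2: ensemble built on the reduced feature set.
cols = select_top_features(train_X, train_y, k=2)
pred_sel = bagged_knn_predict(project(train_X, cols), train_y, project(test_X, cols))

print("selected feature indices:", cols)
print("accuracy without selection:", accuracy(test_y, pred_all))
print("accuracy with selection:", accuracy(test_y, pred_sel))
```

Plain accuracy is used here only to keep the sketch short. On imbalanced defect data, a measure that reflects the minority class (for example recall on defective modules) would likely be more informative, and the numbers printed on this toy data say nothing about the paper's results.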

Keywords:
Feature selection; Ensemble learning; Computer science; Artificial intelligence; Boosting (machine learning); Support vector machine; Machine learning; Data mining; Principal component analysis; Feature (linguistics); Pattern recognition (psychology); Selection (genetic algorithm); Software; Predictive modelling; Ensemble forecasting

Metrics

Cited By: 17
FWCI (Field Weighted Citation Impact): 2.55
Refs: 64
Citation Normalized Percentile: 0.91

Topics

Software Engineering Research
Physical Sciences → Computer Science → Information Systems
Imbalanced Data Classification Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Software Reliability and Analysis Research
Physical Sciences → Computer Science → Software

Related Documents

JOURNAL ARTICLE

A Feature Selection based Ensemble Classification Framework for Software Defect Prediction

Ahmed Iqbal, Shabib Aftab, Israr Ullah, Muhammad Salman Bashir, Muhammad Anwaar Saeed

Journal: International Journal of Modern Education and Computer Science Year: 2019 Vol: 11 (9) Pages: 54-64

JOURNAL ARTICLE

A Novel Feature Subset Selection Algorithm for Software Defect Prediction

P Reena, Binu Rajan

Journal: International Journal of Computer Applications Year: 2014 Vol: 100 (17) Pages: 39-43

JOURNAL ARTICLE

Enhancing Software Defect Prediction: HHO-Based Wrapper Feature Selection with Ensemble Methods

Achmad Fauzan Luthfi, Rudy Herteno, Friska Abadi, Radityo Adi Nugroho, Muhammad Itqan Mazdadi, Vijay Anant Athavale

Journal: Indonesian Journal of Electronics Electromedical Engineering and Medical Informatics Year: 2025 Vol: 7 (2) Pages: 188-202

JOURNAL ARTICLE

Ensemble-based feature selection and machine learning models for software defect prediction

Gaurav Kishor Kanaujiya, Prabhat Verma

Journal: Knowledge and Information Systems Year: 2025 Vol: 67 (11) Pages: 10325-10353