JOURNAL ARTICLE

Feature-Grouping-Based Two Steps Feature Selection Algorithm in Software Defect Prediction

Abstract

In order to improve the effect of software defect prediction, many algorithms including feature selection, have been proposed. Based on Wrapper and Filter hybrid framework, a feature-grouping-based feature selection algorithm is proposed in this paper. The algorithm is composed of two steps. In the first step, in order to remove the redundant features, we group the features according to the redundancy between the features. The symmetry uncertainty is used as the constant indicator of the correlation and the FCBF-based grouping algorithm is used to group the features. In the second step, a subset of the features are selected from each group to form the final subset of features. Many classical methods select the representative feature from each group. We consider that when the number of intra-group features is large, the representative features are not enough to reflect the information in this group. Therefore, we require that at least one feature be selected within each group, in this step, the PSO algorithm is used for Searching Randomly from each group. We tested on the open source NASA and PROMISE data sets. Using three kinds of classifier. Compared to the other methods tested in this article, our method resulted in 90% improvement in the predictive performance of 30 sets of results on 10 data sets. Compared with the algorithms without feature selection, the AUC values of this method in the Logistic regression, Naive Bayesian, and K-neighbor classifiers are improved by 5.94% and 4.69% And 8.05%. The FCBF algorithm can also be regarded as a kind of first performing feature grouping. Compared with the FCBF algorithm, the AUC values of this method are improved by 4.78%, 6.41% and 4.4% on the basis of Logistic regression, Naive Bayes and K-neighbor. We can also see that for the FCBF-based grouping algorithm, it could be better to choose a characteristic cloud from each group than to choose a representative one.

Keywords:
Feature selection Computer science Pattern recognition (psychology) Artificial intelligence Naive Bayes classifier Feature (linguistics) Classifier (UML) Redundancy (engineering) Algorithm Data mining Statistical classification Support vector machine

Metrics

6
Cited By
0.89
FWCI (Field Weighted Citation Impact)
20
Refs
0.81
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Software Engineering Research
Physical Sciences →  Computer Science →  Information Systems
Software Reliability and Analysis Research
Physical Sciences →  Computer Science →  Software
Software Testing and Debugging Techniques
Physical Sciences →  Computer Science →  Software

Related Documents

© 2026 ScienceGate Book Chapters — All rights reserved.