JOURNAL ARTICLE

Cross‐project defect prediction method based on genetic algorithm feature selection

Zhixi Hu, Yi Zhu

Year: 2023 Journal:   Engineering Reports Vol: 5 (12)   Publisher: Wiley

Abstract

As software plays a growing role in everyday life, software defect prediction (SDP) has become a key means of ensuring software reliability. SDP uses the historical data of software projects to predict in advance which modules are likely to contain defects, so that testing resources can be used as effectively as possible. In practice, however, the project to be predicted is often a new one with little or no historical data. How to use the abundant data of other related projects to build a cross‐project defect prediction (CPDP) model has therefore received extensive attention from researchers. The performance of CPDP is nevertheless strongly affected by differences in data distribution between projects and by class imbalance. To address this, this article proposes a feature selection method for CPDP based on a genetic algorithm (genetic algorithm feature selection, GAFS). GAFS consists of two stages: feature selection and ensemble training. In the feature selection stage, a global‐search, adaptive feature selection method based on a genetic algorithm scores candidate feature subsets by their ensemble training results on the target data and transfers the optimal feature subset. In the ensemble training stage, the EasyEnsemble method is used to alleviate class imbalance: multiple naive Bayes classifiers are built and then combined through ensemble learning into the final model. F1‐score and MCC (Matthews correlation coefficient) are used as evaluation metrics, and comparative experiments are carried out on the AEEEM and Promise datasets. The results show that GAFS achieves a higher average F1‐score and MCC than the five comparison methods; for example, it improves the average F1‐score by 38.9%, 31.6%, 35.1%, 22.0%, and 31.6%, respectively. In most cases, GAFS effectively improves model performance and achieves better prediction results.
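The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`f1_score`, `mcc`, `ga_select`, `toy_fitness`), the GA operators (binary tournament selection, one-point crossover, bit-flip mutation, elitism), and all parameter values are assumptions chosen for illustration. In particular, the abstract scores each candidate feature subset by training an EasyEnsemble of naive Bayes classifiers and evaluating it on target data; here that step is replaced by a hypothetical toy fitness function so the example stays self-contained.

```python
import math
import random


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; returns 0.0 when a marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def ga_select(n_features, fitness, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Search binary feature masks with a simple generational GA.

    `fitness` maps a mask (list of 0/1, one bit per feature) to a score
    to maximise. Requires n_features >= 2 and pop_size >= 2.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # Binary tournament: draw two distinct individuals, keep the fitter.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each bit.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)


# Hypothetical stand-in for "train an ensemble on the selected features and
# score it on target data": reward agreement with an assumed useful subset.
USEFUL = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_fitness(mask):
    return sum(1 for m, u in zip(mask, USEFUL) if m == u)


if __name__ == "__main__":
    mask, score = ga_select(len(USEFUL), toy_fitness)
    print("selected mask:", mask, "fitness:", score)
    print("F1:", f1_score(tp=6, fp=2, fn=3))
    print("MCC:", mcc(tp=6, fp=2, fn=3, tn=9))
```

In a realistic setting, `toy_fitness` would be swapped for a function that trains a class-balanced ensemble on the masked source-project features and returns its F1-score or MCC, computed from the resulting confusion counts.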

Keywords:
Feature selection; Computer science; Data mining; Software; Genetic algorithm; Feature (linguistics); Machine learning; Artificial intelligence; Selection (genetic algorithm); Process (computing); Class (philosophy)

Metrics

Cited By: 7
FWCI (Field Weighted Citation Impact): 4.33
Refs: 31
Citation Normalized Percentile: 0.93

Topics

Software Engineering Research
Physical Sciences →  Computer Science →  Information Systems
Software Reliability and Analysis Research
Physical Sciences →  Computer Science →  Software
Software System Performance and Reliability
Physical Sciences →  Computer Science →  Computer Networks and Communications

Related Documents

JOURNAL ARTICLE

An Information Flow-based Feature Selection Method for Cross-Project Defect Prediction

Yaning Wu

Journal:   International Journal of Performability Engineering Year: 2018
JOURNAL ARTICLE

A Cluster Based Feature Selection Method for Cross-Project Software Defect Prediction

Chao Ni, Wangshu Liu, Xiang Chen, Qing Gu, Daoxu Chen, Qiguo Huang

Journal:   Journal of Computer Science and Technology Year: 2017 Vol: 32 (6) Pages: 1090-1107
JOURNAL ARTICLE

Feature Selection in Cross-Project Software Defect Prediction

Aries Saifudin, Agung Trisetyarso, Wawan Suparta, Chuanze Kang, B S Abbas, Yaya Heryadi

Journal:   Journal of Physics Conference Series Year: 2020 Vol: 1569 (2) Pages: 022001-022001
JOURNAL ARTICLE

Feature Selection with Stochastic Hill-Climbing Algorithm in Cross Project Defect Prediction

Shailza Kanwar, Lalit Kumar Awasthi, Vivek Shrivastava

Journal:   2022 2nd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE) Year: 2022 Vol: 9 Pages: 632-635