JOURNAL ARTICLE

Improving Cross-Project Software Defect Prediction Method Through Transformation and Feature Selection Approach

Yahaya Zakariyau Bala, Pathiah Abdul Samat, Khaironi Yatim Sharif, Noridayu Manshor

Year: 2022 Journal: IEEE Access Vol: 11 Pages: 2318-2326 Publisher: Institute of Electrical and Electronics Engineers

Abstract

In the traditional software defect prediction methodology, the historical record (dataset) of a single project is partitioned into training and testing data. In a practical situation where the project to be predicted is new and has no such record, traditional software defect prediction cannot be employed. An alternative is cross-project defect prediction, where the historical record of one project (the source) is used to predict the defect status of another project (the target). Cross-project defect prediction thus overcomes the traditional method's dependence on historical records from the same project. However, its performance is relatively low because of the distribution differences between the source and target projects. Furthermore, the software defect datasets used for cross-project defect prediction are characterized by high-dimensional features, some of which are irrelevant and contribute to low performance. To resolve these two issues, this study proposes a transformation and feature selection approach to reduce the distribution difference and the feature dimensionality in cross-project defect prediction. A comparative experiment was conducted on the publicly available AEEEM datasets. Analysis of the results shows that the proposed approach, in conjunction with random forest as the classification model, outperformed four other state-of-the-art cross-project defect prediction methods on the commonly used performance evaluation metric F1-score.
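The workflow the abstract outlines (transform, select features, train a random forest on the source project, score on the target with F1) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: the abstract does not name the transformation or the feature selection technique, so this sketch swaps in scikit-learn's `PowerTransformer` and `SelectKBest` with `f_classif`, and it uses synthetic data from a hypothetical `make_project` helper instead of the AEEEM datasets.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import f1_score
from sklearn.preprocessing import PowerTransformer


def make_project(n_modules, shift, seed, n_metrics=20):
    """Hypothetical synthetic stand-in for one project's metric table (not AEEEM)."""
    rng = np.random.default_rng(seed)
    # Skewed metrics whose scale differs per project via `shift`.
    X = rng.lognormal(mean=shift, sigma=1.0, size=(n_modules, n_metrics))
    # Only the first three metrics carry defect signal; the rest are irrelevant.
    signal = np.log(X[:, :3]).sum(axis=1) - 3 * shift
    y = (signal + rng.normal(0.0, 0.5, n_modules) > 1.0).astype(int)
    return X, y


def cross_project_f1(X_src, y_src, X_tgt, y_tgt, k=5):
    # Assumed transformation: fit a power transform on each project separately
    # (unsupervised, so target labels are never used) to align distributions.
    X_src_t = PowerTransformer().fit_transform(X_src)
    X_tgt_t = PowerTransformer().fit_transform(X_tgt)
    # Assumed feature selection: keep the k metrics ranked best on the source.
    selector = SelectKBest(f_classif, k=k).fit(X_src_t, y_src)
    # Random forest trained on the source project only.
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(selector.transform(X_src_t), y_src)
    y_pred = model.predict(selector.transform(X_tgt_t))
    return f1_score(y_tgt, y_pred), selector.get_support()


X_src, y_src = make_project(400, shift=0.0, seed=1)  # source project
X_tgt, y_tgt = make_project(300, shift=1.5, seed=2)  # target project
f1, kept = cross_project_f1(X_src, y_src, X_tgt, y_tgt)
print(f"F1 on target project: {f1:.3f}; kept metrics: {np.flatnonzero(kept)}")
```

The target labels appear only in the final `f1_score` call, which mirrors the cross-project setting where the target project has no usable defect history at training time.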

Keywords:
Computer science; Transformation (genetics); Feature selection; Selection (genetic algorithm); Software; Artificial intelligence; Software bug; Feature (linguistics); Data mining; Pattern recognition (psychology); Programming language

Metrics

Cited By: 25
FWCI (Field Weighted Citation Impact): 9.50
Refs: 43
Citation Normalized Percentile: 0.97

Topics

Software Engineering Research
Physical Sciences → Computer Science → Information Systems
Software Reliability and Analysis Research
Physical Sciences → Computer Science → Software
Software Testing and Debugging Techniques
Physical Sciences → Computer Science → Software